Researchers Experiment with Flamingo & Dall-E; the Results Will Surprise You


While text-based AI models have been found coordinating among themselves and developing a language of their own, communication between image-based models remained unexplored territory until now. A group of researchers set out to find how well Google DeepMind’s Flamingo and OpenAI’s DALL-E understand each other, and the synergy they report is impressive.

Although image captioning and text-to-image generation are closely related tasks, they are often studied in isolation from each other; that is, the information exchange between these models is a question that had not been investigated. Researchers from LMU Munich, Siemens AG, and the University of Oxford wrote a paper titled ‘Do Flamingo and DALL-E Understand Each Other?’ investigating the communication between image captioning and text-to-image models.

The team proposes a reconstruction task in which Flamingo generates a description for a given image and DALL-E uses this description as input to synthesise a new image. They argue that the models understand each other if the generated image is similar to the given image. Specifically, they studied the relationship between the quality of the image reconstruction and that of the text generation. They found that a better caption leads to a better reconstructed image, and vice versa.
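The reconstruction loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `caption_fn`, `generate_fn`, and `embed_fn` are hypothetical placeholders for a captioning model, a text-to-image model, and an image encoder, and cosine similarity is assumed here as the image-similarity measure. Toy stand-ins are used so the example runs without any model: an "image" is just a short list of numbers, and a "caption" is those numbers written out at a chosen precision.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reconstruction_score(
    image: Vector,
    caption_fn: Callable[[Vector], str],    # hypothetical captioner (Flamingo's role)
    generate_fn: Callable[[str], Vector],   # hypothetical generator (DALL-E's role)
    embed_fn: Callable[[Vector], Vector],   # hypothetical image encoder
) -> Tuple[str, float]:
    """Caption the image, regenerate an image from the caption, and compare."""
    caption = caption_fn(image)
    reconstructed = generate_fn(caption)
    score = cosine_similarity(embed_fn(image), embed_fn(reconstructed))
    return caption, score


# --- Toy stand-ins (assumptions for illustration only) ---
def coarse_caption(image: Vector) -> str:
    """A lossy 'caption': every value rounded to a whole number."""
    return " ".join(f"{v:.0f}" for v in image)


def fine_caption(image: Vector) -> str:
    """A more detailed 'caption': values kept to two decimals."""
    return " ".join(f"{v:.2f}" for v in image)


def toy_generate(caption: str) -> List[float]:
    """A toy 'generator' that parses the caption back into numbers."""
    return [float(token) for token in caption.split()]


def identity_embed(image: Vector) -> Vector:
    """A toy 'encoder' that returns the image unchanged."""
    return image


toy_image = [0.9, 0.1, 0.4]
_, coarse_score = reconstruction_score(toy_image, coarse_caption, toy_generate, identity_embed)
_, fine_score = reconstruction_score(toy_image, fine_caption, toy_generate, identity_embed)
print(f"coarse caption score: {coarse_score:.3f}")
print(f"fine caption score:   {fine_score:.3f}")
```

In this toy setting the more detailed caption reconstructs the original more faithfully and so scores higher, which mirrors the relationship the researchers report between caption quality and reconstruction quality.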

In the recent past, strides have been made in multimodal models, AI systems designed to process multiple forms of sensory input at the same time. Models trained solely on text data inherently face limitations when it comes to common sense. While expanding the training dataset helps to a certain degree, these models may still have unexpected knowledge gaps. Multimodality can help here: multimodal models have demonstrated improved reasoning abilities compared to their single-modality counterparts. It is worth noting, however, that symbolic logic, an approach that dominated for decades, yielded minimal progress during that time.

The post Researchers Experiment with Flamingo & Dall-E; the Results Will Surprise You appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
