Salesforce Introduces New Family of Multimodal Action Models Named TACO



Salesforce AI Research has introduced TACO, a family of multimodal large action models designed to improve performance on complex, multi-step problems that require reasoning across several data types, such as images, text, and calculations. “We present TACO, a family of multi-modal large action models designed to improve performance on complex questions that require multiple capabilities and demand multi-step solutions,” Salesforce said in a blog post on January 16, 2025.


Overcoming Limitations of Current AI Systems

According to the company, TACO tackles a significant limitation of current open-source multimodal models, which struggle to solve realistic, complex problems step by step. For instance, when asked “How much gas can I buy with $50?” about a photo of a gas station sign, TACO can locate the price information, extract the text using OCR, and perform the necessary calculation. This capability is powered by chains-of-thought-and-action (CoTA), in which the model generates both reasoning and actionable steps to arrive at the correct answer.

“To answer such questions, TACO produces chains-of-thought-and-action (CoTA), executes intermediate steps by invoking external tools such as OCR, depth estimation and calculator, then integrates both the thoughts and action outputs to produce coherent responses,” the company explained.
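The gas station example can be pictured as a short loop of thoughts, tool calls, and observations. The Python sketch below is only an illustration of that idea and is not Salesforce's implementation: the names `Step`, `fake_ocr`, `calculator`, `scripted_policy`, and `run_cota` are hypothetical, the "model" is a hard-coded script rather than a trained network, and the sign text and the $3.50 price are made-up values.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str   # reasoning that motivates the action
    tool: str      # name of the external tool to invoke
    argument: str  # input passed to that tool


def fake_ocr(image_name: str) -> str:
    """Stand-in for a real OCR tool; returns made-up text for the demo image."""
    signs = {"gas_sign.jpg": "REGULAR 3.50 / GAL"}  # hypothetical sign content
    return signs[image_name]


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate a single 'a op b' expression and format it to two decimals."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([-+*/])\s*([\d.]+)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, op, right = match.groups()
    return f"{OPS[op](float(left), float(right)):.2f}"


def scripted_policy(image: str, observations: List[str]) -> Optional[Step]:
    """Hard-coded stand-in for the model: choose the next step, or None when done."""
    if len(observations) == 0:
        return Step("I need the price shown on the sign.", "ocr", image)
    if len(observations) == 1:
        price = re.search(r"\d+\.\d+", observations[0]).group()
        return Step(f"Gas costs ${price} per gallon, so divide $50 by it.",
                    "calculator", f"50 / {price}")
    return None  # enough information to answer


def run_cota(image: str, tools: Dict[str, Callable[[str], str]],
             max_steps: int = 5) -> str:
    """Alternate thoughts and tool calls, then fold the last result into an answer."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = scripted_policy(image, observations)
        if step is None:
            break
        observations.append(tools[step.tool](step.argument))
    return f"About {observations[-1]} gallons"


if __name__ == "__main__":
    print(run_cota("gas_sign.jpg", {"ocr": fake_ocr, "calculator": calculator}))
```

With the made-up price of $3.50 per gallon, the script prints "About 14.29 gallons". In a real system the scripted policy would be replaced by the model itself, which decides at each turn which tool to call based on the observations so far.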


Training TACO

To train TACO, Salesforce said it created over 1 million synthetic CoTA traces through model-based and programmatic generation methods. These traces help the model learn to perform complex reasoning and execute external actions such as text recognition and mathematical operations.

Salesforce claims that TACO achieved 30-50 percent higher performance compared to models using traditional direct answers. It also outperformed baseline models by up to 20 percent on the MMVet benchmark.


Future Applications

With this framework, Salesforce AI hopes to pave the way for new multimodal models that can be applied across various domains, such as medical question answering and web navigation.

“With our framework, future works can train new models with different actions for other applications such as web navigation or for other domains such as medical question answering,” Salesforce said.




