OLMoE Achieves State-Of-The-Art Performance using Fewer Resources and MoE


A team of researchers from the Allen Institute for AI, Contextual AI, and the University of Washington has released OLMoE (Open Mixture-of-Experts Language Models), a new open-source LLM that achieves state-of-the-art performance while using significantly fewer computational resources than comparable models.

OLMoE uses a Mixture-of-Experts (MoE) architecture: it has 7 billion total parameters but activates only 1.3 billion for each input token. This lets OLMoE match or exceed the performance of much larger models such as Llama2-13B while using far less compute during inference.
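
To make that idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The hidden sizes, expert count, and k value below are placeholder assumptions, not OLMoE's actual configuration; the point is only to show how a router sends each token to a small subset of experts, so most parameters sit idle on any given forward pass.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (PyTorch).
# Sizes, expert count, and k are illustrative assumptions, not OLMoE's exact config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token against every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)   # pick the k best experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):                           # only k of n_experts run per token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)   # torch.Size([4, 512])
```

Because each token touches only k experts, the parameters of the remaining experts contribute nothing to that token's forward pass, which is where the compute savings come from.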

Thanks to its Mixture-of-Experts design, along with better data and hyperparameters, OLMoE is far more efficient than OLMo 7B: it uses roughly 4x fewer training FLOPs and activates about 5x fewer parameters per forward pass, making both training and inference cheaper.
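
As a rough sanity check on the active-parameter ratio, here is a quick back-of-the-envelope calculation. The counts are approximate assumptions based on the figures cited in this article, with OLMo 7B treated as a dense model of roughly 6.9 billion parameters.

```python
# Back-of-the-envelope check of the "about 5x fewer active parameters" claim.
# Parameter counts are approximate; exact figures are in the OLMoE paper.
dense_active = 6.9e9   # OLMo 7B: every parameter participates in each forward pass
moe_active = 1.3e9     # OLMoE: only the routed experts' parameters participate
print(f"active-parameter ratio ~ {dense_active / moe_active:.1f}x")   # ~ 5.3x
```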

Importantly, the researchers have open-sourced not just the model weights but also the training data, code, and logs. This level of transparency is rare for high-performing language models and will allow other researchers to build upon and improve OLMoE.
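
Because the weights are published openly, trying the model takes only a few lines with the Hugging Face transformers library. The sketch below assumes the base checkpoint is available under a repository id such as allenai/OLMoE-1B-7B-0924; verify the exact identifier on Ai2's release page before running it.

```python
# Minimal sketch of loading the released checkpoint with Hugging Face transformers.
# The repository id below is an assumption; check the official release for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924"   # assumed Hub id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Mixture-of-Experts models work by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```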

For example, on the MMLU benchmark, OLMoE-1B-7B scores 54.1%, roughly matching OLMo-7B (54.9%) and clearly beating Llama2-7B (46.2%) despite activating significantly fewer parameters. After instruction tuning, OLMoE-1B-7B-Instruct even outperforms larger models such as Llama2-13B-Chat on benchmarks like AlpacaEval.

OLMoE compared to other models

This demonstrates the effectiveness of OLMoE's Mixture-of-Experts architecture in achieving high performance with lower computational requirements.

Additionally, OLMoE-1B-7B stands out for its fully open release, including model weights, training data, code, and logs, making it a valuable resource for researchers and developers looking to build upon and improve state-of-the-art language models.

MoE is a preferred choice when you don't have the resources to train a large dense model from scratch: rather than one monolithic network, it combines many smaller expert networks and routes each input to only a few of them, yielding a single model with broad capability at a fraction of the training and inference cost.

The post OLMoE Achieves State-Of-The-Art Performance using Fewer Resources and MoE appeared first on AIM.



