Microsoft’s 1.3 Billion Model Outperforms Llama 2


Microsoft Research has done it again. After outperforming Meta’s LLaMA with phi-1 in July, the researchers have now introduced phi-1.5, a 1.3-billion-parameter language model that outperforms Llama 2’s 7-billion-parameter model on several benchmarks. Microsoft has decided to open source the model.

At 1.3 billion parameters, phi-1.5 is small by current standards, yet it is designed to handle several kinds of task. It performs best on prompts written in a question-answering (QA) format, as well as on chat-style and code-related prompts.
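As a rough illustration of the three prompt styles mentioned above, the following Python sketch builds QA, chat, and code prompts as plain strings. The templates themselves (the trailing "Answer:" marker, the speaker labels, and the helper names `qa_prompt`, `chat_prompt`, and `code_prompt`) are assumptions made for illustration; the article does not specify an exact format.

```python
from typing import List, Tuple


def qa_prompt(question: str) -> str:
    """QA style (assumed template): question, blank line, answer marker."""
    return f"{question.strip()}\n\nAnswer:"


def chat_prompt(turns: List[Tuple[str, str]], next_speaker: str) -> str:
    """Chat style (assumed template): one 'Speaker: text' line per turn,
    ending with a cue for the speaker the model should continue as."""
    lines = [f"{speaker}: {text.strip()}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


def code_prompt(signature: str, docstring: str) -> str:
    """Code style (assumed template): a function signature plus docstring,
    leaving the body for the model to complete."""
    return f'{signature}\n    """{docstring.strip()}"""\n'


if __name__ == "__main__":
    print(qa_prompt("Why is the sky blue?"))
    print(chat_prompt([("Alice", "Can you help me with recursion?")], "Bob"))
    print(code_prompt("def is_prime(n: int) -> bool:", "Return True if n is prime."))
```

Each helper only returns a string; passing that string to the model is left out here because it depends on the inference setup.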

Click here to check out the open source model on Hugging Face

How far does one billion parameters take you? As it turns out, pretty far!!!

Today we’re releasing phi-1.5, a 1.3B parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs.

For warm-up, see an example completion w. comparison to Falcon 7B & Llama2-7B pic.twitter.com/x5qZGPjoSZ

— Sebastien Bubeck (@SebastienBubeck) September 12, 2023

While phi-1 was trained on high-quality textbook data, phi-1.5 is trained largely on synthetic data. What sets phi-1.5 apart is the breadth of its training data, which draws on several sources: Python code snippets from StackOverflow, code from competitive programming contests, synthetic Python textbooks, and exercises generated by gpt-3.5-turbo-0301.

Click here to read the paper: Textbooks Are All You Need II: phi-1.5 technical report

Key Details of phi-1.5 Model:

Architecture: Transformer-based model trained with a next-word prediction objective.

Dataset Size: 30 billion tokens.

Training Tokens: 150 billion tokens.

Precision: fp16.

GPUs: 32 x A100-40G.

Training Time: 8 days.
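The figures above imply a few derived quantities that are easy to check by hand. The short Python sketch below works them out; it assumes the 8 days are wall-clock time with all 32 GPUs in use throughout, and the memory estimate covers the fp16 weights only (no activations, optimizer state, or framework overhead). The variable names are illustrative.

```python
# Figures reported in the key details list
PARAMS = 1_300_000_000             # 1.3 billion parameters
DATASET_TOKENS = 30_000_000_000    # 30 billion tokens in the dataset
TRAINING_TOKENS = 150_000_000_000  # 150 billion tokens seen during training
NUM_GPUS = 32
TRAINING_DAYS = 8
BYTES_PER_FP16 = 2                 # fp16 stores each value in 2 bytes

# Number of passes over the dataset implied by the two token counts
passes = TRAINING_TOKENS / DATASET_TOKENS

# Total GPU-hours, assuming every GPU ran for the full 8 days
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Size of the weights alone when stored in fp16, in gigabytes
weights_gb = PARAMS * BYTES_PER_FP16 / 1e9

# Average training throughput per GPU under the same assumption
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_hours * 3600)

print(f"passes over dataset: {passes:.1f}")
print(f"GPU-hours:           {gpu_hours}")
print(f"fp16 weights:        {weights_gb:.1f} GB")
print(f"tokens/GPU/second:   {tokens_per_gpu_second:.0f}")
```

In other words, the reported numbers correspond to about five passes over the 30-billion-token dataset, 6,144 GPU-hours, and roughly 2.6 GB of fp16 weights.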

The Microsoft Research team behind phi-1.5 says the model achieves nearly state-of-the-art performance among models with fewer than 10 billion parameters, based on benchmarks that evaluate common sense, language comprehension, and logical reasoning.

Notably, phi-1.5 outperforms Meta’s Llama-2 7B on AGIEval and comes close to parity with Llama-2 7B on the GPT4All benchmark suite, as measured by the LM-Eval Harness.

The post Microsoft’s 1.3 Billion Model Outperforms Llama 2 appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

