Microsoft’s 1.3 Billion Model Outperforms Llama 2


Microsoft Research has done it once again. After outperforming Meta’s LLaMA with phi-1 in July, the researchers have now introduced phi-1.5, a 1.3-billion-parameter language model that outperforms Llama 2’s 7-billion-parameter model on several benchmarks. Microsoft has decided to open source the model.

The phi-1.5 model has been designed to perform well across multiple domains, making it a strong fit for a wide range of applications. It particularly shines on prompts in the question-answering (QA) format, as well as in chat interactions and code-related tasks.

Click here to check out the open source model on Hugging Face
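
For readers who want to try it, here is a minimal sketch of loading the model with Hugging Face transformers and prompting it in the QA style described above. It assumes the repository id microsoft/phi-1_5 and a recent transformers release; older releases may also require trust_remote_code=True, and the prompt wording is only illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # repository id on the Hugging Face Hub

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # model was trained in fp16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

# QA-style prompt; chat and code prompts follow the same pattern.
prompt = "Question: Why is the sky blue?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))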

How far does one billion parameters take you? As it turns out, pretty far!!!

Today we’re releasing phi-1.5, a 1.3B parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs.

For warm-up, see an example completion w. comparison to Falcon 7B & Llama2-7B pic.twitter.com/x5qZGPjoSZ

— Sebastien Bubeck (@SebastienBubeck) September 12, 2023

While phi-1 was trained on high-quality, textbook-style data, phi-1.5 extends that recipe with additional synthetic data. What sets phi-1.5 apart is its comprehensive training regimen, which draws on diverse data sources: Python code snippets from StackOverflow, code from competitive programming contests, synthetic Python textbooks, and exercises generated by gpt-3.5-turbo-0301.

Click here to read the paper: Textbooks Are All You Need II: phi-1.5 technical report

Key Details of the phi-1.5 Model:

Architecture: Transformer-based model trained with a next-word prediction objective.

Dataset Size: 30 billion tokens.

Training Tokens: 150 billion tokens seen during training (roughly five passes over the dataset).

Precision: fp16.

GPUs: 32 A100-40G GPUs.

Training Time: 8 days.

The Microsoft Research team behind phi-1.5 reports that the model achieves nearly state-of-the-art performance among models with fewer than 10 billion parameters, based on benchmarks covering common sense, language understanding, and logical reasoning.

Notably, phi-1.5 outperforms Meta’s Llama 2 7B on the AGIEval score and approaches parity with Llama 2 7B on the GPT4All benchmark suite, as measured by the LM-Eval Harness.
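
As a rough illustration of how such scores are typically reproduced, the sketch below runs the model through EleutherAI’s LM-Eval Harness (pip install lm-eval). The task names and keyword arguments shown are illustrative assumptions, not the exact configuration the researchers used, and they vary between harness versions.

import lm_eval

# Hedged sketch: task names and arguments differ across lm-eval versions.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-1_5,dtype=float16",
    tasks=["hellaswag", "winogrande", "arc_easy"],  # common-sense tasks of the kind such suites report
    num_fewshot=0,
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)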

The post Microsoft’s 1.3 Billion Model Outperforms Llama 2 appeared first on Analytics India Magazine.

