Microsoft’s 1.3 Billion Model Outperforms Llama 2


Microsoft Research has done it once again. After outperforming Meta’s LLaMA with phi-1 in July, the researchers have now introduced phi-1.5, a 1.3-billion-parameter language model that outperforms Llama 2’s 7-billion-parameter model on several benchmarks. Microsoft has decided to open source the model.

Despite its small size, phi-1.5 is designed to perform well across multiple domains. It does best with queries in the question-answering (QA) format, as well as in chat interactions and code-related tasks.

The open source model is available on Hugging Face.

How far does one billion parameters take you? As it turns out, pretty far!!!

Today we’re releasing phi-1.5, a 1.3B parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs.

For warm-up, see an example completion w. comparison to Falcon 7B & Llama2-7B pic.twitter.com/x5qZGPjoSZ

— Sebastien Bubeck (@SebastienBubeck) September 12, 2023

While phi-1 was trained on high-quality textbook data, phi-1.5 is trained mostly on synthetic data. What sets phi-1.5 apart is the breadth of its training data. The model draws on several sources, including Python code snippets from StackOverflow, code from competitive programming contests, synthetic Python textbooks, and exercises generated by gpt-3.5-turbo-0301.

The full details are in the paper, “Textbooks Are All You Need II: phi-1.5 technical report”.

Key Details of phi-1.5 Model:

Architecture: Transformer-based model trained with a next-word prediction objective.

Dataset Size: 30 billion tokens.

Training Tokens: 150 billion tokens.

Precision: fp16.

GPUs: 32xA100-40G.

Training Time: 8 days.
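A few derived quantities follow from these figures by simple arithmetic. The sketch below is a back-of-the-envelope calculation, not something taken from the phi-1.5 report: the function names are illustrative, the memory figure covers the weights alone in decimal gigabytes, and the GPU-hour figure assumes all 32 GPUs ran for the full 8 days.

```python
# Back-of-the-envelope figures derived from the key details listed above.
# All helper names are hypothetical; only the constants come from the article.

PARAMS = 1.3e9            # model parameters
DATASET_TOKENS = 30e9     # size of the training dataset, in tokens
TRAINING_TOKENS = 150e9   # tokens seen during training
BYTES_PER_PARAM_FP16 = 2  # fp16 stores each parameter in 2 bytes
NUM_GPUS = 32             # A100-40G GPUs
TRAINING_DAYS = 8


def passes_over_dataset(training_tokens, dataset_tokens):
    """How many times, on average, the training run revisits the dataset."""
    return training_tokens / dataset_tokens


def gpu_hours(num_gpus, days):
    """Total GPU-hours, assuming every GPU is busy for the whole run."""
    return num_gpus * days * 24


def weight_memory_gb(params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(f"{passes_over_dataset(TRAINING_TOKENS, DATASET_TOKENS):.1f}")  # 5.0
print(gpu_hours(NUM_GPUS, TRAINING_DAYS))                             # 6144
print(f"{weight_memory_gb(PARAMS, BYTES_PER_PARAM_FP16):.1f}")        # 2.6
```

In other words, the 150 billion training tokens correspond to roughly five passes over the 30-billion-token dataset, the run used on the order of 6,000 A100 GPU-hours, and the fp16 weights occupy about 2.6 GB before any activation or cache overhead.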

The Microsoft Research team behind phi-1.5 says the model achieves nearly state-of-the-art performance among models with fewer than 10 billion parameters, based on benchmarks that test common sense, language understanding, and logical reasoning.

Notably, phi-1.5 outperforms Meta’s Llama 2 7B on the AGIEval score and approaches parity with Llama 2 7B on the GPT4All benchmark suite, as measured by the LM-Eval Harness.

The post Microsoft’s 1.3 Billion Model Outperforms Llama 2 appeared first on Analytics India Magazine.