Llemma is Here: An Open Language Model For Mathematics


Researchers from EleutherAI have introduced Llemma, an open language model designed for mathematics, along with the Proof-Pile-2 dataset it was trained on. The project, built by continued pretraining of Code Llama, has garnered significant attention in the academic and research community.


Llemma comes in 7-billion and 34-billion parameter versions and outperforms all other open base models at similar scales. The result is particularly noteworthy because the 34-billion parameter Llemma approaches the performance of Google’s Minerva, a closed model with 62 billion parameters, using roughly half as many parameters.

We release Llemma: open LMs for math trained on up to 200B tokens of mathematical text.

The performance of Llemma 34B approaches Google’s Minerva 62B despite having half the parameters.

Models/data/code: https://t.co/zFvKHrK7t3
Paper: https://t.co/gGgyFQX8sA

pic.twitter.com/K7ZiG9n8BT

— Zhangir Azerbayev (@zhangir_azerbay) October 17, 2023

This new development from EleutherAI not only parallels Minerva, a closed model designed specifically for mathematics by Google Research, but also exceeds its problem-solving capabilities on an equi-parameter basis. Notably, Llemma’s capabilities extend to a broader spectrum of tasks, including tool use and formal mathematics, which further distinguishes it in the realm of mathematical language modeling.

Zhangir Azerbayev, the lead author of the paper, explains that the journey toward creating Llemma began with assembling a vast dataset of mathematical text: the ArXiv subset of RedPajama, the recent OpenWebMath dataset, and AlgebraicStack, a new code dataset tailored specifically to mathematics. Together, these sources form Proof-Pile-2, a corpus of 55 billion unique tokens.
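
To make the dataset concrete: Proof-Pile-2 was released openly alongside the models. The sketch below streams a few documents from the Hugging Face Hub; the dataset id `EleutherAI/proof-pile-2` and the `text` field name are assumptions based on the release, not details stated in the article.

```python
# Minimal sketch: streaming a peek at Proof-Pile-2 from the Hugging Face Hub.
# The dataset id "EleutherAI/proof-pile-2" and the "text" field are assumptions,
# not confirmed by the article; streaming avoids downloading the full corpus.
from datasets import load_dataset

ds = load_dataset("EleutherAI/proof-pile-2", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:200])  # first 200 characters of each document
    if i == 2:
        break
```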

Llemma’s models were initialized with Code Llama weights and then trained on 256 A100 GPUs on Stability AI’s Ezra cluster. The 7-billion parameter model was trained on 200 billion tokens over 23,000 A100 hours, while the 34-billion parameter model was trained on 50 billion tokens over 47,000 A100 hours.
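
Because the weights are open, the models can be used directly with standard tooling. Here is a minimal generation sketch, assuming the checkpoint is published on the Hugging Face Hub under the id `EleutherAI/llemma_7b` (an identifier not given in the article):

```python
# Minimal sketch: loading an open Llemma checkpoint with Hugging Face Transformers.
# The model id "EleutherAI/llemma_7b" is an assumption, not stated in the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/llemma_7b",
    torch_dtype=torch.float16,  # half precision so the 7B model fits on one GPU
    device_map="auto",
)

prompt = "Problem: Compute the derivative of x^2 * sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```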

In addition to its strong performance on chain-of-thought tasks when compared on an equal-parameter basis with Minerva, Llemma benefits from majority voting: sampling several solutions per problem and selecting the most common final answer gives its accuracy an extra boost.
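
To illustrate the idea, here is a minimal sketch of majority voting over sampled solutions. The two helper functions are hypothetical stand-ins: in a real evaluation, `sample_solution` would draw a chain-of-thought completion from the model and `extract_answer` would parse its final answer.

```python
# Minimal sketch of majority voting (self-consistency) over sampled solutions.
from collections import Counter
import random

def sample_solution(problem: str) -> str:
    # Hypothetical stand-in for sampling one chain-of-thought completion
    # from the model; here it just fabricates an answer for demonstration.
    return f"... therefore the answer is {random.choice(['42', '42', '41'])}"

def extract_answer(solution: str) -> str:
    # Hypothetical stand-in for parsing the final answer out of a completion.
    return solution.rsplit(" ", 1)[-1]

def majority_vote(problem: str, num_samples: int = 16) -> str:
    answers = [extract_answer(sample_solution(problem)) for _ in range(num_samples)]
    # The most frequent final answer across all samples wins.
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))
```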

Llemma is the product of a collaboration spanning Princeton University, EleutherAI, the University of Toronto, the Vector Institute, the University of Cambridge, Carnegie Mellon University, and the University of Washington.


