Meta says Llama 3 beats most other models, including Gemini

Llama 3 currently comes in two model sizes, with 8B and 70B parameters. (The B stands for billions; a model’s parameter count is a rough measure of its size and of how complex the patterns it can learn are.) It only offers text-based responses so far, but Meta says these are “a major leap” over the previous version. According to Meta, Llama 3 showed more diversity in answering prompts, had fewer false refusals, in which it declined to respond to questions, and could reason better. Meta also says Llama 3 understands more instructions and writes better code than before.

In the post, Meta claims both sizes of Llama 3 beat similarly sized models like Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 in certain benchmarking tests. In the MMLU benchmark, which typically measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B slightly edged Gemini Pro 1.5.

(It is perhaps notable that Meta’s 2,700-word post does not mention GPT-4, OpenAI’s flagship model.)

It should also be noted that benchmarking AI models, though helpful in understanding just how powerful they are, is imperfect. The datasets used to benchmark models have sometimes been found to be part of a model’s training data, meaning the model may already know the answers to the questions evaluations will ask it.
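
To make the contamination concern concrete, here is a minimal sketch of one common way to look for it: checking how many word n-grams from a benchmark question also appear in a sample of training text. This is an illustration only, not Meta's method; the function names, the 5-word window, and the sample strings are all assumptions made for this example.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, training_text, n=5):
    """Fraction of the benchmark item's n-grams that also occur in training_text.

    Returns 0.0 when the item is shorter than n words.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    train_grams = ngrams(training_text, n)
    return len(item_grams & train_grams) / len(item_grams)


# Hypothetical sample data, invented for illustration.
question = "Which planet in the solar system has the most moons"
leaked = "quiz dump: which planet in the solar system has the most moons answer saturn"
clean = "the weather today is sunny with a light breeze from the west"

print(overlap_fraction(question, leaked))  # high overlap suggests the question leaked
print(overlap_fraction(question, clean))   # no shared 5-grams
```

A high overlap does not prove a model memorized the answer, and a low one does not rule it out (paraphrased questions would slip through), so a check like this is a rough signal at best.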

Benchmark testing shows both sizes of Llama 3 outperforming similarly sized language models.

Screenshot: Emilia David / The Verge

Meta says human evaluators also marked Llama 3 higher than other models, including OpenAI’s GPT-3.5. Meta says it created a new dataset for human evaluators to emulate real-world scenarios where Llama 3 might be used. This dataset included use cases like asking for advice, summarization, and creative writing. The company says the team that worked on the model did not have access to this new evaluation data, and it did not influence the model’s performance.

“This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.

Llama 3 performed better than most models in human evaluations, says Meta.

Screenshot: Emilia David / The Verge

Llama 3 is expected to get larger model sizes (which can understand longer strings of instructions and data) and be capable of more multimodal responses, handling requests like “Generate an image” or “Transcribe an audio file.” Meta says these larger versions, which have over 400B parameters and can ideally learn more complex patterns than the smaller versions of the model, are currently training, but initial performance testing shows these models can answer many of the questions posed by benchmarking.

Meta did not release a preview of these larger models, though, and did not compare them to other big models like GPT-4.
