Meta says Llama 3 beats most other models, including Gemini

Llama 3 currently comes in two model sizes, with 8B and 70B parameters. (The B stands for billions; the parameter count is a rough measure of how complex a model is and how much it can absorb from its training.) It only offers text-based responses so far, but Meta says these are “a major leap” over the previous version. According to Meta, Llama 3 showed more diversity in answering prompts, had fewer false refusals where it declined to respond to questions, and could reason better. Meta also says Llama 3 understands more instructions and writes better code than before.

In the post, Meta claims both sizes of Llama 3 beat similarly sized models like Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 in certain benchmarking tests. In the MMLU benchmark, which typically measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B slightly edged Gemini Pro 1.5.

(It is perhaps notable that Meta’s 2,700-word post does not mention GPT-4, OpenAI’s flagship model.)

Benchmark testing, though helpful for gauging how capable AI models are, is imperfect. The datasets used to benchmark models have sometimes turned up in a model’s training data, meaning the model may already know the answers to the questions an evaluation will ask.
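One simple way to look for this kind of overlap is to check whether benchmark questions share long word sequences with training text. The snippet below is a minimal sketch of that idea, not the method Meta or any benchmark maintainer uses; the helper names (`ngrams`, `contamination_rate`) and the toy data are hypothetical and invented for illustration.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_items, training_docs, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the training docs."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    if not benchmark_items:
        return 0.0
    # An item is flagged if any of its n-grams also appears in the training text.
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / len(benchmark_items)


# Toy data, invented for illustration only.
training_docs = ["the quiz asked which planet is known as the red planet and why"]
benchmark_items = [
    "Which planet is known as the red planet?",
    "What is the boiling point of water at sea level?",
]
rate = contamination_rate(benchmark_items, training_docs, n=5)
print(f"{rate:.0%} of benchmark items overlap with the training text")
```

A real check would need proper tokenization, punctuation handling, and access to the actual training corpus, which outside evaluators usually do not have.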

Screenshot: Emilia David / The Verge

Meta says human evaluators also rated Llama 3 higher than other models, including OpenAI’s GPT-3.5. Meta says it created a new dataset for human evaluators to emulate real-world scenarios where Llama 3 might be used. This dataset included use cases like asking for advice, summarization, and creative writing. The company says the team that worked on the model did not have access to this new evaluation data, and it did not influence the model’s performance.

“This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.

Screenshot: Emilia David / The Verge

Llama 3 is expected to get larger model sizes (which can understand longer strings of instructions and data) and to handle multimodal requests like “Generate an image” or “Transcribe an audio file.” Meta says these larger versions, which have over 400B parameters and can ideally learn more complex patterns than the smaller versions of the model, are still training, but initial performance testing shows they can answer many of the questions posed by benchmarks.

Meta did not release a preview of these larger models, though, and did not compare them to other big models like GPT-4.
