China Open Sources DeepSeek LLM, Outperforms Llama 2 and Claude-2

DeepSeek, a China-based company that aims to “unravel the mystery of AGI with curiosity,” has released DeepSeek LLM, a 67-billion-parameter model trained from scratch on a dataset of 2 trillion tokens.

Available in both English and Chinese, the LLM is intended to foster research and innovation. The research community has access to the open-source versions: DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat.

Check out the GitHub repository here.

The model is available under the MIT licence.

DeepSeek LLM 67B Base outperforms Llama 2 70B Base in key areas such as reasoning, coding, mathematics, and Chinese comprehension. In coding, it posts a HumanEval Pass@1 score of 73.78; in mathematics, it scores 84.1 on GSM8K (0-shot) and 32.6 on MATH (0-shot).
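
For context, HumanEval Pass@1 estimates the probability that a single generated completion passes a problem’s unit tests. Below is a minimal sketch of the standard unbiased pass@k estimator from the HumanEval benchmark paper; it is a generic illustration, not DeepSeek’s own evaluation code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions per problem,
    of which c pass the unit tests, estimate the probability that at least
    one of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 10 samples per problem, 7 of them pass -> pass@1 = 0.7
print(pass_at_k(n=10, c=7, k=1))
```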

The model’s generalisation abilities are underscored by an exceptional score of 65 on the challenging Hungarian National High School Exam.

DeepSeek LLM 7B/67B models, in both base and chat versions, have been released to the public on GitHub, Hugging Face, and AWS S3. Intermediate checkpoints from the base model’s training run are also provided, with usage subject to the outlined licence terms.
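
As a rough sketch of how the Hugging Face release could be tried with the transformers library (the repository ID and chat-template usage below are assumptions based on the announcement; consult DeepSeek’s GitHub page for the exact names and recommended settings):

```python
# Sketch: loading the 7B chat model from Hugging Face (model ID assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain the significance of a 2-trillion-token training set."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```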

Performance Highlights

“DeepSeek LLM 67B Base surpasses Llama2 70B Base in general capabilities. DeepSeek LLM 67B Chat performs exceptionally well in coding, mathematics, and reasoning.”

— DeepSeek (@deepseek_ai), November 29, 2023

In-depth evaluations of the base and chat models against existing models on standard benchmarks show DeepSeek LLM outperforming LLaMA-2, GPT-3.5, and Claude-2 on various metrics, in both English and Chinese.

The evaluation also extends to exams the models could not have seen during training, including the Hungarian National High School Exam, where DeepSeek LLM 67B Chat performs strongly.

DeepSeek also found that training on multiple-choice question data improves benchmark performance, particularly on Chinese multiple-choice benchmarks: incorporating 20 million Chinese multiple-choice questions lifted DeepSeek LLM 7B Chat’s scores on MMLU, C-Eval, and CMMLU.

DeepSeek LLM’s pre-training used a vast dataset, carefully curated for richness and variety. The architecture is akin to LLaMA’s: an auto-regressive transformer decoder with the company’s own variations on the attention mechanism. Details of the pre-training process, including training loss curves and benchmark metrics, have been released publicly, emphasising transparency and accessibility.
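
To illustrate what a LLaMA-style decoder layer looks like in practice, here is a simplified sketch combining pre-normalisation, causal self-attention, and a gated feed-forward block with residual connections. It is a generic illustration, not DeepSeek’s implementation; rotary position embeddings and the grouped-query attention reportedly used in the 67B model are omitted, and LayerNorm stands in for RMSNorm.

```python
# Simplified LLaMA-style decoder block (illustration only, not DeepSeek's code).
# Rotary embeddings and grouped-query attention omitted; LayerNorm replaces RMSNorm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = nn.LayerNorm(d_model)
        # SwiGLU-style gated feed-forward, as in LLaMA-family models
        self.gate = nn.Linear(d_model, 4 * d_model, bias=False)
        self.up = nn.Linear(d_model, 4 * d_model, bias=False)
        self.down = nn.Linear(4 * d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Causal mask: each token attends only to itself and earlier positions
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
        )
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out  # residual connection
        h = self.mlp_norm(x)
        x = x + self.down(F.silu(self.gate(h)) * self.up(h))
        return x

block = DecoderBlock()
tokens = torch.randn(1, 16, 512)  # (batch, sequence length, hidden size)
print(block(tokens).shape)        # torch.Size([1, 16, 512])
```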

Recently, the Chinese tech giant Alibaba also unveiled its own LLM, Qwen-72B, trained on 3 trillion tokens of high-quality data and offering an expanded 32K context window. The company also released a smaller language model, Qwen-1.8B, touting it as a gift to the research community.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
