China Open Sources DeepSeek LLM, Outperforms Llama 2 and Claude-2

DeepSeek, a China-based company that aims to “unravel the mystery of AGI with curiosity,” has released DeepSeek LLM, a 67-billion-parameter model trained from scratch on a dataset of 2 trillion tokens.

Available in both English and Chinese, the LLM aims to foster research and innovation. The research community has open access to DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat.

Check out the GitHub repository here.

The model is available under the MIT licence.

DeepSeek LLM 67B Base outperforms Llama 2 70B Base in key areas such as reasoning, coding, mathematics, and Chinese comprehension. Its coding proficiency stands out with a HumanEval pass@1 score of 73.78, and it posts strong mathematics results: 84.1 on GSM8K (0-shot) and 32.6 on MATH (0-shot).
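
The HumanEval figure is a pass@1 score: the probability that a single generated program passes all of a problem's unit tests. As a minimal sketch, the standard unbiased pass@k estimator (from the HumanEval paper) can be computed as follows; the sample counts below are illustrative, not DeepSeek's:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per problem,
    c of which pass the unit tests. For k=1 this reduces to c / n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 200 samples, 148 passing -> pass@1 = 0.74,
# the same scale as the 73.78 (percent) reported above.
print(pass_at_k(n=200, c=148, k=1))
```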

The model’s generalisation abilities are underscored by an exceptional score of 65 on the challenging Hungarian National High School Exam.

DeepSeek LLM 7B/67B models, in both base and chat versions, are available to the public on GitHub, Hugging Face, and AWS S3. Intermediate checkpoints from the base model's training run are also provided, with usage subject to the outlined licence terms.
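
Since the weights are on Hugging Face, the chat model can be loaded with the standard transformers API. A minimal sketch; the model ID below is an assumption based on DeepSeek's Hugging Face organisation, so check the repository for the exact name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository name under the deepseek-ai organisation on Hugging Face.
model_id = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```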

Performance Highlights

DeepSeek LLM 67B Base surpasses Llama2 70B Base in general capabilities.
DeepSeek LLM 67B Chat performs exceptionally well in coding, mathematics, and reasoning.

— DeepSeek (@deepseek_ai) November 29, 2023

In-depth evaluations of the base and chat models compare them against existing models on standard benchmarks. The results show DeepSeek LLM ahead of LLaMA-2, GPT-3.5, and Claude-2 across a range of metrics, in both English and Chinese.

The evaluation also extends to exams the models could not have seen during training, including the Hungarian National High School Exam, where DeepSeek LLM 67B Chat performs strongly.

Experiments with multiple-choice question data were found to boost benchmark performance, particularly on Chinese multiple-choice benchmarks: after incorporating 20 million Chinese multiple-choice questions, DeepSeek LLM 7B Chat shows improved scores on MMLU, C-Eval, and CMMLU.
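
For context on how MMLU-style multiple-choice benchmarks are commonly scored: one standard approach is to compute the log-likelihood the model assigns to each candidate answer and pick the highest. A rough sketch under that assumption (not necessarily the exact harness DeepSeek used):

```python
import torch

def best_choice(model, tokenizer, question: str, choices: list[str]) -> int:
    """Pick the answer the model assigns the highest total log-probability,
    a common scoring scheme for multiple-choice benchmarks."""
    scores = []
    for choice in choices:
        prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
        ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # logits[0, pos] predicts the token at position pos + 1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        # Sum log-probs over the answer tokens only (positions after the prompt).
        answer_lp = sum(
            log_probs[pos, ids[0, pos + 1]].item()
            for pos in range(prompt_len - 1, ids.shape[1] - 1)
        )
        scores.append(answer_lp)
    return max(range(len(choices)), key=scores.__getitem__)
```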

DeepSeek LLM’s pre-training drew on a vast dataset curated for richness and variety. The architecture follows LLaMA: an auto-regressive transformer decoder, with the 67B model using grouped-query attention (GQA) in place of standard multi-head attention. Details of the pre-training process, including training loss curves and benchmark metrics, have been released publicly, emphasising transparency and accessibility.
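
For illustration, grouped-query attention lets several query heads share a single key/value head, shrinking the inference-time KV cache relative to full multi-head attention. A minimal, self-contained sketch of the idea (not DeepSeek's actual implementation; dimensions are arbitrary):

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """Toy GQA: n_heads query heads share n_kv_heads key/value heads.
    Omits causal masking, rotary embeddings, and the output projection."""
    bsz, seqlen, dim = x.shape
    head_dim = dim // n_heads
    q = (x @ wq).view(bsz, seqlen, n_heads, head_dim)
    k = (x @ wk).view(bsz, seqlen, n_kv_heads, head_dim)
    v = (x @ wv).view(bsz, seqlen, n_kv_heads, head_dim)
    # Duplicate each KV head so groups of query heads can attend to it.
    reps = n_heads // n_kv_heads
    k = k.repeat_interleave(reps, dim=2)
    v = v.repeat_interleave(reps, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (bsz, heads, seq, hd)
    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(bsz, seqlen, dim)

# Example: 8 query heads sharing 2 KV heads.
dim, n_heads, n_kv_heads = 512, 8, 2
kv_dim = n_kv_heads * (dim // n_heads)
out = grouped_query_attention(
    torch.randn(2, 16, dim),
    torch.randn(dim, dim), torch.randn(dim, kv_dim), torch.randn(dim, kv_dim),
    n_heads, n_kv_heads,
)
print(out.shape)  # torch.Size([2, 16, 512])
```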

Recently, Chinese tech giant Alibaba also unveiled its own LLM, Qwen-72B, trained on 3 trillion tokens of high-quality data and offering an expanded 32K context window. The company also released a smaller model, Qwen-1.8B, touting it as a gift to the research community.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

Editorial Team
StartupNews.fyi is a leading global startup and technology media platform known for its end-to-end coverage of the startup ecosystem across India and key international markets. Launched with the vision of becoming a single gateway for founders, investors, and ecosystem enablers, StartupNews.fyi has grown steadily over the years by publishing tens of thousands of verified news stories, insights, and ecosystem updates, reaching millions of startup enthusiasts every month through its digital platforms and communities.
