China Open Sources DeepSeek LLM, Outperforms Llama 2 and Claude-2


DeepSeek, a China-based company that aims to “unravel the mystery of AGI with curiosity,” has released DeepSeek LLM, a 67-billion-parameter model trained from scratch on a dataset of 2 trillion tokens.

Available in both English and Chinese, the LLM aims to foster research and innovation. The research community has access to the open-source versions: DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat.

The release is available on the company’s GitHub repository.

The model is available under the MIT licence.

DeepSeek LLM 67B Base outperforms Llama 2 70B Base in key areas such as reasoning, coding, mathematics, and Chinese comprehension. Its coding proficiency is reflected in a HumanEval Pass@1 score of 73.78, and in mathematics it scores 84.1 on GSM8K (0-shot) and 32.6 on MATH (0-shot).

The model’s generalisation abilities are underscored by an exceptional score of 65 on the challenging Hungarian National High School Exam.

DeepSeek LLM 7B/67B models, in both base and chat versions, are released to the public on GitHub, Hugging Face, and AWS S3. Intermediate checkpoints from the base models’ training process are also provided, with usage subject to the outlined licence terms.
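For readers who want to try the release, the weights published on Hugging Face can be loaded with the standard transformers workflow. The snippet below is a minimal sketch, assuming the repository id deepseek-ai/deepseek-llm-7b-chat and a GPU with enough memory for the 7B weights in half precision; check the official model cards for the exact ids and recommended settings.

```python
# Minimal sketch: loading and prompting DeepSeek LLM 7B Chat via Hugging Face transformers.
# The repository id is an assumption; confirm it on the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision so the 7B weights fit on one GPU
    device_map="auto",
)

# Build a chat-style prompt with the tokenizer's chat template, if the model card provides one.
messages = [{"role": "user", "content": "Summarise what a 0-shot GSM8K score measures."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The 67B checkpoints follow the same pattern but typically require multiple GPUs or quantisation.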

Performance Highlights

DeepSeek LLM 67B Base surpasses Llama2 70B Base in general capabilities.
DeepSeek LLM 67B Chat performs exceptionally well in coding, mathematics, and reasoning.

— DeepSeek (@deepseek_ai) November 29, 2023

In-depth evaluations of the base and chat models on existing benchmarks show DeepSeek LLM outperforming LLaMA-2, GPT-3.5, and Claude-2 on a range of metrics, in both English and Chinese.

The evaluation also extends to previously unseen exams, including the Hungarian National High School Exam, where DeepSeek LLM 67B Chat performs strongly.

Experiments with multiple-choice question data also improve benchmark performance, particularly on Chinese multiple-choice benchmarks. After incorporating 20 million Chinese multiple-choice questions, DeepSeek LLM 7B Chat scores higher on MMLU, C-Eval, and CMMLU.

DeepSeek LLM’s pre-training dataset was curated for richness and variety. The architecture follows LLaMA: an auto-regressive transformer decoder, with the 67B variant using grouped-query attention in place of standard multi-head attention. Details of the pre-training process, including training loss curves and benchmark metrics, are released publicly, emphasising transparency and accessibility.
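To make the architecture description concrete, the configuration below sketches a LLaMA-style auto-regressive decoder with grouped-query attention using the Hugging Face LlamaConfig class. The dimensions are small placeholder values for a quick local run, not DeepSeek’s published hyperparameters.

```python
# Illustrative sketch of a LLaMA-style auto-regressive decoder with grouped-query attention.
# All dimensions are small placeholders, not DeepSeek's published hyperparameters.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,           # placeholder vocabulary size
    hidden_size=1024,           # embedding / residual stream width
    intermediate_size=2816,     # feed-forward width
    num_hidden_layers=8,        # decoder blocks
    num_attention_heads=16,     # query heads
    num_key_value_heads=4,      # fewer key/value heads than query heads -> grouped-query attention
    max_position_embeddings=4096,
)

model = LlamaForCausalLM(config)  # randomly initialised decoder-only transformer
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```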

Recently, Alibaba, the Chinese tech giant, also unveiled its own LLM, Qwen-72B, trained on 3 trillion tokens of high-quality data and featuring an expanded 32K context window. The company also released a smaller language model, Qwen-1.8B, describing it as a gift to the research community.
