EleutherAI Launches Open-Source English-Hindi Bilingual Model, Hi-NOLIN

EleutherAI, in collaboration with the INCITE Project and the AAI CERC lab at the Université de Montréal, has introduced Hi-NOLIN, an open-source English-Hindi bilingual model.

The project set out to create the first open-source English-Hindi bilingual model. Researchers expanded the 7B Pythia architecture to a 9B model, a change intended to improve efficiency on their hardware, and trained it on the 300B-token Pile text corpus, which covers both English and code data. Hi-NOLIN is designed to switch between Hindi and English while also handling code.

Researchers are still training Hi-NOLIN on the Summit supercomputer, which has 6 GPUs per node, and preliminary results look promising. Although the 9B model is far from convergence, its training loss is falling steadily, which suggests that further training could bring substantial improvements.

Drawing on techniques from GPT-NeoX, Megatron-LM, and DeepSpeed, Hi-NOLIN's training uses 3D parallelism and the ZeRO (Zero Redundancy Optimizer) technique to make full use of its training resources.
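As a rough illustration of what 3D parallelism means on a machine with 6 GPUs per node, the sketch below enumerates the ways a GPU pool can be split into tensor-, pipeline-, and data-parallel degrees. The article does not state the layout or node count actually used for Hi-NOLIN, so the function name `parallel_layouts`, the node count in the usage line, and the rule that tensor parallelism stays within one node are all assumptions made for illustration.

```python
from typing import List, Tuple


def parallel_layouts(num_nodes: int, gpus_per_node: int = 6) -> List[Tuple[int, int, int]]:
    """Enumerate (tensor, pipeline, data) degrees whose product equals the GPU count.

    Assumption of this sketch: the tensor-parallel degree must divide
    gpus_per_node, so every tensor-parallel group fits inside a single node.
    """
    world_size = num_nodes * gpus_per_node
    layouts = []
    for tp in range(1, gpus_per_node + 1):
        if gpus_per_node % tp != 0:
            continue  # tensor-parallel group would straddle nodes
        remaining = world_size // tp
        for pp in range(1, remaining + 1):
            if remaining % pp == 0:
                dp = remaining // pp  # whatever is left goes to data parallelism
                layouts.append((tp, pp, dp))
    return layouts


if __name__ == "__main__":
    # Hypothetical 2-node job; the real job size is not given in the article.
    for tp, pp, dp in parallel_layouts(num_nodes=2):
        print(f"tensor={tp} pipeline={pp} data={dp}")
```

With 6 GPUs per node, the only tensor-parallel degrees that keep a group inside one node are 1, 2, 3, and 6, which is one reason a 6-GPU node constrains model shapes differently from the more common 8-GPU node.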

Hi-NOLIN performs well on several standard LLM benchmarks, including HellaSwag, TruthfulQA, ARC, and HumanEval. Even at its preliminary stage of 600B tokens, Hi-NOLIN outperforms Pythia 12B and the multilingual BLOOM models on most evaluation benchmarks, narrowing the gap with the Llama 2 models.

In a landscape dominated by English-language models, Hi-NOLIN is a significant stride towards linguistic inclusivity, addressing the gap in state-of-the-art language models for non-English languages.

EleutherAI is a non-profit research group dedicated to the development of open-source LLMs. It was founded in 2020 by a group of hackers, namely Connor Leahy, Sid Black, and Leo Gao, who wanted to create a more accessible and transparent alternative to commercial LLMs.

Meanwhile, Indian IT firm Tech Mahindra intends to launch Project Indus, its LLM designed for Hindi and its 37 dialects, by the end of December or early January.

The post EleutherAI Launches Open-Source English-Hindi Bilingual Model, Hi-NOLIN appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
