Tech Mahindra to Launch OpenAI Rival ‘Project Indus’ Early Next Year

Indian IT firm Tech Mahindra intends to launch ‘Project Indus’, its LLM designed for Hindi and its 37 dialects, by the end of December or early January, the Economic Times reported. The planned launch comes four months after the company introduced Project Indus, a strategic effort by the fifth-largest software services firm to develop a foundational model for Indian languages.

Over the last two months, the 15-member Project Indus team has gathered 1.2 terabytes of data in Hindi and its related dialects. The team is now refining this data into web text, which it plans to release as open source by the end of November, Nikhil Malhotra, global head of Makers Lab at Tech Mahindra, said, according to the ET report.

“In the meantime, we have started constructing the model… We are looking at probably the end of December or starting of January, we will release the model for at least Hindi and its dialects. And then the other work starts for other dialects in other regions,” Malhotra said.

The team encountered difficulties related to data availability and collection. “In Hindi, the maximum number of tokens available is about 2.8 billion, which doesn’t meet the model’s requirements. For instance, to create a 7 billion parameter model, I would need at least around 100 billion tokens,” explained Malhotra.
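
To put the quoted figures side by side, the short Python sketch below works out the gap between the Hindi tokens Malhotra says are available and the amount he says a 7-billion-parameter model needs. The three constants come from his quote; the function name `token_gap` and the derived ratios are illustrative assumptions, not numbers published by Tech Mahindra.

```python
# Figures quoted by Malhotra in the article; everything derived from them
# below is illustrative arithmetic, not an official Tech Mahindra estimate.
AVAILABLE_HINDI_TOKENS = 2.8e9   # "about 2.8 billion" tokens available in Hindi
REQUIRED_TOKENS = 100e9          # "at least around 100 billion" tokens needed
MODEL_PARAMETERS = 7e9           # the 7-billion-parameter target model


def token_gap(available: float, required: float) -> dict:
    """Return the shortfall and coverage implied by two token counts.

    `token_gap` is a hypothetical helper written for this example.
    """
    return {
        "shortfall": required - available,        # tokens still missing
        "coverage": available / required,         # fraction already available
        "multiple_needed": required / available,  # growth factor required
    }


gap = token_gap(AVAILABLE_HINDI_TOKENS, REQUIRED_TOKENS)
tokens_per_parameter = REQUIRED_TOKENS / MODEL_PARAMETERS

print(f"Shortfall: {gap['shortfall'] / 1e9:.1f} billion tokens")
print(f"Coverage: {gap['coverage']:.1%} of the stated requirement")
print(f"Corpus would need to grow about {gap['multiple_needed']:.0f}x")
print(f"Tokens per parameter implied by the quote: {tokens_per_parameter:.1f}")
```

On these numbers, the available Hindi text covers only a small fraction of the stated requirement, which helps explain why the team turned to collecting additional data.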

At the outset, the team set up a portal to crowdsource voice samples in local dialects. It drew 1,500 responses in the first two days, but participation then tapered off, and only 6,000 samples were received in total, Malhotra said.

To address this, teams were dispatched to regions like Uttar Pradesh, Madhya Pradesh, Haryana, and Jammu to collect data in person. Additionally, the Hyderabad campus of Tech Mahindra organized a camp where employees contributed samples in dialects like Hyderabadi Dakhini.

According to Tech Mahindra’s chief CP Gurnani, the model will be the biggest Indic LLM and could possibly cater to 25% of the world’s population. While Tech Mahindra has not revealed the cost associated with the project or when the model is expected to be launched, the aim is to build a 7-billion-parameter LLM to begin with, Malhotra told AIM in an exclusive interview.

The post Tech Mahindra to Launch OpenAI Rival ‘Project Indus’ Early Next Year appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
