Chennai-based AI4Bharat is collecting ten trillion tokens of language data, ranging from everyday conversations to technical documents, across India’s major languages. This data will power the next generation of artificial intelligence (AI) services, said Mitesh Khapra, cofounder of AI4Bharat.
Tokens are the basic building blocks that AI models use to process language. They are usually parts of words and sometimes whole words.
“We have 200 million spoken words… four states where it is already live or in an active stage. We have use cases supporting farmers,…