Jina AI Launches Open Source 8K Text Embedding, Rivalling OpenAI


Berlin-based Jina AI has unveiled its latest achievement, the second-generation text embedding model known as jina-embeddings-v2. This groundbreaking model boasts an impressive context length of 8,192 tokens, a milestone that places it in direct competition with OpenAI’s proprietary model, text-embedding-ada-002, on both the Massive Text Embedding Benchmark (MTEB) leaderboard and in terms of capabilities.

Check out the model on Hugging Face.

In a direct comparison with OpenAI’s 8K model text-embedding-ada-002, Jina AI’s jina-embeddings-v2 holds its own. Notably, jina-embeddings-v2 surpasses its OpenAI counterpart on Classification Average, Reranking Average, Retrieval Average, and Summarization Average.

jina-embeddings-v2 was meticulously crafted from the ground up through intensive research and development, data collection, and fine-tuning. The result is a model that represents a significant leap from its predecessor.

Beyond its technical achievement, jina-embeddings-v2’s 8K context length opens new doors for various industry applications, including legal document analysis, medical research, literary analysis, financial forecasting, and conversational AI. Benchmarking shows that this extended context allows jina-embeddings-v2 to outperform other leading base embedding models in several datasets, highlighting the practical advantages of longer context capabilities.
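To make the practical effect of a longer context window concrete, here is a minimal sketch that counts how many separate chunks a long document must be split into before it can be embedded. The 8,192-token limit comes from the announcement; the 512-token limit and the 8,000-token document are assumptions chosen only for illustration, and the helper `chunks_needed` is hypothetical rather than part of any Jina AI or OpenAI API.

```python
import math


def chunks_needed(num_tokens: int, context_length: int) -> int:
    """Return how many non-overlapping chunks are needed to cover num_tokens."""
    if num_tokens <= 0:
        return 0
    return math.ceil(num_tokens / context_length)


# Hypothetical document length, for illustration only.
doc_tokens = 8000

# 512 is an assumed short context window; 8192 is the limit cited in the article.
for limit in (512, 8192):
    print(f"context {limit}: {chunks_needed(doc_tokens, limit)} chunk(s)")
```

Under these assumptions, the shorter window forces the document to be split into many pieces that are embedded separately, while the 8,192-token window can take it in a single pass.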

Reflecting on this, Dr. Han Xiao, CEO of Jina AI, shared his thoughts: “In the ever-evolving world of AI, staying ahead and ensuring open access to breakthroughs is paramount. With jina-embeddings-v2, we’ve achieved a significant milestone. Not only have we developed the world’s first open-source 8K context length model, but we have also brought it to a performance level on par with industry giants like OpenAI. Our mission at Jina AI is clear: we aim to democratise AI and empower the community with tools that were once confined to proprietary ecosystems. Today, I am proud to say, we have taken a giant leap towards that vision.”

A forthcoming academic paper detailing the technical intricacies and benchmarks of jina-embeddings-v2 will provide the AI community with deeper insights. 

Jina AI is setting its sights on launching German-English models, further expanding its repertoire as it continues to advance and democratise artificial intelligence through open source and open science.

The post Jina AI Launches Open Source 8K Text Embedding, Rivalling OpenAI appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

