Berlin-based Jina AI has unveiled its second-generation text embedding model, jina-embeddings-v2. The model supports a context length of 8,192 tokens, which places it in direct competition with OpenAI’s proprietary model, text-embedding-ada-002, both on the Massive Text Embedding Benchmark (MTEB) leaderboard and in terms of capabilities.
Check out the model on Hugging Face.
Compared directly with OpenAI’s 8K model, text-embedding-ada-002, Jina AI’s jina-embeddings-v2 holds its own. Notably, jina-embeddings-v2 surpasses its OpenAI counterpart on Classification Average, Reranking Average, Retrieval Average, and Summarization Average.
jina-embeddings-v2 was meticulously crafted from the ground up through intensive research and development, data collection, and fine-tuning. The result is a model that represents a significant leap from its predecessor.
Beyond its technical achievement, jina-embeddings-v2’s 8K context length opens new doors for various industry applications, including legal document analysis, medical research, literary analysis, financial forecasting, and conversational AI. Benchmarking shows that this extended context allows jina-embeddings-v2 to outperform other leading base embedding models on several datasets, highlighting the practical advantages of longer context capabilities.
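For readers unfamiliar with how a text embedding model is used in applications such as document retrieval, the sketch below may help. It is not jina-embeddings-v2 and makes no claim about how that model works: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) used only so the example runs without downloading anything. What it illustrates is the general pattern: each text becomes a vector, and documents are ranked against a query by cosine similarity. In a real setup, the stand-in would be replaced by a call to an actual embedding model.

```python
import hashlib
import math

DIM = 64  # toy dimensionality, chosen arbitrarily for this illustration


def toy_embed(text):
    """Hypothetical stand-in embedder (NOT jina-embeddings-v2).

    Hashes each whitespace-separated word into one of DIM buckets,
    counts occurrences, and L2-normalises the resulting vector.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "the contract termination clause applies after ninety days",
        "quarterly revenue forecast for the retail segment",
    ]
    for score, doc in rank("contract termination clause", docs):
        print(f"{score:.3f}  {doc}")
```

A longer context window matters in this pattern because each document is embedded as a whole: a model limited to a few hundred tokens would require a long contract or research paper to be split into chunks before embedding.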
Reflecting on this, Dr. Han Xiao, CEO of Jina AI, shared his thoughts: “In the ever-evolving world of AI, staying ahead and ensuring open access to breakthroughs is paramount. With jina-embeddings-v2, we’ve achieved a significant milestone. Not only have we developed the world’s first open-source 8K context length model, but we have also brought it to a performance level on par with industry giants like OpenAI. Our mission at Jina AI is clear: we aim to democratise AI and empower the community with tools that were once confined to proprietary ecosystems. Today, I am proud to say, we have taken a giant leap towards that vision.”
A forthcoming academic paper detailing the technical intricacies and benchmarks of jina-embeddings-v2 will provide the AI community with deeper insights.
Jina AI is setting its sights on launching German-English models, further expanding its repertoire as it continues to advance and democratise artificial intelligence through open source and open science.
The post Jina AI Launches Open Source 8K Text Embedding, Rivalling OpenAI appeared first on Analytics India Magazine.