Meta has long been synonymous with open-source ecosystems. Recently, its research arm, FAIR, marked 10 years of contributions to the field of artificial intelligence and the pursuit of human-level intelligence.
“If you look back at Meta’s history, we’ve been a huge proponent of open-source,” says Ahmed Al-Dahle, Meta’s vice president for generative AI.
Currently, Meta has over 900 GitHub repositories.
This year, Meta’s commendable efforts towards open source have resulted in the introduction of several state-of-the-art models. Here is a list of the top must-know open-source AI models developed by Meta AI.
Llama
Meta has unexpectedly become the Robin Hood of the LLM community, lowering the barrier to entry for developers to experiment with large language models ever since Llama's launch in February 2023. Llama 2, launched in partnership with Microsoft in July 2023, changed the course of open-source language models once and for all. The company is reportedly planning to launch Llama 3 early next year, and the best part is that it is expected to remain open for both research and commercial use.
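Part of what makes the models easy to experiment with is their documented prompt format: the chat-tuned Llama 2 checkpoints expect the system prompt wrapped in `<<SYS>>` tags inside the first `[INST]` block. A minimal single-turn sketch of that template (the function name is illustrative; multi-turn conversations repeat the `[INST] … [/INST]` pattern):

```python
def format_llama2_chat(system_prompt: str, user_message: str) -> str:
    """Build a single-turn prompt in the Llama 2 chat template:
    the system prompt goes inside <<SYS>> tags within the first
    [INST] block, followed by the user message."""
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = format_llama2_chat(
    "You are a helpful assistant.",
    "Explain open-source licensing in one sentence.",
)
```

The leading `<s>` beginning-of-sequence token is omitted here, since tokenizers typically add it automatically.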
Learn more about Llama here.
Seamless Communication
Meta has introduced “Seamless,” a cross-lingual communication system anchored by two core models: SeamlessExpressive, which preserves expression in speech-to-speech translation, and SeamlessStreaming, a high-performing streaming translation model with an impressive latency of around two seconds. Both are built on the foundation of SeamlessM4T v2, Meta’s latest foundational model, which delivers notable performance gains in automatic speech recognition and a range of speech and text translation tasks. This release represents a substantial leap forward in real-time cross-lingual communication, underscoring Meta’s commitment to pushing the boundaries of language and expression.
Learn more about Seamless here.
AudioCraft
Meta has unveiled the AudioCraft family of generative audio models, offering both a user-friendly interface and the flexibility for users to explore their creative boundaries.
Comprising three distinct models, MusicGen, AudioGen, and EnCodec, AudioCraft stands out for its ability to produce high-quality audio with long-term consistency. Meta has made the pre-trained AudioGen model, the EnCodec decoder, and all AudioCraft model weights and code available for research purposes, giving researchers and practitioners the opportunity to train their own models on personalized datasets and advance the state of the art in generative audio technology.
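EnCodec, the neural codec underpinning AudioCraft, compresses audio into discrete tokens using residual vector quantization: each codebook quantizes the residual left over by the previous one. A toy NumPy sketch of that idea (the codebooks here are random and purely illustrative; real EnCodec learns its codebooks end-to-end over much larger spaces):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 codebooks of 16 entries each, for 8-dim "audio frames".
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]

def rvq_encode(frame, codebooks):
    """Residual vector quantization: each stage picks the codeword
    nearest to the current residual, yielding one token per codebook."""
    residual = frame.copy()
    tokens, reconstruction = [], np.zeros_like(frame)
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        tokens.append(idx)
        reconstruction += cb[idx]
        residual = residual - cb[idx]
    return tokens, reconstruction

frame = rng.normal(size=8)
tokens, recon = rvq_encode(frame, codebooks)
# With learned codebooks, more stages means a finer reconstruction;
# generative models like MusicGen then predict these token streams.
```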
Learn more about AudioCraft here.
DINOv2
Meta AI has unveiled DINOv2, a method for training high-performance computer vision models. Delivering robust performance without the need for fine-tuning, it is positioned as a versatile backbone for various computer vision tasks. The release of DINOv2 marks a new era in computer vision methodologies, promising to redefine the landscape of high-performance model training.
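"No fine-tuning needed" means tasks like classification can often be handled by something as simple as k-nearest neighbours over frozen embeddings. A minimal sketch of that evaluation recipe with placeholder vectors (in practice the embeddings would come from a frozen DINOv2 backbone; the helper name is illustrative):

```python
import numpy as np

def knn_classify(train_feats, train_labels, query_feats, k=3):
    """Label each query by majority vote among its k most
    cosine-similar training embeddings -- no training involved."""
    # L2-normalise so dot products become cosine similarities.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    query = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sims = query @ train.T                      # (n_query, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]  # top-k neighbour indices
    votes = train_labels[nearest]               # (n_query, k) labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Placeholder 4-dim "embeddings": two well-separated classes.
train_feats = np.array([[1, 0, 0, 0], [0.9, 0.1, 0, 0],
                        [0, 0, 1, 0], [0, 0.1, 0.9, 0]], dtype=float)
train_labels = np.array([0, 0, 1, 1])
queries = np.array([[0.95, 0.05, 0, 0], [0, 0, 0.95, 0.05]], dtype=float)
preds = knn_classify(train_feats, train_labels, queries, k=3)
# preds -> array([0, 1]): each query matches its nearest class.
```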
Learn more about DINOv2 here.
XLS-R
Meta has fostered global inclusivity in voice technology by releasing XLS-R, a major advancement in speech tasks that significantly outpaces previous multilingual models. The model sets a new benchmark, achieving state-of-the-art performance on renowned benchmarks: BABEL, CommonVoice, and VoxPopuli for speech recognition, CoVoST-2 for foreign-to-English translation, and VoxLingua107 for language identification. With this release, Meta not only addresses the language gap in speech technology but also sets the stage for future developments, promising a more inclusive voice technology landscape in the metaverse and beyond.
Learn more about XLS-R here.
Detectron2
Facebook AI Research (FAIR) has unveiled Detectron2, a next-generation library of state-of-the-art detection and segmentation algorithms. Positioned as the successor to Detectron, Detectron2 marks a significant leap forward in supporting computer vision research projects and production applications within Facebook, setting new standards for innovation and performance in the field.
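Detection models like those in Detectron2 are scored by intersection-over-union (IoU) between predicted and ground-truth boxes. A small standalone helper showing the metric (Detectron2 ships its own box utilities; this is just an illustration, with boxes as `(x1, y1, x2, y2)` tuples):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction overlapping half of a 10x10 ground-truth box:
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
# iou -> 1/3 (intersection 50, union 150)
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.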
Learn more about Detectron2 here.
DensePose
Meta has taken a significant stride towards advancing human understanding by introducing DensePose, a groundbreaking real-time approach that maps all human pixels in 2D RGB images to a comprehensive 3D surface-based model of the body. DensePose accelerates connections with augmented and virtual reality, operating at multiple frames per second on a single GPU and handling multiple humans simultaneously. Looking ahead, Meta has also introduced DensePose-COCO, a large-scale ground-truth dataset that further enhances the model’s accuracy and applicability, marking a pivotal advancement in human-centric image interpretation.
Learn more about DensePose here.
Wav2vec 2.0
Facebook AI Research has introduced wav2vec 2.0, the highly anticipated successor to wav2vec, accompanied by pretrained models and code. The model advances self-supervised learning by learning an inventory of basic speech units and training itself to predict the correct units for masked portions of the audio. This release marks a significant leap forward in the democratization of speech recognition technology, making it more accessible and effective across a broader linguistic spectrum.
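The masking scheme is easy to sketch: spans of consecutive latent audio frames are hidden, and the model must identify the correct quantized unit for each hidden frame. A toy NumPy illustration of the span-masking step (the span length and start probability echo the published setup, but the exact values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spans(num_frames, span_len=10, start_prob=0.065, rng=rng):
    """Sample span starts at random, then mask span_len consecutive
    frames from each start -- the portions the model must predict."""
    mask = np.zeros(num_frames, dtype=bool)
    starts = rng.random(num_frames) < start_prob
    for s in np.flatnonzero(starts):
        mask[s:s + span_len] = True
    return mask

frames = rng.normal(size=(100, 4))  # stand-in for latent speech frames
mask = mask_spans(len(frames))
masked = frames.copy()
masked[mask] = 0.0  # masked positions replaced (the real model
                    # substitutes a learned mask embedding instead)
```

Training then pushes the model's output at each masked position towards the quantized unit of the original frame, contrasted against distractor units.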
Learn more about Wav2vec 2.0 here.
VizSeq
Embarking on a new era of efficiency in text generation tasks, Meta has introduced VizSeq, a Python toolkit for visual analysis. Offering a user-friendly interface and harnessing the latest advances in Natural Language Processing (NLP), VizSeq boosts user productivity by providing visualization capabilities in Jupyter Notebook and a web app. This release signifies a pivotal step toward more effective and comprehensive visual analysis of text generation tasks.
Learn more about VizSeq here.