Mamba is Here to Mark the End of Transformers

Researchers Albert Gu and Tri Dao from Carnegie Mellon and Together AI have introduced a groundbreaking model named Mamba, challenging the prevailing dominance of Transformer-based architectures in deep learning.

Their research presents Mamba as a state-space model (SSM) that demonstrates superior performance across various modalities, including language, audio, and genomics. In language modelling, for example, the Mamba-3B model outperforms Transformer-based models of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

The code is available in the project's GitHub repository.

Mamba is presented as a state-of-the-art model with linear-time scaling, ultra-long context, and remarkable efficiency, outperforming Transformers in tasks it has been tested on.

The model builds on structured state space models (SSMs) and shows promising results on information-dense data. It does particularly well in language modelling, where previous subquadratic models have fallen short of Transformers.

The researchers emphasise the efficiency of Mamba's selective SSM layer, which is designed to address the computational inefficiency of Transformers on long sequences, a major limitation of that architecture. They report that Mamba's performance keeps improving on real data up to sequence lengths of one million.
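To put the long-sequence argument in rough numbers, here is a back-of-the-envelope sketch. It only counts how many token pairs full self-attention has to score versus how many state updates a recurrent scan performs, and it ignores constant factors such as model width and state size. The function names are illustrative and do not come from the Mamba codebase.

```python
def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention: L * L."""
    return seq_len * seq_len


def recurrent_step_count(seq_len: int) -> int:
    """State updates performed by a recurrent scan: one per token."""
    return seq_len


for seq_len in (1_000, 100_000, 1_000_000):
    pairs = attention_pair_count(seq_len)
    steps = recurrent_step_count(seq_len)
    # The ratio grows with L, which is why quadratic cost hurts at long lengths.
    print(f"L={seq_len:,}: attention pairs={pairs:,}, "
          f"scan steps={steps:,}, ratio={pairs // steps:,}x")
```

The gap between the two counts is the sequence length itself, so it widens as sequences get longer.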

I’m always excited by new attempts to dethrone transformers. We need more of these. Kudos to @tri_dao & @_albertgu for pushing on alternative sequence architectures for many years now. https://t.co/cf67Xa2PBS

— Jim Fan (@DrJimFan) December 4, 2023

Installing Mamba involves two packages: causal-conv1d, an efficient implementation of the causal Conv1d layer used inside the Mamba block, and mamba-ssm, the core Mamba package. Other requirements are Linux, an NVIDIA GPU, PyTorch 1.12+, and CUDA 11.6+.

The repository also shows how the Mamba block fits into an end-to-end neural network: a complete language model built from a backbone of repeated Mamba blocks plus a language model head, configurable in model dimension and number of layers.

The authors provide pretrained models of several sizes, differing in parameter count and number of layers. Zero-shot evaluations are run with the lm-evaluation-harness library, and the same setup can be used to compare against other models such as EleutherAI's pythia-160m.

The research identifies a key weakness in existing subquadratic-time architectures: their inability to perform content-based reasoning. Mamba addresses this by making the SSM parameters functions of the input, so the model can selectively propagate or forget information along the sequence length dimension depending on the current token. The authors report significant improvements over earlier models.
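The idea of selective propagation or forgetting can be illustrated with a toy, one-dimensional recurrence. This is a minimal sketch under simplifying assumptions, not the paper's implementation: the state is a single number, and the parameters `w_delta`, `b_delta` and `a` are made up for illustration. The step size `delta` is computed from the current input, so a small step leaves the state almost untouched while a large step overwrites it.

```python
import math


def softplus(z: float) -> float:
    """Smooth positive function, used here to keep the step size above zero."""
    return math.log1p(math.exp(z))


def selective_scan(xs, w_delta, b_delta, a=-1.0):
    """Toy selective recurrence (hypothetical parameters, scalar state).

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * x_t,
    where delta_t depends on the current input x_t.
    """
    h = 0.0
    states = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        decay = math.exp(delta * a)  # close to 1 keeps the state, close to 0 forgets it
        h = decay * h + delta * x
        states.append(h)
    return states


# One salient input followed by three weak ones.
states = selective_scan([1.0, 0.1, 0.1, 0.1], w_delta=10.0, b_delta=-5.0)
print(states)
```

With these illustrative parameters, the strong first input sets the state and the weak inputs that follow barely change it. A recurrence with a fixed step size could not make that distinction, because it would treat every token the same way.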

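Because the selection mechanism makes the SSM parameters depend on the input, Mamba cannot use the efficient convolutions that earlier SSMs relied on. Instead, it employs a hardware-aware parallel algorithm in recurrent mode, resulting in fast inference (the paper reports 5× higher throughput than Transformers) and linear scaling in sequence length.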
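It may not be obvious how a recurrence, which looks inherently sequential, can be computed in parallel. The sketch below shows the general trick for a linear recurrence of the form h_t = a_t * h_{t-1} + b_t: each step is a small function that can be composed with its neighbours in any grouping. It is pure Python for illustration only, and the helper names are assumptions; the actual Mamba kernel is a fused GPU implementation and is not reproduced here.

```python
def combine(left, right):
    """Compose two steps of the form h -> a*h + b, applying left first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(steps):
    """Reference loop: h_t = a_t * h_{t-1} + b_t, starting from h = 0."""
    h, out = 0.0, []
    for a, b in steps:
        h = a * h + b
        out.append(h)
    return out


def tree_scan(steps):
    """Inclusive scan by divide and conquer.

    The two halves do not depend on each other, so parallel hardware
    could process them at the same time before merging.
    """
    if len(steps) <= 1:
        return list(steps)
    mid = len(steps) // 2
    left = tree_scan(steps[:mid])
    right = tree_scan(steps[mid:])
    carry = left[-1]  # summary of the whole left half
    return left + [combine(carry, r) for r in right]


steps = [(0.5, 1.0), (0.9, 2.0), (0.1, -1.0), (0.7, 0.5), (1.0, 3.0)]
# With h_0 = 0, the state at each position is the b-part of the composed step.
parallel_states = [b for _, b in tree_scan(steps)]
print(parallel_states)
print(sequential_scan(steps))
```

Both approaches produce the same states up to floating-point rounding. The tree version only reorders the work, which is what makes a parallel implementation possible.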

Mamba emerges as a compelling contender challenging the Transformer paradigm, demonstrating superior performance in diverse modalities and promising advancements in the field of deep learning.

The post Mamba is Here to Mark the End of Transformers appeared first on Analytics India Magazine.
