SambaNova Launches Fastest AI Platform with Llama 3.1 405B at 132 Tokens per Second

SambaNova Systems has introduced SambaNova Cloud, which the company bills as the fastest AI inference platform available today, powered by its SN40L AI chip. The platform gives developers immediate access to Meta’s Llama 3.1 models, including the 405B model, served at full 16-bit precision at 132 tokens per second (t/s).

The Llama 3.1 70B model runs at 461 t/s. The service is now open to developers without a waiting list.
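
To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python that converts the published decode rates into response times; it considers generation throughput only and ignores prompt processing, queuing, and network latency:

    # Rough estimate: time to generate a response at the published decode rates.
    rates_tps = {"Llama 3.1 405B": 132, "Llama 3.1 70B": 461}
    response_tokens = 1000  # illustrative response length

    for model, tps in rates_tps.items():
        print(f"{model}: ~{response_tokens / tps:.1f} s for {response_tokens} tokens")
    # Llama 3.1 405B: ~7.6 s for 1000 tokens
    # Llama 3.1 70B: ~2.2 s for 1000 tokens

At these rates a 1,000-token answer from the 405B model arrives in under eight seconds, the kind of latency the agentic-workflow argument below rests on.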

Cerebras Inference recently announced that it delivers 1,800 tokens per second for the Llama 3.1 8B model and 450 tokens per second for the Llama 3.1 70B model, which the company claims is 20 times faster than Nvidia GPU-based hyperscale clouds. Groq, meanwhile, reports over 500 tokens per second on the Llama 3.1 70B model.

SambaNova Cloud supports both the Llama 3.1 70B model, suited to agentic AI applications, and the 405B model, the largest open-weights AI model available.

According to SambaNova CEO Rodrigo Liang, this versatility lets developers run high-speed, lower-cost models as well as the highest-fidelity model at full precision. “Enterprise customers want versatility – 70B at lightning speeds for agentic AI systems, and the highest fidelity 405B model for when they need the best results. SambaNova Cloud is the only platform that offers both today,” he said.

Artificial Analysis independently benchmarked SambaNova Cloud, confirming it as the fastest available platform for Llama 3.1 models. In those measurements the service outpaced output speeds from providers such as OpenAI, Anthropic, and Google, making it suitable for real-time AI applications and agentic workflows.

Meta’s Llama 3.1 models are recognised as the most popular open-source models available today, and the 405B variant is the largest open-weights model. Running models of this size has traditionally been costly and complex, but SambaNova says its SN40L chips ease both constraints, delivering higher speeds at lower cost than Nvidia H100s.

Industry experts have responded positively to SambaNova Cloud’s speed and efficiency. Dr. Andrew Ng, Founder of DeepLearning.AI, emphasized the importance of token generation speed in agentic AI workflows, highlighting the platform’s unique ability to deliver fast results using large models. Bigtincan and Blackbox AI are among the first to partner with SambaNova to enhance their own AI-driven products.

SambaNova Cloud is now available in three tiers: Free, Developer, and Enterprise. The Free tier gives developers API access at no cost starting today, while the Developer and Enterprise tiers will add higher rate limits and scaling capabilities for production workloads.
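
For developers trying the Free tier, the following minimal Python sketch shows what a request might look like, assuming SambaNova Cloud exposes an OpenAI-compatible endpoint; the base URL, model identifier, and key placeholder are illustrative assumptions, not details from the announcement:

    # Hypothetical Free-tier request against an assumed OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_SAMBANOVA_API_KEY",        # placeholder for a Free-tier key
        base_url="https://api.sambanova.ai/v1",  # assumed endpoint URL
    )

    response = client.chat.completions.create(
        model="Meta-Llama-3.1-405B-Instruct",    # assumed model identifier
        messages=[{"role": "user", "content": "Explain dataflow architectures in one sentence."}],
    )
    print(response.choices[0].message.content)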

The SN40L AI chip, with its patented dataflow design and three-tier memory architecture, underpins SambaNova Cloud’s performance and makes the service a strong platform for developers building next-generation AI applications.


