SambaNova runs DeepSeek-R1 at 198 tokens/sec using 16 custom chips
The SN40L RDU chip is reportedly 3x faster and 5x more efficient than GPUs
A 5x speed boost is promised soon, with 100x cloud capacity by the end of the year
Chinese AI upstart DeepSeek quickly made a name for itself in 2025 with R1, its open-source large language model built for advanced reasoning tasks, which performs on par with the industry's top models while being more cost-effective.
SambaNova Systems, an AI startup founded in 2017 by experts from Sun/Oracle and Stanford University, has now announced what it claims is the world's fastest implementation of the DeepSeek-R1 671B LLM to date.
The company says it has achieved 198 tokens per second per user using only 16 custom-built chips, replacing the 40 racks of 320 Nvidia GPUs that would typically be required.
Independently verified
“Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek,” said Rodrigo Liang, CEO and co-founder of SambaNova. “This will rise to 5x faster than the latest GPU speed on a single rack, and by the end of the year we will offer 100x capacity for DeepSeek-R1.”
While Nvidia’s GPUs have traditionally powered large AI workloads, SambaNova claims that its reconfigurable dataflow architecture offers a more efficient solution. The company says its hardware delivers three times the speed and five times the efficiency of leading GPUs while maintaining the full reasoning strength of DeepSeek-R1.
“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” Liang said. “That changes today. We’re bringing the next big breakthrough – collapsing inference costs and reducing hardware requirements from 40 racks to just one – to offer DeepSeek-R1 at the fastest speeds, efficiently.”
George Cameron, co-founder of AI evaluation firm Artificial Analysis, said his company had “independently benchmarked SambaNova’s cloud deployment of the full 671-billion-parameter DeepSeek-R1 mixture-of-experts model at over 195 output tokens/s, the fastest output speed we have ever measured for DeepSeek-R1. High output speeds are especially important for reasoning models, as these models use reasoning output tokens to improve the quality of their responses. SambaNova’s high output speeds support the use of reasoning models in latency-sensitive use cases.”
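To see why output speed matters so much for reasoning models, consider the wall-clock arithmetic: a reasoning model streams its hidden "thinking" tokens before the final answer, so response latency scales with total tokens divided by decode rate. A minimal sketch, where the token budgets and the slower comparison rate are illustrative assumptions (not figures from the article); only the 198 tokens/s rate comes from SambaNova's claim:

```python
# Back-of-the-envelope: latency of a reasoning model at different decode rates.
# Token counts below are hypothetical; 198 tok/s is SambaNova's reported rate.

def generation_time(total_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream total_tokens at a given decode rate."""
    return total_tokens / tokens_per_sec

reasoning_tokens = 4000   # assumed chain-of-thought budget
answer_tokens = 500       # assumed final answer length
total = reasoning_tokens + answer_tokens

for rate in (25, 198):    # assumed slower baseline vs. reported SambaNova rate
    print(f"{rate:>3} tok/s -> {generation_time(total, rate):.1f} s")
```

At the assumed 4,500-token response, the difference is roughly 180 seconds versus about 23 seconds, which is why high decode rates make reasoning models viable for interactive, latency-sensitive use.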
DeepSeek-R1 671B is now available on SambaNova Cloud, with API access offered to select users. The company is scaling capacity quickly and says it hopes to reach 20,000 tokens per second of total rack throughput “in the near future.”
(Image Credit: Artificial Analysis)