- GPUs handle prefill operations by converting prompts into key-value caches
- SambaNova RDUs generate tokens with high throughput and low latency
- Intel Xeon 6 processors manage workload distribution and execute compiled code
Intel and SambaNova Systems have announced a joint hardware architecture that combines GPUs, SambaNova RDUs, and Intel Xeon 6 processors for large-scale inference workloads.
The system allocates GPUs for prefill operations, RDUs for decoding, and Xeon CPUs for execution and orchestration tasks across agent-driven environments.
“Agentic AI is moving into production—and the winning pattern we’re seeing is GPUs to start the job, Intel Xeon 6 to run it, and SambaNova RDUs to finish it quickly,” said Rodrigo Liang, CEO and co-founder of SambaNova Systems.
The CPU is the execution and control layer
This design is scheduled to be available in the second half of 2026 for enterprises, cloud providers, and sovereign deployments.
The architecture places Intel Xeon 6 processors at the center of system control, where they manage workload distribution, execute code, and coordinate tool interactions.
It includes handling compilation, validating output, and maintaining communication between concurrent processes.
“When thousands of concurrent coding agents generate tool calls, fetch requests, code builds and encrypted messages between agents, the CPU is not a background component – it is the execution and action layer of the system,” said Harry Ault, CRO at SambaNova.
The statement positions the CPU as the primary layer responsible for system behavior, rather than a supporting component.
According to SambaNova, Xeon 6 delivers more than 50% faster LLVM compile times compared to Arm-based server CPUs.
It also delivers up to 70% faster vector database performance compared to other x86-based systems.
These figures relate to execution speed within coding and retrieval workflows. In this configuration, GPUs handle the prefill stage by converting prompts into key-value caches.
SambaNova RDUs act as the decoding layer and generate high-throughput, low-latency tokens.
Xeon 6 processors act as both host CPUs and execution engines, managing system-level operations and running compiled workloads.
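The division of labor described above can be sketched in a few lines of code. This is a purely illustrative toy, assuming nothing about SambaNova's or Intel's actual software stack: `prefill` stands in for the GPU stage that turns a prompt into a key-value cache, `decode` for the RDU stage that generates tokens against that cache, and `orchestrate` for the CPU layer that routes the request between them. All function names and data shapes here are hypothetical.

```python
def prefill(prompt: str) -> dict:
    """Stand-in for the GPU prefill stage: convert the prompt into a
    key-value cache. Here the 'cache' is just per-token placeholders;
    a real system would store attention keys/values per layer."""
    tokens = prompt.split()
    return {"keys": list(tokens), "values": list(tokens)}

def decode(kv_cache: dict, max_new_tokens: int) -> list[str]:
    """Stand-in for the RDU decode stage: generate one token at a time,
    appending each new token's key/value entries to the cache."""
    generated = []
    for i in range(max_new_tokens):
        token = f"tok{i}"  # a real model would sample from logits here
        kv_cache["keys"].append(token)
        kv_cache["values"].append(token)
        generated.append(token)
    return generated

def orchestrate(prompt: str, max_new_tokens: int = 4) -> list[str]:
    """Stand-in for the CPU layer: coordinate both stages for one request."""
    cache = prefill(prompt)               # prefill stage (GPU in this design)
    return decode(cache, max_new_tokens)  # decode stage (RDU in this design)

print(orchestrate("explain kv caching", 3))
```

The point of splitting the stages this way is that prefill is compute-bound (one large batched pass over the prompt) while decode is memory-bandwidth-bound (many small sequential steps), so each maps naturally to different hardware.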
“Production inference is moving toward heterogeneous hardware—no single chip type is optimal for every step of an agentic workflow,” said Banghua Zhu, co-founder and CTO of RadixArk.
He added that the combination of RDUs with Xeon CPUs allows the systems to maintain compatibility with existing software environments.
The system is designed to run inside existing air-cooled data centers without requiring new construction.
According to the companies, this allows scaling of inference workloads without additional strain on water and energy resources.
As Nvidia and Groq continue to focus on improving inference throughput and latency, this announcement adds a layer of competition.
It offers an alternative approach that distributes workloads across multiple hardware layers rather than relying on a single processing model.