- AMD highlighted the MI350 Series at Hot Chips 2025 with node-to-rack scalability
- The MI355X DLC rack packs 128 GPUs, 36 TB of HBM3E, and 2.6 exaflops
- Nvidia's Vera Rubin system, coming next year, is a maximum-scale beast
AMD used the recent Hot Chips 2025 event to talk more about the CDNA 4 architecture that drives its new Instinct MI350 series, and to show how its accelerators scale from the node to the rack.
MI350 series platforms combine 5th Gen EPYC CPUs, MI350 GPUs, and AMD Pollara NICs in an OCP-standard design with UEC-supported networking. Bandwidth is delivered through Infinity Fabric at up to 1,075 GB/s.
At the top end sits the MI355X DLC 'ORV3' rack, a 2OU system with 128 GPUs, 36 TB of HBM3E memory, and peak throughput of 2.6 exaflops at FP4 precision (there is also a 96-GPU EIA version with 27 TB of HBM3E).
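Those rack-level figures imply tidy per-GPU numbers. As a quick back-of-the-envelope check (assuming the totals divide evenly across the 128 GPUs, which is our assumption rather than an AMD-published breakdown):

```python
# Back-of-the-envelope per-GPU figures implied by AMD's rack-level specs.
# Assumes the rack totals divide evenly across all 128 GPUs (our
# assumption for illustration, not an AMD-published breakdown).
rack_fp4_exaflops = 2.6
rack_hbm3e_tb = 36
gpu_count = 128

fp4_per_gpu_pflops = rack_fp4_exaflops * 1_000 / gpu_count  # ~20.3 PFLOPS FP4
hbm_per_gpu_gb = rack_hbm3e_tb * 1_024 / gpu_count          # 288 GB HBM3E

print(f"~{fp4_per_gpu_pflops:.1f} PFLOPS FP4 and {hbm_per_gpu_gb:.0f} GB HBM3E per GPU")
```

The 288 GB result lines up with the HBM3E capacity AMD quotes for a single MI355X, which suggests the rack figures are straightforward multiples of the per-GPU specs.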
At the node level, AMD presented flexible designs for both air-cooled and liquid-cooled systems.
An MI350X platform with eight GPUs achieves 73.8 petaflops at FP8, while the liquid-cooled MI355X platform reaches 80.5 petaflops FP8 in a denser form factor.
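Divided evenly across the eight GPUs (again an illustrative assumption), those node figures work out to roughly 9.2 and 10.1 petaflops of FP8 per GPU:

```python
# Rough per-GPU FP8 throughput implied by AMD's 8-GPU node figures,
# assuming an even split across GPUs (illustrative assumption).
nodes = {"MI350X (air-cooled)": 73.8, "MI355X (liquid-cooled)": 80.5}

for name, node_pflops in nodes.items():
    print(f"{name}: ~{node_pflops / 8:.1f} PFLOPS FP8 per GPU")
```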
AMD also confirmed its roadmap. The chip giant debuted the MI325X in 2024, the MI350 family arrived earlier in 2025, and the Instinct MI400 is set to arrive in 2026.
The MI400 will offer up to 40 petaflops of FP4, 20 petaflops of FP8, 432 GB of HBM4 memory, 19.6 TB/s of bandwidth, and 300 GB/s of scale-out connectivity per GPU.
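Set against the roughly 20 petaflops of FP4 and 288 GB of HBM3E implied per MI355X above, those MI400 figures work out to about a 2x compute jump and 1.5x more memory per GPU (derived ratios, not AMD-published comparisons):

```python
# Gen-over-gen ratios per GPU, using the MI355X figures derived above
# (derived for illustration; not AMD-published comparisons).
mi355x = {"fp4_pflops": 20.3, "hbm_gb": 288}
mi400 = {"fp4_pflops": 40.0, "hbm_gb": 432}

for metric in mi355x:
    print(f"{metric}: {mi400[metric] / mi355x[metric]:.2f}x")
```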
AMD says the performance trajectory from MI300 to MI400 shows accelerating gains rather than incremental steps.
The elephant in the room is, of course, Nvidia, which plans its Rubin architecture for 2026–27. The Vera Rubin NVL144 system, penciled in for the second half of next year, will be rated (according to slides shared by Nvidia) at 3.6 exaflops of FP4 inference and 1.2 exaflops of FP8 training. It has 13 TB/s of HBM4 bandwidth and 75 TB of fast memory, a 1.6x gain over its predecessor.
Nvidia integrates 88 custom Arm CPU cores with 176 threads, connected by 1.8 TB/s of NVLink-C2C, along with 260 TB/s of NVLink 6 and 28.8 TB/s of CX9 interconnect.
The Nvidia Rubin Ultra NVL576 system is planned for the end of 2027. It will be rated at 15 exaflops of FP4 inference and 5 exaflops of FP8 training, with 4.6 PB/s of HBM4E bandwidth, 365 TB of fast memory, and interconnect speeds of 1.5 PB/s with NVLink 7 and 115 TB/s using CX9.
At full scale, the Rubin Ultra system will include 576 GPUs, 2,304 HBM4E stacks totaling 150 TB of memory, and 1,300 trillion transistors, supported by 12,672 Vera CPU cores, 576 ConnectX-9 NICs, 72 BlueField DPUs, and 144 NVLink switches rated at a nominal 1,500 PB/s.
It is a densely integrated, monolithic beast aimed at maximum scale.
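Taking Nvidia's slide figures at face value, the step from NVL144 to NVL576 is itself a roughly 4x jump on most axes:

```python
# Simple ratios between the two Rubin systems, using the figures from
# Nvidia's slides as quoted above (no other sources assumed).
nvl144 = {"fp4_exaflops": 3.6, "fp8_exaflops": 1.2, "fast_memory_tb": 75}
nvl576 = {"fp4_exaflops": 15.0, "fp8_exaflops": 5.0, "fast_memory_tb": 365}

for metric in nvl144:
    print(f"{metric}: {nvl576[metric] / nvl144[metric]:.1f}x the NVL144")
```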
While it is fun to compare the AMD and Nvidia numbers, the comparison is obviously not entirely fair: AMD's MI355X DLC rack is a product detailed in 2025, while Nvidia's Rubin systems remain on the timetable for 2026 and 2027. Still, it is an interesting glimpse of how each company frames the next wave of AI infrastructure.