Nvidia’s new Rubin CPX GPU delivers 30 petaflops of compute and 128 GB of memory for inference

Nvidia announces Rubin CPX GPU with 128 GB of memory, built for enterprise AI workloads
Vera Rubin NVL144 CPX rack delivers 8 exaflops of compute and 100 TB of fast memory
Shipments are scheduled for the end of 2026, with Rubin Ultra and Feynman already on the roadmap

Nvidia has announced a brand new GPU built on the Rubin architecture and designed for long-context AI workloads.

Rubin CPX, as it is known, comes with 128 GB of GDDR7 memory, making it the company’s first GPU with that much GDDR7.

There were rumors of a 128 GB RTX gaming card, but this is definitely not it. This GPU is a compute engine aimed at inference in areas such as software development, research and high-definition video. It won’t be running Metal Gear Solid Delta: Snake Eater anytime soon.

Vera Rubin NVL144 CPX Rack

The GPU delivers up to 30 petaflops of NVFP4 compute and integrates hardware attention acceleration that Nvidia says is three times faster than the GB300 NVL72.

It also contains four NVENC and four NVDEC units to speed up video work.

As part of Nvidia’s wider push toward disaggregated inference, Rubin CPX is designed to handle the compute-heavy context phase, while other Rubin GPUs and Vera CPUs handle generation tasks.

By focusing Rubin CPX on context processing, Nvidia aims to improve throughput while lowering the cost of high-value inference.

Nvidia’s Dynamo software will manage things behind the scenes, providing low-latency KV cache transfers and routing across components.
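
The split between a context phase and a generation phase can be pictured with a toy Python sketch. Everything in it is hypothetical and for illustration only: the names `ContextCache`, `context_phase`, `generation_phase` and `route` are made up, no model is run, and none of this is Nvidia's or Dynamo's actual API.

```python
from dataclasses import dataclass


@dataclass
class ContextCache:
    """Hypothetical stand-in for the state produced by the context phase."""
    request_id: int
    prompt_tokens: list


def context_phase(request_id, prompt):
    # Compute-heavy step: the whole prompt is processed in one pass.
    # (Toy behaviour: whitespace tokenization instead of a real model.)
    return ContextCache(request_id, prompt.split())


def generation_phase(cache, max_new_tokens):
    # Token-by-token step that reuses the cache handed over from the context phase.
    # (Toy behaviour: echo the last prompt tokens instead of sampling new ones.)
    return cache.prompt_tokens[-max_new_tokens:]


def route(request_id, prompt, max_new_tokens=3):
    # The router sends the prompt to the context pool first,
    # then hands the resulting cache to the generation pool.
    cache = context_phase(request_id, prompt)
    return generation_phase(cache, max_new_tokens)


print(route(1, "summarize this very long repository history"))
```

The point of the sketch is only the hand-off: the two phases are separate functions that could run on different hardware pools, with the cache as the one object passed between them.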

The company’s largest deployment configuration is the Vera Rubin NVL144 CPX rack. Each rack integrates 144 Rubin CPX GPUs, 144 Rubin GPUs and 36 Vera CPUs.

Together they provide 8 exaflops of NVFP4 compute, 100 TB of high-speed memory and 1.7 PB/s of memory bandwidth.
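
The rack totals can be sanity-checked against the per-GPU figures quoted earlier. The Python sketch below multiplies out the Rubin CPX numbers only. How the rest of the compute and memory is divided among the Rubin GPUs and Vera CPUs is not stated here, so the remainders are plain subtraction, and decimal units (1 exaflop = 1,000 petaflops, 1 TB = 1,000 GB) are an assumption.

```python
# Figures quoted in the article (peak, "up to" values).
CPX_GPUS_PER_RACK = 144
CPX_PETAFLOPS_EACH = 30      # NVFP4 compute per Rubin CPX
CPX_MEMORY_GB_EACH = 128     # GDDR7 per Rubin CPX
RACK_EXAFLOPS = 8            # NVFP4 compute per rack (rounded headline figure)
RACK_MEMORY_TB = 100         # high-speed memory per rack (rounded headline figure)

# Assumption: decimal units, so divide by 1000 to convert PF -> EF and GB -> TB.
cpx_exaflops = CPX_GPUS_PER_RACK * CPX_PETAFLOPS_EACH / 1000
cpx_memory_tb = CPX_GPUS_PER_RACK * CPX_MEMORY_GB_EACH / 1000

# Whatever is left over comes from the other components in the rack.
other_exaflops = RACK_EXAFLOPS - cpx_exaflops
other_memory_tb = RACK_MEMORY_TB - cpx_memory_tb

print(f"CPX GPUs:     {cpx_exaflops:.2f} EF, {cpx_memory_tb:.2f} TB")
print(f"Rest of rack: {other_exaflops:.2f} EF, {other_memory_tb:.2f} TB")
```

On these figures the 144 Rubin CPX GPUs account for about 4.3 of the 8 exaflops and roughly 18 TB of the 100 TB. Since the headline totals are rounded, the remainders are approximate.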

Connectivity comes from Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-9 SuperNICs.

Shipments of Rubin CPX and NVL144 CPX racks are currently penciled in for the end of 2026, following the recent tape-out at TSMC.

Nvidia’s roadmap includes Rubin Ultra, now expected in 2027, and Feynman, scheduled for 2028.

These designs will extend the Rubin architecture with higher-density modules, HBM4E memory and faster networking.

Via Videocardz

(Image Credit: Nvidia)
