Tech startup proposes new way to tackle massive LLMs using fastest memory available to mankind


  • GPU-like PCIe cards offer 9.6 PFLOPS of FP4 compute and 2GB of SRAM
  • SRAM is usually used in small amounts as cache in processors (L1 to L3)
  • It also uses LPDDR5 rather than far more expensive HBM memory

Microsoft-backed Silicon Valley startup D-Matrix has developed a chiplet-based solution designed for fast, small-batch inference of LLMs in enterprise environments. Its architecture takes an all-digital compute-in-memory approach, using modified SRAM cells for speed and energy efficiency.

Corsair, D-Matrix’s current product, is described as a “first-of-its-kind AI computing platform” and features two D-Matrix ASICs on a full-height PCIe card, with four chiplets per ASIC. It achieves a total of 9.6 PFLOPS of FP4 compute with 2GB of SRAM-based performance memory. Unlike traditional designs that rely on expensive HBM, Corsair uses LPDDR5 capacity memory, with up to 256GB per card for handling larger models or batch inference workloads.
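
To put those capacity figures in perspective, here is a rough back-of-envelope sketch (in Python, using the spec numbers above and assuming 4 bits per weight with no allowance for activations, KV cache, or framework overhead, so real-world capacity will be lower) of how many FP4 parameters each memory pool could hold.

```python
# Back-of-envelope sketch: how many FP4 (4-bit) weights fit in each memory pool?
# Assumptions: 4 bits per parameter; ignores activations, KV cache,
# and framework bookkeeping, so actual usable capacity is lower.

BITS_PER_FP4_PARAM = 4

def params_that_fit(capacity_bytes: int, bits_per_param: int = BITS_PER_FP4_PARAM) -> float:
    """Approximate number of parameters a memory pool of the given size can hold."""
    return capacity_bytes * 8 / bits_per_param

SRAM_BYTES = 2 * 1024**3        # 2GB of on-chip SRAM ("performance memory")
LPDDR5_BYTES = 256 * 1024**3    # up to 256GB of LPDDR5 ("capacity memory")

print(f"SRAM:   ~{params_that_fit(SRAM_BYTES) / 1e9:.1f}B parameters")
print(f"LPDDR5: ~{params_that_fit(LPDDR5_BYTES) / 1e9:.0f}B parameters")
```

Under those assumptions the SRAM alone holds only a few billion FP4 weights, which is why the much larger LPDDR5 pool matters for bigger models and batched inference.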
