Tech startup proposes new way to tackle massive LLMs using fastest memory available to mankind
GPU-like PCIe cards offer 10 PFLOPS of FP4 compute and 2 GB of SRAM
SRAM is usually used only in small amounts, as cache in processors (L1 to L3)
The cards also use LPDDR5 rather than far more expensive HBM memory

Microsoft-backed Silicon Valley startup D-Matrix has developed a chiplet-based solution designed for fast, small-batch inference of LLMs […]









