- UK-based Fractile is backed by NATO and wants to make AI computation faster and cheaper by doing it in memory
- Nvidia's brute-force GPU approach consumes too much power and is held back by memory
- Fractile's figures are based on comparisons with clusters of H100 GPUs, not the newer H200
Nvidia sits comfortably at the top of the AI hardware food chain, dominating the market with its high-performance GPUs and CUDA software stack, which have quickly become the standard tools for training and running large AI models. But this dominance comes at a cost, namely a growing target on its back.
Hyperscalers like Amazon, Google, Microsoft and Meta are pouring resources into developing their own custom silicon in an attempt to reduce their dependence on Nvidia's chips and cut costs. At the same time, a wave of AI hardware startups is trying to exploit the rising demand for specialized accelerators, hoping to offer more efficient or affordable alternatives and ultimately displace Nvidia.
You may not have heard of UK-based Fractile yet, but the startup, which claims its revolutionary approach to computing can run the world's largest language models 100x faster and at 1/10th the cost of existing systems, has some rather remarkable backers, including NATO and former Intel CEO Pat Gelsinger.
Removing every bottleneck
“We are building the hardware that removes every bottleneck to the fastest possible inference of the largest transformer networks,” says Fractile.
“This means the biggest LLMs in the world running faster than you can read, and a universe of brand-new capabilities and possibilities for how we work that will be unlocked by near-instant inference of models with superhuman intelligence.”
It is worth pointing out, before you get too excited, that Fractile's performance numbers are based on comparisons with clusters of NVIDIA H100 GPUs running Llama 2 70B with 8-bit quantization and TensorRT-LLM, not the newer H200 chips.
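To see why those comparisons hinge on memory rather than raw compute, here is a back-of-envelope sketch (my own illustration using approximate public H100 specs, not Fractile's data) of the throughput ceilings for single-stream inference of an 8-bit Llama 2 70B:

```python
# Back-of-envelope ceilings for single-stream (batch-1) LLM inference on
# the article's baseline. All figures are approximate public specs and
# illustrative assumptions, not Fractile measurements.

PARAMS = 70e9                 # Llama 2 70B parameter count
BYTES_PER_PARAM = 1           # 8-bit quantization: ~1 byte per weight
HBM_BANDWIDTH = 3.35e12       # H100 SXM HBM3 bandwidth, ~3.35 TB/s
INT8_OPS_PER_S = 1.979e15     # H100 dense INT8 tensor throughput, ~1979 TOPS

# Each generated token must stream every weight from HBM once
# (ignoring KV-cache traffic and multi-GPU sharding).
bandwidth_ceiling = HBM_BANDWIDTH / (PARAMS * BYTES_PER_PARAM)

# Each token also costs roughly 2 ops per parameter (multiply + add).
compute_ceiling = INT8_OPS_PER_S / (2 * PARAMS)

print(f"bandwidth-bound ceiling: ~{bandwidth_ceiling:.0f} tokens/s")   # ~48
print(f"compute-bound ceiling:   ~{compute_ceiling:.0f} tokens/s")     # ~14,000
```

The two-orders-of-magnitude gap between those ceilings is the memory wall the article keeps returning to: at low batch sizes the GPU's compute units sit idle while weights stream in from HBM.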
In a LinkedIn post, Gelsinger, who recently joined VC firm Playground Global as a general partner, wrote: “Inference of frontier AI models is bottlenecked by hardware. Even before test-time compute scaling, cost and latency were huge challenges for large-scale LLM deployments … To achieve our aspirations for AI, we need inference that is radically faster, cheaper and much lower power.”
“I am glad to share that I have recently invested in Fractile, a UK-based AI hardware company pursuing a path that is radical enough to offer such a leap,” he then revealed.
“Their in-memory compute approach to inference acceleration jointly tackles the two bottlenecks to scaling inference, overcoming the memory bottleneck that holds back today's GPUs while also decimating power consumption, the single biggest physical constraint we face over the next decade in scaling data center capacity!”
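To make the power half of that claim concrete, a rough energy comparison shows why moving weights dwarfs computing with them. The per-operation energies below are ballpark figures from the computer-architecture literature (Horowitz's ISSCC 2014 survey, 45 nm process); they are illustrative assumptions, not Fractile figures:

```python
# Rough comparison of the energy spent moving a weight from off-chip DRAM
# versus the energy spent actually computing with it. Per-operation
# energies are ballpark literature figures (Horowitz, ISSCC 2014, 45 nm),
# used here purely for illustration.

PJ_DRAM_READ_32BIT = 640.0   # read one 32-bit word from off-chip DRAM
PJ_MULT_8BIT = 0.2           # 8-bit integer multiply
PJ_ADD_8BIT = 0.03           # 8-bit integer add

energy_move = PJ_DRAM_READ_32BIT / 4          # per 8-bit weight (4 per word)
energy_compute = PJ_MULT_8BIT + PJ_ADD_8BIT   # one multiply-accumulate

print(f"moving one 8-bit weight: {energy_move:6.1f} pJ")
print(f"using one 8-bit weight:  {energy_compute:6.2f} pJ")
print(f"ratio: ~{energy_move / energy_compute:.0f}x")   # roughly 700x
```

If fetching a weight costs hundreds of times more energy than the multiply-accumulate it feeds, then keeping weights inside the memory that computes on them, as in-memory architectures like Fractile's propose, is where the claimed power savings would come from.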