Meta found a way to turn retired server RAM into cheap hyperscale memory expansion without buying new DRAM

Meta recycled retired DDR4 memory instead of buying expensive new DRAM
CXL technology turned discarded server memory into useful computing capacity
Meta reported 25% fewer servers for machine learning inference workloads

Memory shortages, rising DRAM prices and extended delivery schedules have pushed hyperscalers toward alternatives that only recently seemed impractical. Meta's answer is to recycle old DDR4 memory retrieved from decommissioned servers instead of discarding it.

The approach allows companies to expand server memory capacity without buying new DRAM, avoiding a cost researchers describe as the so-called RAM tax.

This expansion is made possible through Compute Express Link (CXL) technology, which connects the older DDR4 modules together with newer DDR5 memory pools on the same machine.

Recycling old memory instead of buying new memory

Meta describes the approach as providing memory expansion at near-zero cost while significantly reducing electronic waste and emissions from infrastructure.

The strategy comes at a time when memory supply constraints continue to impact server deployment schedules across cloud computing environments worldwide.

According to Meta researchers, existing CXL implementations struggled because extended memory provided nearly ten times lower bandwidth than local memory.

The company also reported about 60% higher latency than direct-attached memory located next to the processor sockets.

Another limitation involved commercial CXL products bundling controllers with DRAM modules, preventing the practical reuse of existing DDR4 stocks on a large scale.

Meta responded by developing an in-house ASIC known as Vistara, specifically designed around low latency, power efficiency and recycled memory usage.

The accompanying software stack automatically determines the appropriate memory ratio for each workload and disables expansion where the added latency would be unacceptable.

“We are addressing these challenges via hardware-software co-design. On the hardware side, we are designing an in-house CXL ASIC, Vistara, optimized for DRAM reuse, power efficiency and low latency,” said Meta.

“On the software side, we build an optimized solution based on TPP (Transparent Page Placement), determine the appropriate local-to-extended memory ratio for each workload, and automate per-workload configuration, including disabling extended memory for workloads that cannot tolerate the increased latency.”
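The per-workload configuration Meta describes can be pictured with a toy model. The sketch below is not Meta's software: the `Workload` fields, the `choose_extended_fraction` helper and the 5% cutoff are hypothetical, and it assumes memory accesses are spread evenly across pages, which a placement system such as TPP is specifically designed to avoid by keeping hot pages local. The only figure taken from the article is the roughly 60% latency penalty for extended memory.

```python
from dataclasses import dataclass

# From the article: extended memory showed about 60% higher latency than local memory.
EXTENDED_LATENCY_PENALTY = 0.60

# Hypothetical cutoff: below this share, expansion is treated as not worth enabling.
MIN_USEFUL_FRACTION = 0.05


@dataclass
class Workload:
    name: str
    # Largest fractional increase in average memory latency the workload tolerates.
    max_avg_latency_increase: float
    # Share of its memory the workload would like to place on extended memory.
    desired_extended_fraction: float


def choose_extended_fraction(w: Workload) -> float:
    """Return the share of memory to serve from extended memory (0.0 disables it).

    Toy model: if a fraction f of accesses hits extended memory, average
    latency rises by f * EXTENDED_LATENCY_PENALTY.
    """
    max_fraction = w.max_avg_latency_increase / EXTENDED_LATENCY_PENALTY
    fraction = min(w.desired_extended_fraction, max_fraction)
    return fraction if fraction >= MIN_USEFUL_FRACTION else 0.0


if __name__ == "__main__":
    for w in (
        Workload("distributed-cache", 0.30, 0.50),
        Workload("latency-critical-rpc", 0.01, 0.40),
    ):
        print(w.name, round(choose_extended_fraction(w), 3))
```

In this model the cache workload gets half its memory from the extended pool, while the latency-critical workload has extended memory switched off entirely, mirroring the "disable where latency cannot be tolerated" behavior Meta describes.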

Meta claims that the architecture demonstrates enough practical value to justify implementation in production environments that deal with various computational demands on a daily basis.

Meta reported that disaggregated machine learning inference workloads needed as many as 25% fewer servers once the system was deployed.

Distributed cache systems reportedly recorded average latency reductions of around 29%, despite relying in part on slower recycled memory resources underneath.

The results suggest that additional capacity sometimes outweighs raw memory speed when applications are constrained more by memory capacity than by response times.
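A rough calculation shows why capacity can matter this much. The numbers below are illustrative assumptions, not Meta's configuration: for a fleet sized purely by memory capacity, adding one third more memory per server is enough to cut the server count by 25%.

```python
import math


def servers_needed(total_gb: int, per_server_gb: int) -> int:
    """Servers required when fleet size is set only by memory capacity."""
    return math.ceil(total_gb / per_server_gb)


# Hypothetical fleet: 768,000 GB of working set, 768 GB of local DRAM per server.
TOTAL_GB = 768_000
LOCAL_GB = 768
# Hypothetical expansion: 256 GB of recycled DDR4 per server attached over CXL.
EXTENDED_GB = 256

baseline = servers_needed(TOTAL_GB, LOCAL_GB)
expanded = servers_needed(TOTAL_GB, LOCAL_GB + EXTENDED_GB)
reduction = 1 - expanded / baseline

print(baseline, expanded, f"{reduction:.0%}")  # 1000 750 25%
```

Real workloads are rarely bound by capacity alone, so this is an upper-bound style estimate that ignores the bandwidth and latency penalties of the extended memory.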

Interestingly, the interconnect technology Meta is using is also drawing interest from semiconductor companies developing large accelerator fabrics.

The broader ecosystem includes work by companies pursuing alternatives to proprietary interconnect technologies, such as Nvidia’s widely used NVLink systems.

Among these is the Ultra Accelerator Link, or UALink, a separate initiative backed by AMD, AWS, Google, Microsoft and Meta to connect accelerators across different hardware vendors.

Within Meta’s own testing, disaggregated machine learning inference systems and distributed caching infrastructure were the two workloads directly examined by researchers.

Both recorded measurable improvements from the recycled memory approach, with inference systems requiring fewer servers and caches experiencing lower average latency.

Whether recycled DDR4 for CXL becomes standard practice will likely depend on whether performance trade-offs remain acceptable outside of hyperscale environments.

Via Blocksandfiles
