- Exo supports Llama, Mistral, LLaVA, Qwen, and DeepSeek models
- Can run on Linux, macOS, Android, and iOS, but not Windows
- AI models that need 16 GB of RAM can run on two 8 GB laptops
Running large language models (LLMs) typically requires expensive, high-performance hardware with significant memory and GPU power. However, Exo software now offers an alternative by enabling distributed artificial intelligence (AI) inference across a network of devices.
The software lets users combine the computing power of multiple computers, smartphones, and even single-board computers (SBCs) such as the Raspberry Pi to run models that would otherwise be inaccessible.
This decentralized approach shares similarities with the SETI@home project, which distributed computing tasks across volunteer machines. By using a peer-to-peer (P2P) network, Exo eliminates the need for a single powerful system, making AI inference more accessible to individuals and organizations.
How Exo Distributes AI Workloads
Exo aims to challenge the dominance of major technology companies in AI development. By decentralizing inference, it seeks to give individuals and smaller organizations more control over AI models, similar to initiatives that focus on expanding access to GPU resources.
“The fundamental constraint with AI is compute,” argues Alex Cheema, co-founder of Exo Labs. “If you don’t have the compute, you can’t compete. But if you create this distributed network, maybe we can.”
The software dynamically shards LLMs across the available devices in a network, assigning model layers based on each machine’s available memory and processing power. Supported LLMs include Llama, Mistral, LLaVA, Qwen, and DeepSeek.
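To make the idea concrete, here is a minimal sketch of memory-weighted layer assignment, assuming layers are split in proportion to each device’s available RAM. The function and device names are hypothetical and do not reflect Exo’s actual API.

```python
# Hypothetical sketch of memory-weighted layer assignment.
# Not Exo's actual code; only an illustration of the idea described above.

def partition_layers(num_layers: int, memory_gb: dict[str, float]) -> dict[str, range]:
    """Give each device a contiguous slice of layers proportional
    to its share of the cluster's total memory."""
    total = sum(memory_gb.values())
    assignments: dict[str, range] = {}
    start = 0
    devices = list(memory_gb.items())
    for i, (device, mem) in enumerate(devices):
        # The last device absorbs any rounding remainder.
        end = num_layers if i == len(devices) - 1 else start + round(num_layers * mem / total)
        assignments[device] = range(start, end)
        start = end
    return assignments

# Example: a 32-layer model across two 8 GB laptops and a 16 GB desktop.
print(partition_layers(32, {"laptop-a": 8, "laptop-b": 8, "desktop": 16}))
# {'laptop-a': range(0, 8), 'laptop-b': range(8, 16), 'desktop': range(16, 32)}
```

A real scheduler would also need to weigh compute speed and link bandwidth, but memory share captures the core constraint.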
Users can install Exo on Linux, macOS, Android, or iOS, though Windows support is not currently available. Python 3.12.0 or newer is required, along with additional dependencies for Linux systems equipped with NVIDIA GPUs.
One of Exo’s key strengths is that, unlike traditional setups that depend on high-end GPUs, it enables collaboration between different hardware configurations.
For example, an AI model that requires 16 GB of RAM can run on two 8 GB laptops working together. A more demanding model such as DeepSeek R1, which requires approximately 1.3 TB of RAM, could theoretically run on a cluster of 170 Raspberry Pi 5 boards with 8 GB of RAM each.
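The arithmetic behind that cluster size is easy to verify; a quick sketch using the figures above:

```python
import math

# Back-of-the-envelope check of the 170-device figure quoted above.
model_ram_gb = 1.3 * 1024   # ~1.3 TB expressed in GB (~1331 GB)
ram_per_device_gb = 8       # Raspberry Pi 5, 8 GB model

minimum_devices = math.ceil(model_ram_gb / ram_per_device_gb)
print(minimum_devices)  # -> 167; the quoted 170 presumably leaves a little
                        # headroom for OS and runtime overhead (an assumption)
```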
Network speed and latency are critical concerns. Exo’s developers acknowledge that adding lower-performance devices can increase inference latency, but they maintain that overall throughput improves with each device added to the network.
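A toy model makes the trade-off concrete: each added device shortens its own share of the compute but adds a network hop to every token’s path, so per-token latency grows while pipelined throughput rises. All numbers below are illustrative assumptions, not Exo benchmarks.

```python
# Toy latency/throughput model for pipeline-parallel inference.
# All figures are assumptions for illustration, not measurements.

total_compute_s = 0.40  # assumed time to run all layers on a single device
hop_delay_s = 0.05      # assumed network delay between consecutive devices

for n in (1, 2, 4, 8):
    stage_time = total_compute_s / n                    # each device runs 1/n of the layers
    latency = total_compute_s + (n - 1) * hop_delay_s   # one token still crosses every hop
    throughput = 1 / stage_time                         # pipeline limited by slowest stage
    print(f"{n} devices: {latency:.2f} s/token latency, {throughput:.1f} tokens/s throughput")
```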
Security risks also arise when multiple machines share workloads, requiring protective measures to prevent data leaks and unauthorized access.
Adoption is another obstacle, as developers of AI tools currently depend on large data centers. The low cost of Exo’s approach may appeal to some, but it simply cannot match the speed of advanced AI clusters.
Via CNX Software