Developer runs 4‑Mac cluster for 70B
- On June 3, 2026, developer noisyb0y1’s four-Mac local AI cluster using the open-source EXO framework circulated widely across X and developer forums. - EXO says it can split models across multiple Macs and serve standard APIs, while the viral post framed the setup at $12 monthly power. - EXO’s GitHub repository and project site list Apple Silicon clustering, Thunderbolt support and OpenAI-compatible APIs for users testing similar setups.
A four-Mac Apple Silicon cluster built with the open-source EXO framework has become a fresh reference point in the local-AI debate, after a developer posted a home setup claiming it could run 70B-class models without renting cloud GPUs. The post, published on X by noisyb0y1, framed the system as a privacy-first alternative for inference workloads that would otherwise be pushed to external providers. EXO’s own documentation says the software “connects your Macs and workstations into one local inference cluster” and can split models across devices instead of forcing them to fit on one machine. The project’s GitHub page showed more than 45,000 stars on June 3, a sign of how much attention distributed local inference is drawing among developers. ### How does a four-Mac setup get to a 70B-class model at all? EXO says the core trick is model sharding. Its project site says the framework finds devices, reads the network, places model shards across available memory and then exposes standard APIs to clients. That matters on Apple Silicon because unified memory on a single Mac often caps what a developer can load locally, while several machines together can provide enough aggregate memory for larger weights and cache. (exolabs.net) GitHub’s repository description is similarly direct. The README headline describes EXO as a way to “Run frontier AI locally,” and the project says it enables models “larger than would fit on a single device.” The company site also says the software is compatible with OpenAI-, Claude-, Responses- and Ollama-style APIs, which lowers the work needed to swap a local cluster in for cloud inference during testing. (exolabs.net) ### Why are Apple Silicon machines showing up in these experiments? Apple Silicon has become a recurring target for this kind of build because memory and power are the limiting factors for local inference. EXO’s site explicitly highlights Apple Silicon, MLX and Thunderbolt, and says the system can track link type, latency, bandwidth and available compute before placing model shards. That means the framework is designed around the practical constraints that determine whether a multi-machine cluster is usable or just a demo. (github.com) Third-party writeups over the past year have documented similar attempts. A Medium post described self-hosting Llama 70B on a three-node Apple Silicon cluster with Exo and MLX, while other recent guides have pitched two-to-four Mac clusters as a way to run Llama 70B locally without cloud APIs. Those reports are not the source of noisyb0y1’s specific cost claim, but they show the setup is part of a broader pattern rather than an isolated stunt. (exolabs.net) ### What is the cost argument developers are making? The viral X post’s most repeated comparison was economic: about $12 a month in electricity versus an estimated $1,900 cloud bill. Reuters could not independently verify that exact comparison because it depends on the model, usage pattern, hardware mix, local power price and the cloud service being replaced. But the argument tracks with the way EXO is marketed. Its site says “No cloud account needed,” and community guides describe the software as a way to use machines a developer already owns rather than rent external compute. (medium.com) The tradeoff is performance and complexity. EXO says it supports distributed inference and lists RDMA over Thunderbolt, but local clusters still rely on interconnect speed, memory balance and model placement. A cluster can reduce cloud spend for steady workloads, while bursty or latency-sensitive production traffic may still favor rented infrastructure, depending on the operator’s requirements. That comparison is an inference based on EXO’s published design and on how distributed inference systems generally behave. (exolabs.net) ### Why did this spill into manufacturing-adjacent and edge-compute debates? Developers discussing the post tied it to a wider question: whether cheap, power-efficient local clusters can handle private workloads near where data is created. EXO’s site says it connects “Macs and workstations” into one cluster, not just identical machines, and serves normal APIs once the cluster is up. That makes the idea relevant beyond hobby use, including labs, offices and small industrial teams that want local inference without building a rack-scale GPU system. (exolabs.net) The next reference points are likely to come from benchmarks rather than screenshots. EXO’s GitHub repository was updated as recently as June 3, and the project site points users to its GitHub, Discord, Hugging Face releases and X account for new model support, setup data and hardware results. (github.com) (exolabs.net)