Compressed models hitting mainstream
Multiverse Computing is packaging compressed versions of models from OpenAI, Meta, DeepSeek and Mistral into an app and an API to showcase speed and cost gains. Firms report that compression can cut infrastructure costs by up to roughly 80%. Compression is rapidly becoming a core cost-optimization lever, not just a research trick. (techcrunch.com) (brics-econ.org)
CompactifAI’s consumer app embeds a tiny local model called “Gilda” that Multiverse says can run offline on-device. An automatic router named “Ash Nazg” falls back to cloud models when a device lacks the RAM or storage to run the local model. (techcrunch.com) Sensor Tower data cited by TechCrunch shows the app had fewer than 5,000 downloads in the past month, and Multiverse positions the offering primarily as a developer and enterprise play rather than a mass-market consumer product. (techcrunch.com)
Multiverse launched a CompactifAI API on AWS on June 11, 2025, listing it in AWS Marketplace and describing a serverless inference layer that uses SageMaker HyperPod to scale across “hundreds” of GPUs. (multiversecomputing.com) The company closed a €189 million Series B (about $215 million) in June 2025 to commercialize its CompactifAI compression technology, and it has publicly claimed that CompactifAI can shrink LLM size by as much as 95% in some cases. A customer example in Multiverse’s materials quotes Rubén Espinosa, CTO of Luzia, as saying the compressed models cut the company’s model footprint by “over 50%” while preserving response quality and lowering latency. (multiversecomputing.com)
Ecosystem context: Multiverse added NVIDIA’s newly released Nemotron 3 family to its API, covering the Nano, Super and Ultra models. NVIDIA’s Nemotron 3 paper describes 1M-token context windows and a hybrid Mixture-of-Experts architecture. Separately, DeepSeek’s October 2025 OCR-style “visual compression” has been reported to compress long text inputs roughly 10×.
Academic and industry work also flags trade-offs: a CVPR workshop paper on model compression highlights hidden accuracy and transfer costs from aggressive compression and pruning, underscoring why Multiverse and buyers emphasize per-workload validation. (openaccess.thecvf.com)
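As a rough illustration of what per-workload validation can look like in practice, the sketch below compares a baseline model and its compressed counterpart on a labelled sample from a single workload and accepts the compressed model only if accuracy drops by no more than a chosen tolerance. This is not Multiverse’s method or any vendor’s API: the names `validate_compressed` and `max_drop`, the exact-match metric and the 2-point default tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A "model" here is any callable mapping a prompt to an answer string
# (hypothetical interface, chosen only to keep the sketch self-contained).
Model = Callable[[str], str]
Example = Tuple[str, str]  # (prompt, expected answer)


@dataclass
class ValidationResult:
    baseline_accuracy: float
    compressed_accuracy: float
    accuracy_drop: float
    accepted: bool


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Fraction of examples whose output exactly matches the expected answer."""
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for prompt, expected in examples if model(prompt) == expected)
    return correct / len(examples)


def validate_compressed(
    baseline: Model,
    compressed: Model,
    examples: Sequence[Example],
    max_drop: float = 0.02,  # assumed tolerance; set per workload
) -> ValidationResult:
    """Accept the compressed model only if its accuracy drop stays within max_drop."""
    base = accuracy(baseline, examples)
    comp = accuracy(compressed, examples)
    drop = base - comp
    return ValidationResult(base, comp, drop, drop <= max_drop)
```

A real evaluation would normally replace exact match with a task-appropriate metric and check latency and cost alongside quality, and it would be rerun for each workload rather than relying on one aggregate benchmark.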