Device Runs 120B Models in Your Pocket

Tiiny AI has revealed its Pocket Lab, a device capable of running 70B- to 120B-parameter models on-device. It features a 190-TOPS NPU and 80GB of RAM, enabling complex, multi-agent workflows without any cloud offloading. This kind of hardware is crucial for bringing powerful AI directly to the operational edge in logistics and retail.
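The article does not say how model weights are stored on the device. As a rough check on the 80GB figure, the back-of-envelope sketch below makes two assumptions: weights dominate memory use, and every weight uses the same bit width. The 16/8/4-bit values are illustrative, not Tiiny AI's published settings. The sketch also ignores the KV cache, activations, and the operating system, all of which need room in the same RAM.

```python
# Back-of-envelope weight-memory estimate (assumption: weights dominate,
# uniform bit width per weight). KV cache, activations, and OS overhead
# are ignored, so "fits" here means "the weights alone fit".
RAM_GB = 80  # RAM figure from the article


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


for params in (70, 120):          # model sizes quoted in the article
    for bits in (16, 8, 4):       # illustrative precisions (assumption)
        gb = weight_footprint_gb(params, bits)
        verdict = "weights fit" if gb < RAM_GB else "weights do not fit"
        print(f"{params}B @ {bits}-bit: {gb:.0f} GB ({verdict} in {RAM_GB} GB)")
```

By this arithmetic, a 120B model's weights fit in 80GB only below about 5.3 bits per weight (80 × 8 / 120). That suggests the larger models run in a compressed or quantized form, although the article does not state which format is used.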

The Pocket Lab's ability to run 120B-parameter models rests on two key software technologies. The first is TurboSparse, an activation-sparsity technique that reduces how many parameters are computed per token. The second is PowerInfer, an open-source inference engine that balances workloads across the CPU and NPU. The device supports one-click deployment of popular open-source models such as Llama, Qwen, DeepSeek, Mistral, and OpenAI's GPT-OSS. Together, this hardware and software let the device handle what Tiiny AI calls "PhD-level reasoning" and multi-step analysis entirely offline.

Measuring just 14.2 x 8 x 2.53 cm and weighing around 300 grams, the device is verified by Guinness World Records as "The Smallest MiniPC (100B LLM Locally)". It operates within a 65W power envelope, a fraction of the power drawn by traditional GPU-based systems for similar tasks. The hardware includes a 12-core ARMv9.2 CPU, a custom SoC with a dedicated NPU (dNPU), 80GB of LPDDR5X RAM, and a 1TB SSD.

This level of on-device processing power is critical for logistics and retail, where latency and data privacy are significant concerns. Edge computing reduces reliance on cloud servers and enables real-time data processing for inventory tracking, fleet management, and warehouse automation. Keeping sensitive operational data on-site also improves security and helps organizations comply with data-protection regulations such as GDPR.

For warehouse operations, this technology can power sophisticated multi-agent AI systems directly on handhelds or local servers. One agent could track inventory by analyzing sensor data, another could optimize picking routes for human workers or robots, and a third could predict demand shifts to prevent stockouts. All three would collaborate in real time without cloud delays. This moves supply chains from a reactive to a proactive model, where decisions are made autonomously at the source.
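The three-agent warehouse scenario can be sketched as one coordination cycle over shared local state. This is a toy illustration under stated assumptions. The `WarehouseState` fields, the agent functions, and the moving-average forecast are all hypothetical. Plain rules stand in for the on-device models the article describes, so the sketch shows only the coordination pattern, not any Tiiny AI software.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WarehouseState:
    """Hypothetical shared, on-site state that every agent reads and updates."""
    stock: dict[str, int]                # units on hand per SKU
    sales_history: dict[str, list[int]]  # units sold per period, oldest first
    bin_aisle: dict[str, int]            # aisle number where each SKU is stored
    pending_picks: list[str]             # SKUs requested by open orders
    reorder: dict[str, int] = field(default_factory=dict)
    pick_route: list[str] = field(default_factory=list)


def demand_agent(state: WarehouseState, window: int = 3) -> dict[str, float]:
    """Forecast next-period demand as a moving average of recent sales (stand-in rule)."""
    return {
        sku: sum(history[-window:]) / len(history[-window:])
        for sku, history in state.sales_history.items()
        if history  # skip SKUs with no sales data to avoid dividing by zero
    }


def inventory_agent(state: WarehouseState, forecast: dict[str, float]) -> None:
    """Queue a reorder for any SKU whose stock will not cover forecast demand."""
    for sku, on_hand in state.stock.items():
        shortfall = forecast.get(sku, 0.0) - on_hand
        if shortfall > 0:
            state.reorder[sku] = math.ceil(shortfall)


def route_agent(state: WarehouseState) -> None:
    """Visit in-stock picks in ascending aisle order, a simple one-direction walk."""
    in_stock = [sku for sku in state.pending_picks if state.stock.get(sku, 0) > 0]
    state.pick_route = sorted(in_stock, key=lambda sku: state.bin_aisle[sku])


def run_cycle(state: WarehouseState) -> dict[str, float]:
    """One local coordination step: the forecast feeds inventory, routing uses live stock."""
    forecast = demand_agent(state)
    inventory_agent(state, forecast)
    route_agent(state)
    return forecast


state = WarehouseState(
    stock={"A": 5, "B": 40, "C": 12},
    sales_history={"A": [8, 10, 12], "B": [5, 6, 4], "C": [11, 12, 13]},
    bin_aisle={"A": 7, "B": 2, "C": 4},
    pending_picks=["C", "A", "B"],
)
run_cycle(state)
print(state.reorder)     # {'A': 5}
print(state.pick_route)  # ['B', 'C', 'A']
```

Every agent reads and writes the same in-memory state, so a cycle completes without a network round trip. In a real deployment, each rule could be swapped for a call to a locally hosted model while the coordination loop stays the same.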