Micro‑LLMs under 30M parameters enable local inference
- Micro‑LLMs in the 8–30 million parameter range are being highlighted for very low‑compute, on‑device inference for narrowly scoped agent tasks.
- Practitioners claim fine‑tuning such small models can cost under $100 and that trimmed Gemma/Qwen variants make local deployment feasible for privacy‑sensitive workflows.
- That capability opens low‑latency, private agent use cases on edge devices without cloud dependency.

(x.com 1) (x.com 2)
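
For a rough sense of scale, weight storage is roughly parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not something from the cited posts: the function name and the precision table are illustrative assumptions, and it ignores activations, KV cache, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only weights are counted; activations, KV cache, and
# runtime overhead are ignored, so real usage will be higher.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Return approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# The two ends of the 8-30 million parameter range mentioned above.
for params in (8_000_000, 30_000_000):
    sizes = ", ".join(
        f"{dtype}: {weight_megabytes(params, dtype):.0f} MB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{params / 1e6:.0f}M params -> {sizes}")
```

Under these assumptions, even the 30M end of the range needs about 120 MB of weights at fp32 and about 15 MB at int4, which is why models of this size are plausible candidates for edge devices.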