On‑device LoRA: QVAC SDK 0.9

Tether CEO Paolo Ardoino announced that QVAC SDK 0.9.0 will let teams run LoRA fine‑tuning on device: load a base model locally, train on private data, and export a lightweight adapter for inference. (x.com) The pitch centers on privacy and efficiency, since training and adapter generation stay on the device rather than on servers. (x.com)
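
LoRA fine-tuning and its "lightweight adapter" are easier to picture with numbers. The sketch below is a minimal illustration of the general LoRA technique, not QVAC SDK code. A frozen base weight matrix W (d × k) is left untouched, and training produces two small matrices, B (d × r) and A (r × k), whose product, scaled by alpha / r, is added to W. The function names, the 4096 × 4096 layer size, the rank r = 8, and the alpha value are illustrative assumptions.

```python
"""Minimal LoRA sketch (generic technique, NOT the QVAC SDK API).

All names and sizes here are hypothetical; they only show why a LoRA
adapter is small and how it attaches to a frozen base weight matrix.
"""

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of a (n x m) and b (m x p)."""
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable values: full fine-tuning of a d x k matrix vs. a rank-r adapter."""
    full = d * k           # every entry of W would be trained
    adapter = r * (d + k)  # B is d x r, A is r x k
    return full, adapter


def merge_adapter(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A); the base matrix W is not modified."""
    r = len(a)  # rank = rows of A = columns of B
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    # Size comparison for one hypothetical 4096 x 4096 layer, rank-8 adapter.
    full, adapter = lora_param_counts(4096, 4096, 8)
    print(full, adapter, f"{adapter / full:.2%}")  # 16777216 65536 0.39%

    # Tiny numeric example: 2 x 3 base weight, rank-1 adapter, alpha = 1.
    w = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
    b = [[1.0], [2.0]]      # d x r = 2 x 1
    a = [[0.5, 0.0, -0.5]]  # r x k = 1 x 3
    print(merge_adapter(w, b, a, alpha=1.0))  # [[1.5, 0.0, -0.5], [1.0, 1.0, -1.0]]
```

For that assumed layer, the adapter holds 65,536 trainable values against 16,777,216 in the full matrix, about 0.39%. Merging into W is one option at inference time; the adapter can also be kept separate and applied as an extra term, which lets one base model load different adapters.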

A way to customize an artificial intelligence model without retraining all of its weights is coming to Tether’s QVAC software kit, and Paolo Ardoino said the update will run that training directly on the device. (gist.github.com) Ardoino said on April 16, 2026, that QVAC SDK version 0.9.0 is due in about 10 days and will let developers load a base model locally, train it on local data, and export a Low-Rank Adaptation, or LoRA, adapter for inference. (gist.github.com)

LoRA is a parameter-efficient tuning method: instead of changing every weight in a large language model, it trains a much smaller set of add-on weights that can be attached to the frozen base model later.

QVAC’s public site says its tools are built around local processing and cross-platform support, and that they keep retrieval, biometrics, and other workflows on the device rather than in the cloud. (qvac.tether.io) That fits the pitch Tether has made since launching QVAC earlier in April 2026: artificial intelligence should run on phones, laptops, and other local machines across Linux, macOS, Windows, Android, and iOS. The site says the engine uses Vulkan and is designed to run on “any GPU,” and the company frames the product as an alternative to centralized cloud services. (qvac.tether.io)

The technical groundwork for this update was already visible in QVAC Fabric, Tether’s LoRA training framework for heterogeneous graphics processors. The project’s public GitHub repository says it targets Android phones with Qualcomm Adreno and Arm Mali chips, Apple Silicon on iPhone and Mac, and Advanced Micro Devices, Intel, and Nvidia graphics on desktop systems. (github.com) The repository says the goal is to move fine-tuning beyond Nvidia’s Compute Unified Device Architecture, or CUDA, which has dominated model training on servers. It also says QVAC uses a “dynamic tiling” approach to fit training workloads into the tighter memory limits of mobile graphics processors. (github.com)

In March 2026, QVAC published benchmarks for BitNet models, a model design that stores weights at very low precision to cut memory use. The company said a 125-million-parameter BitNet model could be fine-tuned in about 10 minutes on a Samsung S25, and that fine-tuning a 1-billion-parameter model on about 300 documents, or 18,000 tokens, took 1 hour 18 minutes on the same phone and 1 hour 45 minutes on an iPhone 16. (huggingface.co) Those numbers come from QVAC’s own testing rather than an independent benchmark, and the research paper that the GitHub page lists as “coming soon” has not yet been published. The repository does say the system has been tested across 6 graphics architectures, 5 model families, and 4 quantization levels, with binaries and adapters released for developers to inspect. (github.com)

If version 0.9.0 ships on the timetable Ardoino posted, the next step is not a new model from Tether but a new workflow: download a base model, tune it privately on a device, and carry only the lightweight adapter into inference. The announcement is therefore about where training happens, not just where the model runs. (gist.github.com)
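
The repository’s “dynamic tiling” is named in the article but not specified. As a generic illustration of tiling, and not QVAC Fabric’s actual method, the sketch below picks a tile size at run time from a memory budget so that one block each of A, B, and C fits at once, then computes a matrix product block by block. On a GPU, only those blocks would need to be resident in fast memory at each step. The 48 KiB budget, the 4-byte element size, and the function names are hypothetical.

```python
"""Generic tiled matrix multiply with a run-time tile size.

Illustrative sketch only, NOT QVAC Fabric's "dynamic tiling" code.
The memory budget and 4-byte element size are hypothetical.
"""
import math

Matrix = list[list[float]]


def pick_tile_size(budget_bytes: int, elem_bytes: int = 4) -> int:
    """Largest square tile t such that three t x t blocks fit in the budget."""
    # One block each from A, B and C, each holding t * t elements.
    return max(1, math.isqrt(budget_bytes // (3 * elem_bytes)))


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Compute C = A @ B one tile at a time; edge tiles may be smaller."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # row block of C
        for j0 in range(0, p, tile):      # column block of C
            for k0 in range(0, m, tile):  # block of the shared dimension
                # Add block (i0, k0) of A times block (k0, j0) of B into C.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    tile = pick_tile_size(48 * 1024)  # hypothetical 48 KiB scratch budget
    print(tile)  # 64
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(tiled_matmul(a, b, tile=2))  # [[58.0, 64.0], [139.0, 154.0]]
```

A smaller budget yields smaller tiles and more loop iterations but the same result, which is the basic trade tiling makes on memory-constrained hardware.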

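The BitNet figures reported above support some quick arithmetic. The sketch below divides the reported data size by the reported wall-clock times. It treats the figures as exact even though the article says “about,” and because the article does not say how many passes over the data each run made, the rates are averages over the whole run rather than measured per-step throughput. The variable names are illustrative.

```python
"""Back-of-the-envelope arithmetic on QVAC's self-reported BitNet numbers.

Inputs are the figures quoted in the article: about 300 documents,
about 18,000 tokens, 1 h 18 min on a Samsung S25, 1 h 45 min on an iPhone 16.
"""

TOKENS = 18_000
DOCUMENTS = 300


def to_seconds(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes duration to seconds."""
    return hours * 3600 + minutes * 60


runs = {
    "Samsung S25": to_seconds(1, 18),
    "iPhone 16": to_seconds(1, 45),
}

tokens_per_doc = TOKENS / DOCUMENTS
# Tokens of training data per second of wall-clock time, averaged over each run.
rates = {device: TOKENS / secs for device, secs in runs.items()}
slowdown = runs["iPhone 16"] / runs["Samsung S25"]

if __name__ == "__main__":
    print(f"tokens per document: {tokens_per_doc:.0f}")
    for device, rate in rates.items():
        print(f"{device}: {runs[device]} s, {rate:.2f} tokens/s")
    print(f"iPhone 16 run took {slowdown:.2f}x as long")
```

With those inputs, the data works out to 60 tokens per document, roughly 3.85 tokens per second on the Samsung S25 and 2.86 on the iPhone 16, and the iPhone run took about 1.35 times as long.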