New tuning paper: MiCA

A newly circulated arXiv preprint claims that “MiCA learns more knowledge than LoRA and full fine-tuning,” adding fuel to the debate over parameter-efficient methods versus heavier fine-tuning. (x.com) The preprint puts another option on the table for teams weighing accuracy gains against compute and memory costs during model customization. (x.com)

# New tuning paper: MiCA

A new arXiv paper is making a sharp claim in one of artificial intelligence’s most practical arguments: how should you teach a large language model something new without paying the full cost of retraining it? The paper, posted on April 2, 2026, introduces a method called Minor Component Adaptation, or MiCA, and reports that it can acquire more new knowledge than both Low-Rank Adaptation, or LoRA, and full fine-tuning in the authors’ experiments. (arxiv.org)

That claim landed because model customization is no longer a side topic. Companies routinely take a base model and adapt it to legal text, medical notes, customer support logs, coding data, or internal documents. The hard part is that every extra training run costs memory, compute, storage, and time. Parameter-efficient fine-tuning exists to cut that bill by changing only a small slice of the model instead of rewriting the whole thing. (arxiv.org, arxiv.org)

The standard baseline for that cheaper approach is LoRA, short for Low-Rank Adaptation. Introduced in 2021, LoRA freezes the original model weights and adds small trainable matrices inside the network, which lets teams adapt very large models with far fewer trainable parameters and lower memory use than full fine-tuning. (arxiv.org, microsoft.com)

The intuition behind LoRA is simple: a huge model may not need every weight changed to learn a new task. If the important update can be compressed into a smaller mathematical shape, you can train that compact update and leave the original model mostly untouched. That is why LoRA became a default method across open-source model tuning stacks. (arxiv.org)

But there has been a growing argument that “matching task performance” and “learning new knowledge well” are not the same thing.
A 2024 paper from researchers at the Massachusetts Institute of Technology and elsewhere found that LoRA and full fine-tuning can reach similar downstream scores while changing models in very different spectral ways, with LoRA introducing what the authors call “intruder dimensions.” (arxiv.org)

That older result matters because it shifted the debate from raw benchmark scores to *where* a model stores what it learns. If different tuning methods push updates into different parts of parameter space, then some methods may be better at absorbing fresh facts, while others may be better at preserving older behavior or reducing forgetting. (arxiv.org)

MiCA is built directly on that line of thinking. Instead of aiming updates at the dominant directions in a weight matrix, which are the biggest and most active directions, MiCA uses singular value decomposition to find the minor directions tied to the smallest singular values and constrains learning to those underused subspaces. (arxiv.org, arxiv.org)

A rough analogy is editing a crowded whiteboard. LoRA tends to write in the large, already busy regions of the board, while MiCA tries to write in the corners that are still mostly empty. The authors argue that those quieter directions are less occupied by the model’s existing representations, so they may be a cleaner place to insert new information. That interpretation follows from the paper’s description of “underutilized subspaces,” though it is still an inference about mechanism rather than a settled fact. (arxiv.org, arxiv.org)

The headline number is the one that spread on social media: MiCA reports up to a 5.9 times improvement in knowledge acquisition under optimized training hyperparameters, while using a parameter footprint the paper describes as 6 percent to 60 percent of LoRA’s. Those are unusually strong gains for a method that still sits in the parameter-efficient camp rather than the full fine-tuning camp. (arxiv.org)

The wording there matters.
The paper says “up to” 5.9 times, which means the best result came under specific settings, not as a blanket average across all possible tasks. It is also a preprint, not a peer-reviewed conference paper yet, so the claims should be read as promising evidence rather than settled consensus. (arxiv.org)

Even so, the paper hits a real pain point for teams building products. Full fine-tuning can be expensive because every parameter is trainable. LoRA is cheaper, but some recent work suggests it may store updates in ways that are not ideal for continual knowledge injection. MiCA’s pitch is that you may be able to get stronger knowledge uptake without paying the full memory and compute cost of retraining the entire model. (arxiv.org, arxiv.org, arxiv.org)

That does not mean MiCA replaces LoRA tomorrow. LoRA already has years of tooling, library support, production familiarity, and operational know-how behind it. A new method has to prove not just that it wins on paper, but that it behaves reliably across model families, data regimes, hardware setups, and long training runs. (github.com, arxiv.org)

The next questions are straightforward. Can outside groups reproduce the gains? Do the gains hold on instruction tuning, domain adaptation, and continual learning workloads beyond the paper’s setup? Does MiCA preserve old capabilities as well as it learns new facts? Those are the tests that will decide whether this is a clever niche result or a genuine new default for model tuning. (arxiv.org, arxiv.org)

For now, the paper adds one more serious option to the menu. In a field where every extra percentage point of accuracy competes with graphics processing unit memory limits and training budgets, a method that claims better knowledge acquisition with a smaller parameter footprint will get attention fast. MiCA has that attention now; the harder part starts with replication. (arxiv.org)
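
For readers who want the mechanics, the contrast described above between LoRA's small trainable matrices and an update confined to minor singular directions can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation. The frozen-basis parameterization (fix the singular vectors with the smallest singular values, train only the coefficients `C`) is an assumption made here for clarity, and the matrix size, rank, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                          # hypothetical layer width and adapter rank
W = rng.standard_normal((d, d))       # stand-in for one frozen pretrained weight matrix

# LoRA-style update: two small trainable factors, delta = B @ A.
B = np.zeros((d, r))                  # common LoRA init: B = 0, so the update starts at zero
A = rng.standard_normal((r, d)) * 0.01
lora_delta = B @ A
lora_params = B.size + A.size         # both factors are trained

# Minor-direction update (ASSUMED parameterization, not taken from the paper):
# decompose W, keep the left singular vectors with the smallest singular
# values as a fixed basis, and train only the coefficients C.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
# NumPy returns singular values in descending order, so the last r columns
# of U are the minor directions and the rest are the major ones.
U_major, U_minor = U[:, :-r], U[:, -r:]
C = rng.standard_normal((r, d)) * 0.01
minor_delta = U_minor @ C
minor_params = C.size                 # only the coefficients are trained

# The minor-direction update has no component along the major directions,
# up to floating-point error.
leak = np.abs(U_major.T @ minor_delta).max()

print("LoRA trainable parameters:           ", lora_params)
print("Minor-direction trainable parameters:", minor_params)
print("Largest overlap with major directions:", leak)
```

In this toy setup the LoRA-style adapter trains 512 values and the frozen-basis variant trains 256, but that ratio is a property of the sketch's assumptions and says nothing about the footprint figures reported in the paper.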
