Five ways to fine‑tune LLMs

A visual explainer that’s been circulating breaks down five common approaches to fine-tuning large language models: practical techniques people actually use to adapt base models for tasks like summarization or retrieval-augmented generation. The piece is gaining traction online because it translates abstract training ideas into hands-on options you can apply or evaluate when choosing vendors or building in-house. (x.com)

A simple chart about fine-tuning large language models is spreading because it does something rare in AI: it makes a messy subject legible. Instead of treating “fine-tuning” as one magic step, it shows that people now use several distinct ways to adapt a base model, each with a different tradeoff in cost, memory, and control. That matters because the phrase gets thrown around loosely. In practice, a team choosing between vendors or building its own stack is usually choosing between very different techniques under the same label. (developers.openai.com)

The first thing the explainer gets right is the problem itself. Full fine-tuning means updating all of a model’s weights. That works, but it is expensive enough to be out of reach for many teams once models get large. The original LoRA paper was influential because it showed you could freeze the pretrained model and train only small low-rank update matrices instead, slashing the number of trainable parameters while keeping strong downstream performance. That idea changed the field because it turned fine-tuning from a giant-lab privilege into something many smaller groups could actually run. (arxiv.org)

From there, the chart follows the branch that now dominates practical work: LoRA and its descendants. Basic LoRA inserts two small matrices, A and B, alongside a large weight matrix and learns only those updates. LoRA-FA pushes the efficiency further by freezing one of those matrices and training only the other, which cuts activation memory while, according to its authors, keeping performance close to standard LoRA. VeRA goes further still. It shares frozen random low-rank matrices across layers and learns only tiny scaling vectors, reducing storage for each adapted model. These are not abstract tweaks. They are engineering answers to a real deployment problem: what happens when you need many specialized versions of one base model and cannot afford to store or train each one the old way. (arxiv.org)

That same pressure explains why another family of methods took off.
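To put rough numbers on that pressure, here is a minimal NumPy sketch of the LoRA idea for a single layer. The layer shape, the rank, and the names `W`, `A`, `B`, and `lora_forward` are illustrative assumptions, not taken from the chart or from any particular library. The LoRA-FA and VeRA counts follow the descriptions above (train only B, or train only two scaling vectors) and are meant as a back-of-the-envelope comparison rather than an implementation of those methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape and adapter rank, chosen only for illustration.
d_out, d_in, r = 1024, 1024, 8

# Frozen pretrained weight: never updated during adaptation.
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA's two small matrices. B starts at zero, so the adapted layer
# initially behaves exactly like the frozen one.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))


def lora_forward(x, W, A, B, scale=1.0):
    """Apply W plus the low-rank update scale * (B @ A) without building B @ A."""
    return x @ W.T + scale * (x @ A.T) @ B.T


x = rng.standard_normal((4, d_in))   # a small batch of inputs
y_base = x @ W.T                     # output of the frozen layer
y_lora = lora_forward(x, W, A, B)    # output with the (still zero) adapter

# Trainable values per layer under each scheme (assumed counts, see lead-in).
trainable = {
    "full fine-tuning": W.size,
    "LoRA (train A and B)": A.size + B.size,
    "LoRA-FA (freeze A, train B)": B.size,
    "VeRA (train two scaling vectors)": r + d_out,
}

for name, count in trainable.items():
    print(f"{name:34s} {count:>9,d}  ({100 * count / W.size:.2f}% of full)")
```

With these assumed sizes, LoRA trains 16,384 values against 1,048,576 for the full matrix, and each further variant shrinks what has to be trained and stored per adapted model.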
QLoRA keeps the LoRA idea but quantizes the frozen base model to 4-bit precision, which cuts memory enough to fine-tune a 65 billion parameter model on a single 48 GB GPU while preserving the quality of full 16-bit fine-tuning in the paper’s tests. DoRA attacks a different weakness. Its authors argued that LoRA still leaves an accuracy gap versus full fine-tuning, so they split each weight into magnitude and direction and adapt them separately, using LoRA for the directional part. The point is not that one method wins forever. It is that the field keeps inventing ways to get closer to full fine-tuning quality without paying full fine-tuning costs. (arxiv.org)

Once you see that pattern, another confusion clears up. Fine-tuning is not the same thing as retrieval-augmented generation, even though people often compare them as if they were competing brands of the same tool. RAG does not work by changing the model’s weights. It retrieves relevant documents at inference time and feeds them into the prompt so the model can answer with current or private information. Microsoft’s documentation frames this as the classic “chat over my data” setup. Fine-tuning, by contrast, is better when you want the model to learn a format, style, behavior, or domain pattern directly. If your problem is missing knowledge that changes often, RAG is usually the cleaner fix. If your problem is that the model answers in the wrong way, fine-tuning is often the right lever. (learn.microsoft.com)

That is also why newer guides now separate plain supervised fine-tuning from preference tuning. Supervised fine-tuning trains on input-output pairs. The model learns to imitate the examples you give it. Direct Preference Optimization trains on comparisons between a better answer and a worse one, teaching the model what to prefer rather than what to copy exactly.
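The difference between copying and preferring shows up directly in the loss each method minimizes. The sketch below is a simplified, per-example version in plain Python, assuming you already have the summed log-probability each model assigns to a response. The function names, the argument names, and the `beta=0.1` default are illustrative assumptions, not any vendor's API.

```python
import math


def sft_loss(target_logp):
    """Supervised fine-tuning: raise the likelihood of the one reference answer."""
    return -target_logp


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single comparison.

    Each argument is the log-probability of a full response under either the
    model being trained (policy) or a frozen reference copy of it.
    """
    # How much more the policy likes each answer than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable form.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy has shifted probability toward the better answer.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(f"neutral={neutral:.3f} improved={improved:.3f}")
```

The supervised loss only ever sees one target per example, while the preference loss falls as the gap between the better and worse answer widens relative to the reference model.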
OpenAI’s developer guide now presents SFT, DPO, and reinforcement fine-tuning as different tools for different jobs, not as interchangeable upgrades on a ladder. The distinction matters because a summarization system, a customer-support bot, and a reasoning agent often fail in different ways, and the training method should match the failure. (developers.openai.com)

The viral chart is landing because it captures the real shape of the market. Most teams are not deciding whether to fine-tune in the abstract. They are deciding whether they need a cheap adapter like LoRA, a more memory-frugal variant like LoRA-FA or VeRA, a quantized route like QLoRA, a closer approximation to full tuning like DoRA, or no weight update at all because RAG solves the actual problem. That is a much more useful question than asking whether a model has been “fine-tuned,” and it starts with two little matrices named A and B.
