Fine‑tuning just got practical
A new 115‑page 'Ultimate Guide' breaks down PEFT methods (LoRA, QLoRA, DoRA), pruning and alignment workflows, while Together AI says it has scaled MoE fine‑tuning to 1T‑parameter models, with up to 6x throughput and support for tool-calling and vision-language models. Taken together, practical documentation and MoE throughput gains suggest fine‑tuning is moving from experiments to production paths for specialized enterprise models. (x.com) (x.com)
A comprehensive arXiv survey titled "The Ultimate Guide to Fine‑Tuning LLMs" (arXiv:2408.13296), submitted on Aug 23, 2024 and revised on Oct 30, 2024, is circulating as a 115‑page practitioner reference. (arxiv.org) The paper catalogs parameter‑efficient techniques, naming LoRA, QLoRA and DoRA, and lays out a seven‑stage fine‑tuning pipeline that includes data preparation, hyperparameter tuning, pruning and alignment workflows. (arxiv.org)
Together AI’s fine‑tuning update adds native tool‑calling, reasoning traces and vision‑language model fine‑tuning; the company reports up to 6× higher throughput for fine‑tuning workloads on models with 100B+ parameters and supports datasets of up to 100GB. (together.ai) Together’s model catalog and docs list Kimi K2 (Moonshot AI) as a Mixture‑of‑Experts architecture with about 1 trillion total parameters, of which about 32B are activated per token, a 256K context window, and an expert layout that selects 8 of 384 experts for each token. (together.ai) The release also adds job cost and ETA estimates to the fine‑tuning UI and documents function‑calling APIs for tool use, which puts platform‑level telemetry alongside the new fine‑tuning flows. (together.ai)
Public practitioner repos and writeups, including a GitHub "Ultimate Guide to LLM Fine‑Tuning" repo and recent hands‑on tutorials, now provide runnable LoRA/QLoRA adapter code, merge and merge‑back procedures, and production checklists that mirror Together’s operational features. (github.com)
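To make the adapter and merge steps concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, in which a frozen weight W is augmented with a low-rank update scaled by alpha / r. It is not taken from the guide, the GitHub repo, or Together's platform. The function names, the toy dimensions, and the random "trained" adapter values are all illustrative assumptions.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(p, q):
    """Multiply two matrices stored as lists of rows."""
    q_cols = list(zip(*q))
    return [[sum(x * y for x, y in zip(row, col)) for col in q_cols] for row in p]


def lora_forward(w, a, b, x, alpha):
    """Frozen base output plus the scaled low-rank update: W x + (alpha / r) * B (A x)."""
    r = len(a)  # A has shape (r, k), so its row count is the rank
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


def merge_lora(w, a, b, alpha):
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    r = len(a)
    delta = matmul(b, a)  # shape (d, k), same as W
    return [[wij + (alpha / r) * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d, k, r):
    """Adapter parameter count for one (d, k) weight: B is (d, r), A is (r, k)."""
    return r * (d + k)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for real weights: uniform values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration only; real layers are far larger.
rng = random.Random(0)
d, k, r, alpha = 6, 4, 2, 16.0
w = random_matrix(d, k, rng)   # frozen base weight
a = random_matrix(r, k, rng)   # adapter factor A (pretend it has been trained)
b = random_matrix(d, r, rng)   # adapter factor B (pretend it has been trained)
x = [rng.uniform(-1.0, 1.0) for _ in range(k)]

adapter_out = lora_forward(w, a, b, x, alpha)
merged_out = matvec(merge_lora(w, a, b, alpha), x)
max_gap = max(abs(p - q) for p, q in zip(adapter_out, merged_out))
print(f"max difference after merge: {max_gap:.2e}")
print("trainable values at d=k=4096, r=8:", trainable_params(4096, 4096, 8))
```

The merged weight reproduces the adapter's output up to floating-point rounding, which is why merging can remove the extra matrix multiplications at inference time. For a single 4096×4096 projection at rank 8, the adapter holds 65,536 trainable values against roughly 16.8 million in the full matrix. QLoRA keeps the same adapter math but stores the frozen base weights in quantized form, which this sketch does not model.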