LLaMA-Factory Offers No-Code Fine-Tuning
A popular GitHub repository, LLaMA-Factory, is gaining traction for providing a no-code user interface for fine-tuning over 100 different LLMs and Vision-Language Models. The tool supports popular techniques like LoRA and QLoRA and uses vLLM for its backend. It aims to lower the barrier to entry for customizing open-source models.
- The project originated from a paper titled "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models," which is available on arXiv. - Beyond LoRA and QLoRA, it supports a variety of other training techniques including full fine-tuning, freeze-tuning, and advanced algorithms like GaLore, DoRA, and LongLoRA. - For MLOps, it integrates with experiment tracking tools such as TensorBoard, Wandb, and MLflow, and offers a Gradio-based web UI called LlamaBoard for easier management. - In addition to language models, LLaMA-Factory also supports multimodal models like LLaVA and Qwen-VL, enabling tasks such as image and video understanding. - Performance optimizations are a key feature, with support for FlashAttention-2 and Unsloth, which can provide significant speedups and memory reduction during training. - The tool is not limited to just fine-tuning; it also supports continued pre-training, reward modeling, and reinforcement learning methods like PPO and DPO. - It provides an OpenAI-compatible API, allowing for easier integration of fine-tuned models into existing applications and inference pipelines. - The project has expanded its hardware support to include Ascend NPUs, in addition to CUDA and ROCm, demonstrating a focus on broader hardware compatibility.