New Open-Source 'Tulu 3' Model Enters the Ring
A new open model family called Tulu 3 is being positioned as a strong competitor to proprietary systems. Reported benchmarks show Tulu 3 models competitive with closed models such as GPT-4o-mini and Claude 3.5 Haiku on several reasoning and coding tasks, and the openly published post-training recipe is meant to make further fine-tuning easier.
Developed by the Allen Institute for AI (AI2), the Tulu 3 family is built on Llama 3.1 base models and aims to close the gap between proprietary and open post-training recipes. Training follows a four-stage recipe: prompt and data curation, supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and a new technique called Reinforcement Learning with Verifiable Rewards (RLVR). RLVR rewards the model only when its output can be checked as correct, which targets tasks with verifiable outcomes such as math problems.
In a move toward greater transparency in AI development, AI2 has opened the entire Tulu 3 pipeline, releasing the training datasets, data curation tools, training code, and evaluation suites. This openness is intended to support reproducibility and spur further research in the open-source community.
Reported benchmarks show Tulu 3 models competitive with, and in some cases ahead of, other open-weight models of similar size such as Llama 3.1-Instruct and Qwen2.5-Instruct. The largest model in the family, Tulu 3 405B, is also competitive with closed models like GPT-4o-mini and Claude 3.5-Haiku.
Because the models derive from Llama 3.1, the weights are distributed under the Llama 3.1 Community License, while AI2's training code is released under Apache 2.0. Open weights and an open recipe let developers avoid the lock-in of proprietary, API-only models, giving startups more flexibility and control over their AI implementations, although the Llama license terms still apply.
For engineers looking to experiment with Tulu 3 locally, the recommended hardware includes a high-end GPU such as an RTX 4090 or RTX A6000, at least 8 GB of RAM, and 200 GB of disk space. Those figures fit only the smaller models in the family; the 405B model is far too large for a single GPU of that class.
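To make the RLVR idea mentioned above more concrete, here is a minimal sketch of what a "verifiable reward" for math problems could look like. It is an illustration under stated assumptions, not AI2's implementation: the function names, the answer-extraction rule (take the last number in the completion), and the reward value of 1.0 are all hypothetical choices made for this example.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number that appears in the completion, or None.

    Assumption for this sketch: the model's final answer is the last
    number it writes. Thousands separators are stripped first.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None


def verifiable_reward(completion: str, ground_truth: str,
                      reward_value: float = 1.0) -> float:
    """Binary reward: reward_value if the extracted answer matches the
    known ground truth, otherwise 0.0. No learned reward model is used."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        is_correct = float(answer) == float(ground_truth)
    except ValueError:
        # Non-numeric ground truth cannot be verified by this checker.
        return 0.0
    return reward_value if is_correct else 0.0


if __name__ == "__main__":
    print(verifiable_reward("2 + 2 = 4, so the answer is 4.", "4"))
    print(verifiable_reward("The answer is 5", "4"))
```

In a reinforcement learning loop, a function like this would stand in for the learned reward model: each sampled completion is scored by the checker, and the policy is updated toward completions that earn the reward. The appeal is that the signal comes from a deterministic check, so it cannot be gamed the way a learned reward model sometimes can.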