DeepSeek Emerges as Challenger to OpenAI and Meta
The rapid rise of Chinese AI firm DeepSeek is reportedly shaking up the large language model landscape, presenting a new competitive threat to established leaders like OpenAI and Meta. As new players release powerful foundation models, product teams must evaluate and deploy models across a more diverse LLM ecosystem.
- DeepSeek was founded in May 2023 by Liang Wenfeng, who also co-founded the Chinese quantitative hedge fund High-Flyer. The company takes a resource-efficient approach, reportedly training its V3 model with just 2,000 NVIDIA H800 chips for under $6 million, a fraction of the cost reported for training competitors' models.
- The company's DeepSeek-V2 model is a Mixture-of-Experts (MoE) model with 236 billion total parameters, of which only 21 billion are activated for each token, reducing training costs and speeding up inference. The model was trained on a corpus of 8.1 trillion tokens, with a focus on English and Chinese data.
- For efficient inference, DeepSeek-V2 uses a Multi-head Latent Attention (MLA) mechanism, which reduces key-value (KV) cache requirements. This helps the model support a context length of up to 128,000 tokens.
- In code generation benchmarks, DeepSeek-Coder-V2 has shown performance comparable to or better than closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. This MoE code model expanded its programming language support from 86 to 338 languages and increased its context length from 16K to 128K tokens.
- On mathematics benchmarks, the DeepSeek-R1 model has demonstrated strong reasoning capabilities, outperforming OpenAI's o1-1217 on the MATH-500 benchmark with a score of 97.3% and scoring competitively on the AIME 2024 benchmark.
- DeepSeek has adopted an open-source approach for some of its models, releasing model weights and detailed technical papers to foster community collaboration and transparency. This contrasts with the more "black-box" approach of some competitors.
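
For teams sizing up these claims, the MoE and KV-cache points can be checked with rough arithmetic. The sketch below is only illustrative: the 236 billion and 21 billion parameter counts and the 128,000-token context come from the figures above, while the layer count, head count, head size, and latent width are hypothetical placeholders chosen for the example, not published specifications. The function names are likewise made up for this sketch.

```python
# Back-of-the-envelope arithmetic for the MoE and KV-cache points above.
# Parameter counts and context length come from the article; every model
# dimension below (layers, heads, head size, latent width) is a
# hypothetical placeholder used only for illustration.


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_params / total_params


def kv_cache_bytes(num_layers: int, context_len: int,
                   cached_values_per_token: int,
                   bytes_per_value: int = 2) -> int:
    """Cache size for one sequence: layers x tokens x cached values x bytes."""
    return num_layers * context_len * cached_values_per_token * bytes_per_value


# Figures stated in the article for DeepSeek-V2.
moe_share = active_fraction(total_params=236e9, active_params=21e9)
CONTEXT_LEN = 128_000

# Hypothetical dimensions (assumptions, not from the article).
NUM_LAYERS = 60
NUM_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 576  # assumed width of the compressed per-token cache entry

# Conventional attention caches one key and one value vector per head.
full_values = 2 * NUM_HEADS * HEAD_DIM
full_cache = kv_cache_bytes(NUM_LAYERS, CONTEXT_LEN, full_values)

# A latent-attention style cache stores one compressed vector per token.
latent_cache = kv_cache_bytes(NUM_LAYERS, CONTEXT_LEN, LATENT_DIM)

print(f"Active parameter share per token: {moe_share:.1%}")
print(f"Full KV cache:    {full_cache / 1e9:.1f} GB")
print(f"Latent KV cache:  {latent_cache / 1e9:.1f} GB")
print(f"Reduction factor: {full_cache / latent_cache:.1f}x")
```

With the article's figures, roughly 9% of the model's parameters are active for any given token. The cache comparison depends entirely on the assumed dimensions, so treat its output as a way to reason about the trade-off, not as a measurement of DeepSeek-V2.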