DeepSeek Emerges as Challenger to OpenAI and Meta
What happened
The rapid rise of Chinese AI firm DeepSeek is reportedly shaking up the large language model landscape and presenting a new competitive threat to established leaders like OpenAI and Meta. As new players release powerful foundation models, product teams have to evaluate and deploy models across a more diverse LLM ecosystem.
Why it matters
- DeepSeek was founded in May 2023 by Liang Wenfeng, who also co-founded the Chinese quantitative hedge fund High-Flyer. The company takes a resource-efficient approach: it reportedly trained its V3 model on about 2,000 NVIDIA H800 chips for under $6 million, a fraction of the cost reported for competitors' models.
- DeepSeek-V2 is a Mixture-of-Experts (MoE) model with 236 billion total parameters, of which only 21 billion are activated for each token, which lowers training cost and speeds up inference. It was trained on a corpus of 8.1 trillion tokens, with a focus on English and Chinese data.
- For efficient inference, DeepSeek-V2 uses Multi-head Latent Attention (MLA), which shrinks the key-value (KV) cache. The model supports a context length of up to 128,000 tokens.
- In code generation benchmarks, DeepSeek-Coder-V2 has shown performance comparable to or better than closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. This MoE code model expanded its programming language support from 86 to 338 languages and increased its context length from 16K to 128K tokens.
- On mathematics benchmarks, DeepSeek-R1 has demonstrated strong reasoning capabilities, outperforming OpenAI's o1-1217 on MATH-500 with a score of 97.3% and scoring competitively on AIME 2024.
- DeepSeek has released some of its models openly, publishing model weights and detailed technical papers to foster community collaboration and transparency. This contrasts with the more "black-box" approach of some competitors.
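To make the sparse-activation idea behind MoE models more concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration only: the eight toy experts, the gate scores, and the helper names `route_top_k` and `moe_forward` are hypothetical, and this is not DeepSeek-V2's actual routing algorithm or configuration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k):
    """Combine the outputs of the selected experts only; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy setup: 8 experts, each a simple scaling function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(gate_scores, k=2)
output = moe_forward(1.0, experts, gate_scores, k=2)
print(selected)  # two (index, weight) pairs; the other six experts stay idle
print(output)
```

In this toy run only two of the eight experts do any work for the input, which is the same principle that lets an MoE model hold many parameters while activating a small share of them per token.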
Key numbers
- May 2023: DeepSeek founded by Liang Wenfeng, co-founder of the Chinese quantitative hedge fund High-Flyer.
- About 2,000 NVIDIA H800 chips and under $6 million: the reported training resources for the V3 model.
- 236 billion total parameters, 21 billion activated per token: DeepSeek-V2's Mixture-of-Experts (MoE) configuration.
- 8.1 trillion tokens: DeepSeek-V2's training corpus, with a focus on English and Chinese data.
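As a rough illustration of what the DeepSeek-V2 figures above imply, the short calculation below derives two ratios from them. The variable names are illustrative, and the ratios are simple arithmetic on the reported numbers, not figures published by DeepSeek.

```python
# Reported DeepSeek-V2 figures from the list above.
total_params = 236e9       # total parameters
active_params = 21e9       # parameters activated per token
training_tokens = 8.1e12   # training corpus size in tokens

# Share of the model that is active for any single token.
active_fraction = active_params / total_params

# Training tokens seen per total parameter (a derived ratio, for scale only).
tokens_per_total_param = training_tokens / total_params

print(f"Active share per token: {active_fraction:.1%}")                        # 8.9%
print(f"Training tokens per total parameter: {tokens_per_total_param:.1f}")   # 34.3
```

Put differently, fewer than one in ten of the model's parameters take part in processing any given token.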