LLM Pricing Landscape Shows Fierce Competition
A February 2026 pricing snapshot reveals significant cost disparities and downward pressure across 18 large language models. OpenAI's open-source GPT-OSS-20B is priced at $0.05 per million input tokens, while Grok-4 costs $30 per million tokens, a 600x difference. Anthropic's Claude Opus 4.6 has seen its price fall to a range of $5 to $25 per million tokens, down from a previous high of $75.
- A key driver of cost is the price gap between input and output tokens. Across the market, output tokens cost a median of 4 times more than input tokens, and premium models like OpenAI's GPT-5.2 Pro reach an 8x multiplier ($21 input vs. $168 output per million tokens).
- For production applications, inference accounts for the vast majority of a model's lifetime cost, typically 80-90%, dwarfing the one-time training expense. For a popular model like DeepSeek V3, the GPU hours spent on its final training run are recouped after approximately 70 days of inference traffic.
- The price drops are heavily influenced by inference optimization frameworks. vLLM's PagedAttention and NVIDIA's TensorRT-LLM, which relies on deep hardware integration, are key technologies for increasing throughput and GPU utilization. Techniques such as quantization can further reduce operational costs by 60-70%.
- While open-source models avoid licensing fees, they require substantial upfront investment in infrastructure and in-house MLOps talent to manage hosting and scaling. In late 2025, the average cost for self-hosted open-source models was estimated at $0.83 per million tokens, 86% cheaper than the $6.03 average for proprietary models.
- Enterprise search vendors such as Glean and Cohere typically use hybrid pricing that combines a predictable base subscription fee with usage-based components for API calls and advanced AI features.
- A common architectural pattern for cost savings is capability-based routing, which directs simple queries to inexpensive models and complex reasoning tasks to premium models. This approach can reduce blended costs by 60-70% without a significant impact on quality for most workloads.
- The price of achieving GPT-4 level performance has collapsed by 98% since its introduction in 2023, dropping from around $60 per million tokens to under $1 per million tokens by early 2026. Some analyses show the cost of equivalent performance falling by a median rate of 50x per year.
- Chinese AI labs have become a major factor in price competition, with companies like DeepSeek and Qwen offering high-performance models in the $0.25 to $0.53 per million token range. However, this trend may be shifting, as Zhipu AI increased the price of its GLM-5 model by over 30% in February 2026.
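To make the input/output price gap concrete, the following Python sketch computes the cost of a single request from separate input and output rates. The prices are the GPT-5.2 Pro figures quoted above; the `TokenPrice` class, its method names, and the example token counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Per-million-token prices in USD (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float

    @property
    def output_multiplier(self) -> float:
        # How many times more an output token costs than an input token.
        return self.output_per_million / self.input_per_million

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Cost of one request: each side is billed at its own rate.
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Prices quoted in the article for GPT-5.2 Pro.
gpt_5_2_pro = TokenPrice(input_per_million=21.0, output_per_million=168.0)

# Assumed workload: a 2,000-token prompt producing a 500-token answer.
example_cost = gpt_5_2_pro.request_cost(input_tokens=2_000, output_tokens=500)
```

Under these assumed token counts, the 500 output tokens cost twice as much as the 2,000 input tokens, which is why response length tends to dominate the bill on models with a high output multiplier.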
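Capability-based routing, mentioned in the list above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production router: the model names, the prices, the keyword heuristic in `route`, and the equal-token-volume assumption in `blended_price` are all hypothetical. Real routers often use a trained classifier rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_million: float  # blended USD per 1M tokens (assumed figure)


# Hypothetical models and prices, not taken from any vendor's price list.
CHEAP = Model("cheap-model", 0.50)
PREMIUM = Model("premium-model", 10.00)

# Naive signal that a query needs multi-step reasoning (assumption).
REASONING_HINTS = ("prove", "derive", "step by step", "debug")


def route(query: str, max_simple_words: int = 40) -> Model:
    """Send long or reasoning-heavy queries to the premium model."""
    text = query.lower()
    is_complex = (len(text.split()) > max_simple_words
                  or any(hint in text for hint in REASONING_HINTS))
    return PREMIUM if is_complex else CHEAP


def blended_price(queries: list[str]) -> float:
    """Average per-million price, assuming equal token volume per query."""
    prices = [route(q).price_per_million for q in queries]
    return sum(prices) / len(prices)


sample_queries = [
    "What is the capital of France?",
    "Translate 'hello' to Spanish.",
    "Summarize this sentence: the cat sat on the mat.",
    "Derive the closed form and prove it step by step.",
]
blended = blended_price(sample_queries)           # 2.875 with these inputs
savings = 1 - blended / PREMIUM.price_per_million  # vs. all-premium routing
```

With this made-up mix of three simple queries and one complex one, the blended price comes out roughly 71% below sending everything to the premium model. The actual saving depends entirely on the traffic mix and the price gap between tiers.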