Context windows surge

Conversation among creators and developers has shifted from bragging about model size to bragging about very large context windows. A 2 million-token window, for example, is now being discussed publicly, and a window that size changes how much of a long document or codebase can be handled in one pass (youtube.com). That reframing pushes product questions toward memory reliability, execution over long inputs, and tool orchestration instead of pure reasoning benchmarks (youtube.com).

A context window is the amount of text, code, images, or audio a model can keep “in view” at once, and the ceiling has jumped from hundreds of thousands of tokens to as much as 2 million in public products. (openai.com) (developers.googleblog.com) Google said in June 2024 that Gemini 1.5 Pro was opening a 2 million-token context window to all developers, after first previewing 1 million tokens in February 2024 and expanding that access in April 2024. (developers.googleblog.com) (blog.google) (developers.googleblog.com) OpenAI said on April 14, 2025 that GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano each support a 1 million-token context window, and Anthropic has marketed 200,000-token windows in Claude products while saying some Claude 3 models can take inputs above 1 million tokens for select customers. (openai.com) (developers.openai.com) (anthropic.com 1) (anthropic.com 2)

In plain terms, a larger window lets one prompt include a long legal brief, a multi-file software repository, a large spreadsheet dump, or hours of transcript without chopping everything into small pieces first. OpenAI’s own guide says long context changes tasks like structured document parsing, reranking, and multi-hop reasoning over messy source material. (developers.openai.com) (openai.com)

The engineering problem then shifts from “Can the model fit this?” to “Can it still find the right detail 700 pages later?” OpenAI’s prompting guide recommends re-grounding and explicit structure for long inputs, and Anthropic now frames the job as “context engineering,” meaning developers have to decide what information stays in the model’s working memory and in what order. (developers.openai.com) (anthropic.com)

That has also turned cost and speed into product features. Google’s Gemini API now offers implicit and explicit context caching so developers can reuse large repeated inputs instead of paying to resend the same tokens every time, and Google says cache hits can return token discounts. (ai.google.dev) (developers.googleblog.com)

The same shift shows up in coding tools. OpenAI says GPT-4.1 improved on coding and long-context understanding at the same time, while Anthropic’s recent engineering posts focus on “harnesses” for long-running agents that work across many context windows instead of assuming one giant prompt solves everything. (openai.com) (anthropic.com 1) (anthropic.com 2) Anthropic’s engineers wrote in April 2026 that long-horizon tasks often exceed a single context window and require resets, summaries, and external scaffolding. In a separate April 2026 post, the company said the “session is not Claude’s context window,” drawing a line between what the model sees right now and what the surrounding software keeps track of over time. (anthropic.com 1) (anthropic.com 2)

So the bragging rights are no longer just about parameter counts or benchmark charts. The live questions now are how much source material a model can hold, how reliably it recalls the right fragment, and how well the surrounding tools keep that memory usable over long jobs. (developers.openai.com) (anthropic.com) (developers.googleblog.com)
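For readers who want to see the session-versus-window distinction in concrete terms, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's harness: the `Session` class, `estimate_tokens`, and `summarize` are hypothetical names invented for this example, the token count uses a rough four-characters-per-token heuristic instead of a real tokenizer, and the summarizer is a simple truncating placeholder where a real system would likely ask a model to write the summary.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough size estimate. ASSUMPTION: about 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(turns: list[str], max_tokens: int) -> str:
    """Hypothetical placeholder: keep the start of each turn, then truncate.

    The result never exceeds max_tokens under estimate_tokens().
    """
    text = "Summary of earlier work: " + " | ".join(t[:40] for t in turns)
    return text[: max_tokens * 4]


@dataclass
class Session:
    """Everything the surrounding software remembers over a long job."""
    budget_tokens: int    # size of the model's context window
    summary_tokens: int   # share of the window reserved for a summary
    log: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.log.append(turn)

    def build_context(self) -> list[str]:
        """Return what the model would actually see on the next call."""
        # If the whole session fits, send it unchanged.
        if sum(estimate_tokens(t) for t in self.log) <= self.budget_tokens:
            return list(self.log)

        # Otherwise keep the newest turns verbatim and summarize the rest.
        room = self.budget_tokens - self.summary_tokens
        recent: list[str] = []
        used = 0
        for turn in reversed(self.log):
            cost = estimate_tokens(turn)
            if used + cost > room:
                break
            recent.append(turn)
            used += cost
        recent.reverse()
        older = self.log[: len(self.log) - len(recent)]
        return [summarize(older, self.summary_tokens)] + recent


# Ten turns of about 20 tokens each, against a 50-token window.
session = Session(budget_tokens=50, summary_tokens=20)
for i in range(10):
    session.add(f"step {i}: " + "x" * 72)

context = session.build_context()
print(len(session.log), "turns in the session;", len(context), "items in the window")
```

The session log keeps all ten turns, while the window receives only a compressed summary plus the newest turn. The design choice illustrated here is the one the Anthropic posts describe in general terms: the software around the model, not the model itself, decides what stays in view.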
