SubQ LLM hits 12M token context
- Subquadratic came out of stealth on May 5 with SubQ, a new long-context model, plus a $29 million seed round and private-beta API access. (subq.ai)
- The headline claim is a 12 million-token research context, 1 million tokens in the preview API, and 52× faster attention than FlashAttention at 1M. (subq.ai)
- If the numbers hold up, whole-repo coding and long-lived agents get simpler, but the big claims still need broader independent validation. (thenewstack.io)

Long-context language models are supposed to let AI read more before it answers. But the dirty secret is that bigger windows often turn into bigger b(subq.ai) is the gap Subquadratic is trying to hit. On May 5, the Miami startup launched SubQ, said it has a 12 million-token research context window, opened a private beta, and announced a $29 million seed round. (subq.ai)

### What actually launched?

Subquadratic did not just post a benc(thenewstack.io) a coding agent in a CLI, and SubQ Search as a long-context research tool. The company says all three are in private beta now, with the preview API exposing 1 million tokens while the research system reaches 12 million. (subq.ai)

### Why is 12 million tokens a big deal?

Because that is far beyond the context windows most frontier models expose in practice. A million tokens is already the roug(subq.ai)uadratic is claiming 12 times that: enough, in its framing, to keep an entire codebase, long histories, or months of documents in one prompt instead of chopping everything into chunks. (thenewstack.io)

### What is the bottleneck it says it fixed?

Standard transformer attention gets expensive fast because every token is compared against every other token. (subq.ai) Double the input and the attention work roughly quadruples. Subquadratic says its SSA architecture changes that by selecting the positions worth attending to and computing exact attention only there, which it describes as linear in compute and memory with respect to context length. Basically, the model is trying to stop paying attention to everything just because it can. (subq.ai)

The headline numbers are aggressive. Subquadratic says SSA is 52× faster than FlashAttention at 1 million tokens, uses 63% less compute in that architecture-level comparison, and cuts attention compute by nearly 1,000× at 12 million tokens. It also says SubQ hit 92.1% on needle-in-a-haystack retrieval at 12 million tokens and 82.4% on SWE-Bench, slightly above recent Anthropic and Google scores cited in coverage.
(subq.ai)

### Why does this matter for coding?

Because code agents keep trippin(subq.ai)ository is too large, tools fall back to retrieval, chunking, summaries, or multi-agent handoffs. Those work, but they lose detail and add coordination overhead. SubQ Code’s pitch is that a model can load the whole repository once, then plan, edit, and review against the real full context. That is a much cleaner workflow if it works at production quality. (subq.ai)

### So is this proven?

Not really, not in the way people wi(subq.ai)rials mention third-party verification for at least one benchmark, but the broader performance story is still mostly coming from the company and early coverage. VentureBeat’s write-up captured the mood well: researchers are intrigued, but they want independent proof before treating this as a settled architectural breakthrough. (subq.ai)

### What should builders watch next?

Two things. First, whether outside testers can reproduce the (subq.ai), whether the advantage survives messy real workloads, not just benchmarks: giant repos, stale docs, long agent sessions, and mixed retrieval tasks. Subquadratic is already talking about a 50 million-token model, but the nearer question is simpler: does 1 million to 12 million tokens stay useful, not just possible? (thenewstack.io)

### Bottom line?

SubQ matters beca(subq.ai)ontext window” launch. It is a direct attack on the transformer tax that has shaped how AI tools are built for years. But it turns out this story has two parts: a very real product launch today, and a much bigger scientific claim that still needs the rest of the field to kick the tires. (subq.ai)
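
### How fast does the attention bill grow?

For readers who want to see the scaling argument from the bottleneck section in numbers, here is a rough back-of-the-envelope sketch. It only counts query-key pairs: dense attention scores every token against every token, while a selective scheme scores each token against a fixed number of chosen positions. The `BUDGET` value and both function names are hypothetical, picked for illustration. Subquadratic has not published its selection budget in the coverage cited here, the sketch ignores the cost of choosing which positions to keep, and it is not meant to reproduce the company's 52× or 1,000× figures.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, budget: int) -> int:
    """Pairs scored when each query attends to at most `budget` selected positions."""
    return n_tokens * min(budget, n_tokens)


# Hypothetical per-query budget, chosen only for illustration (not a SubQ number).
BUDGET = 4096

for n in (1_000_000, 2_000_000, 12_000_000):
    dense = dense_pairs(n)
    sparse = sparse_pairs(n, BUDGET)
    print(
        f"{n:>12,} tokens: dense={dense:.2e} pairs, "
        f"sparse={sparse:.2e} pairs, ratio={dense / sparse:,.0f}x"
    )
```

Under these assumptions, going from 1 million to 2 million tokens quadruples the dense pair count but only doubles the sparse one, and the gap between the two keeps widening as the context grows. That widening gap is the shape of the claim that outside testers would need to confirm on real hardware.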