LLM rankings and use cases

Recent social posts listed top large language models — GPT‑5.2, Claude 4.5, Gemini 3 Pro, Perplexity — and suggested specific models for writing, research and social content. ( ) The same commentary recommended specialised models (e.g., Grok 4 or STEM‑focused variants) for technical tasks like code or scientific reasoning. (x.com)

A wave of social posts is turning model shopping into a simple chart: one model for writing, another for research, another for code. (openai.com) Large language models are prediction engines trained on vast text, image, and code datasets; they guess the next useful token, then stack those guesses into essays, answers, or software. Companies now ship families of models tuned for different jobs, with tradeoffs in speed, cost, reasoning depth, and access to web tools. (openai.com) OpenAI says GPT‑5.2 is its “most advanced frontier model for everyday professional work,” and says paid ChatGPT users began getting GPT‑5.2 Instant, Thinking, and Pro on launch day while API access opened to all developers. In the same release, OpenAI highlighted gains on knowledge-work, coding, science, math, and abstract-reasoning benchmarks over GPT‑5.1. (openai.com) Anthropic’s Claude line has also split into task-specific tiers. Anthropic introduced Claude Opus 4.5 in late 2025 as a top-end model for coding and “heavy-duty agentic workflows,” and later said Claude Haiku 4.5 delivered roughly Sonnet 4–level coding performance at one-third the cost and more than twice the speed. (anthropic.com, anthropic.com) Google has pushed the same menu logic with Gemini 3. Google said Gemini 3 is its “most intelligent model,” while a separate Gemini 3 Pro release focused on visual reasoning across documents, screens, spatial tasks, and video. (blog.google, blog.google) Perplexity’s pitch is narrower: pair a model with live search and citations, then optimize for answer freshness and readability instead of only raw benchmark scores. Perplexity said its in-house Sonar model was trained to improve factuality and user experience in search, and later said Sonar-Reasoning-Pro tied for first in LM Arena’s Search Arena evaluation. (perplexity.ai, perplexity.ai) That is why recommendation threads now sort models by use case instead of asking which one is “best.” A user drafting marketing copy may care more about tone and latency, while a user checking a legal filing, a spreadsheet, or a research claim may care more about citations, tool use, and lower hallucination rates. (openai.com, perplexity.ai) The same split shows up in technical work. OpenAI released GPT‑5.2‑Codex as a coding variant for large refactors and migrations, xAI launched Grok 4 with native tool use and real-time search, and OpenAI described o3‑mini as a smaller reasoning model with particular strength in science, math, and coding. (openai.com, x.ai, openai.com) Those rankings are still unstable because companies measure different things. OpenAI cites GDPval, SWE‑Bench, GPQA, AIME, and ARC‑AGI; Google cites MMMU Pro and Video MMMU for Gemini 3 Pro; Perplexity points to human-preference rankings for search systems; and Anthropic’s releases lean heavily on coding and agent workflow results. (openai.com, blog.google, perplexity.ai, anthropic.com) Even the names in those threads can lag the market by weeks or months. Anthropic’s newsroom now lists Claude Opus 4.7 as of April 16, 2026, and Google’s Gemini news page currently highlights Gemini 3.1 variants, a reminder that social rankings often freeze a moving target. (anthropic.com, blog.google) The practical takeaway is less about a permanent pecking order than a buying guide for tasks. As model makers keep shipping writing, reasoning, coding, and search-tuned variants, the useful question is no longer which model wins the internet this week, but which one fits the job in front of you. (openai.com, x.ai, perplexity.ai)

LLM rankings and use cases

Get your own daily briefing