Musk: top human smart, but 'not by AI standards'
Elon Musk tweeted that a top human is 'relatively smart for a human, but not by AI standards,' a comment that spurred discussion comparing modern large models to human IQ metrics. The post has been linked in analyses that reference model IQ estimates such as claims about GPT‑5.4‑level performance. (x.com)
Elon Musk’s latest AI provocation was a short line on X: a top human is “relatively smart for a human, but not by AI standards,” pushing a familiar argument into a fresh round of benchmark wars. (x.com)

The comparison turns on a simple idea: large language models are software systems trained on huge piles of text and images, then tested on tasks like coding, math, search, and computer use. OpenAI says its GPT-5.4 model supports web search, code tools, computer use, and a 1,050,000-token context window, which is the amount of text it can keep in view at once. (developers.openai.com)

That is why online arguments keep drifting to human-style scores such as Intelligence Quotient, or IQ. A 2024 paper comparing generative artificial intelligence systems with human benchmarks on the Wechsler Adult Intelligence Scale reported very high results on verbal comprehension and working memory, but weak results on perceptual reasoning, which covers visual pattern solving. (arxiv.org)

Other researchers have warned that the fit is awkward. A 2023 Nature Reviews Psychology paper said large language models can generate human-like language “without the ability to think or feel like a human,” a distinction that cuts against treating one test score as a full measure of machine intelligence. (nature.com)

The newer benchmark fight is moving beyond static question sets toward interactive tasks that look more like learning on the fly. ARC Prize says its ARC-AGI-3 benchmark measures whether agents can explore new environments, infer goals, adapt over time, and match human efficiency rather than just produce a final answer. (arcprize.org)

That shift has given both sides material. Model boosters can point to systems that now outperform people on some narrow tests, while skeptics can point to benchmarks like ARC-AGI-3 that are explicitly built around the remaining gap between machine performance and human learning. (arcprize.org)

Musk has been making the broader claim for months. In a 2025 interview cited by Yahoo Tech, he said Grok 4 was “better than almost all graduate students in all disciplines simultaneously,” and argued that the idea of humans running the economy could soon look antiquated. (tech.yahoo.com)

The missing piece in many viral posts is standardization. IQ tests were designed to compare humans with other humans using population norms, while model evaluations are usually lab-run benchmark suites, product demos, or one-off experiments that change as systems and prompting methods change. (sciencedirect.com)

So Musk’s line landed less as a settled measurement than as a framing device. It compresses a messy reality into one sentence: modern models are posting stronger scores on more tasks, but the argument over whether that maps cleanly onto “smarter than humans” is still being fought test by test. (x.com)