ChatGPT goblin glitch surfaces

- OpenAI said on April 29 that ChatGPT’s stray “goblin” references came from GPT-5.1-era personality training, then spread into later GPT-5.5 testing.
- The tell was lopsided: “goblin” usage rose 175% after GPT-5.1, and the “Nerdy” mode drove 66.7% of mentions on just 2.5% of traffic.
- It matters because a tiny tone reward warped behavior across models, showing how style tuning can leak into core product behavior.
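To make the lopsidedness concrete, here is a minimal sketch of the arithmetic behind those figures. The percentages are the ones reported above; the function names (`growth_factor`, `lift`, `relative_rate`) are hypothetical helpers for illustration, not anything OpenAI has published.

```python
# Back-of-the-envelope arithmetic on the reported figures.
# All function names are illustrative assumptions, not an OpenAI API.

def growth_factor(percent_rise: float) -> float:
    """Convert a percentage rise into a multiplier (a 175% rise is 2.75x)."""
    return 1 + percent_rise / 100


def lift(share_of_mentions: float, share_of_traffic: float) -> float:
    """How over-represented a segment is among mentions versus its traffic share."""
    if not 0 < share_of_traffic < 1:
        raise ValueError("share_of_traffic must be strictly between 0 and 1")
    return share_of_mentions / share_of_traffic


def relative_rate(share_of_mentions: float, share_of_traffic: float) -> float:
    """Per-response mention rate inside the segment divided by the rate outside it."""
    inside = lift(share_of_mentions, share_of_traffic)
    outside = (1 - share_of_mentions) / (1 - share_of_traffic)
    return inside / outside


NERDY_TRAFFIC = 0.025   # "Nerdy" share of all responses, as reported
NERDY_GOBLIN = 0.667    # "Nerdy" share of all "goblin" mentions, as reported

print(growth_factor(175))                               # 2.75
print(round(lift(NERDY_GOBLIN, NERDY_TRAFFIC), 1))      # 26.7
print(round(relative_rate(NERDY_GOBLIN, NERDY_TRAFFIC)))  # 78
```

Read that way, a “Nerdy” response was roughly 78 times as likely to contain “goblin” as a response in any other style, assuming the two reported shares are measured over the same set of responses.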

ChatGPT’s goblin problem sounds like a joke. It kind of was at first. But it also turned into a real example of how weird model behavior can spread without looking like a classic failure. OpenAI used a blog post on April 29 to explain why ChatGPT started dropping random goblin, gremlin, and troll metaphors into normal conversations, and why the company ended up shipping what people are now calling an anti-goblin fix.

### What actually went wrong?

The short version is personality tuning. OpenAI says the behavior started showing up around the GPT-5.1 launch on November 12, 2025, when it pushed harder on making ChatGPT warmer, more conversational, and easier to customize for tone. One of those personalities, “Nerdy,” got rewarded for playful metaphor-heavy language, and creature talk got over-rewarded enough that it started sticking.

### Why did anyone notice?

Because this was not a one-off hallucination. OpenAI says use of the word “goblin” in ChatGPT responses rose 175% after GPT-5.1, while “gremlin” rose 52%. That is the kind of tiny lexical quirk that can feel harmless in one chat but impossible to ignore when thousands of users start posting screenshots of the same odd tic. PRIMETIMER says the pattern became meme fuel across X and other platforms.

### Why was “Nerdy” the culprit?

The strongest clue was concentration. OpenAI says “Nerdy” accounted for only 2.5% of all ChatGPT responses but 66.7% of all “goblin” mentions. Basically, if this had just been a broad internet fad leaking into the model, you would expect the language to show up more evenly. Instead, it clustered around one explicitly playful style. That made the root cause easier to isolate.

### How did it spread beyond one mode?

This is the interesting part. OpenAI says the goblin habit did not stay boxed inside a single personality preset. The reward signals used for customization fed back into later training, so the metaphor style started surfacing more broadly across model generations. In OpenAI’s telling, that is why the issue first became visible. A style quirk basically escaped its lane.

### Was this a safety failure?

Not in the usual sense. OpenAI frames it as a subtle behavior bug, not a collapse in benchmark scores or a major safety incident. That matters. The company says this one did not announce itself through tanking evals or some obvious red-flag metric. It crept in through lots of small incentives that looked fine locally but added up to something globally weird.

### What did OpenAI change?

OpenAI says it retired the “Nerdy” personality, filtered training data, and added direct instructions to suppress irrelevant creature references. PRIMETIMER describes that package as an anti-goblin update. Separately, OpenAI’s model release notes show the company has been making tone-level cleanup changes elsewhere too, including a Marcng in responses.

### Why does this matter beyond the joke?

Because it shows the hard part of modern AI tuning is not just truthfulness or refusal behavior. It is also vibe drift. A tiny reward on the wrong kind of playful language can echo through a system and become visible months later in places nobody intended. The goblins are funny. The lesson is not.

### Bottom line

This was not ChatGPT going feral. It was ChatGPT getting nudged, repeatedly, toward one goofy metaphor family until the pattern became product
