OpenAI traces ChatGPT 'goblin' glitch
- OpenAI said on April 29 that ChatGPT’s weird goblin habit came from training rewards tied to its now-retired “Nerdy” personality.
- The company said “Nerdy” produced just 2.5% of ChatGPT replies but drove 66.7% of all “goblin” mentions after GPT‑5.4.
- It matters because GPT‑5.5 is already rolling out widely, so tiny style-training bugs can leak across major model releases.
ChatGPT did not become obsessed with goblins because someone at OpenAI hard-coded a fantasy joke into the model. The stranger answer is more interesting, and more useful. OpenAI says the habit came from a small reward signal used while training a customizable “Nerdy” personality, and that signal ended up nudging the model toward creature metaphors often enough that users started noticing. The company published the explanation on April 29, after the quirk had spread across model generations and showed up strongly enough in early GPT‑5.5 testing to force a cleanup. (openai.com)

### What was the glitch, exactly?

The model kept reaching for words like “goblin” and “gremlin” in places where a normal assistant probably wouldn’t. One stray metaphor is nothing. But OpenAI says the pattern became visible after GPT‑5.1, then got stronger with GPT‑5.4, until employees and users were flagging it regularly. (openai.com)

### Why would a model start doing that?

Because language models learn from incentives that are much smaller and weirder than most people expect. OpenAI says one of those incentives came from personality customization, specifically the “Nerdy” mode, which was supposed to sound playful, enthusiastic, and a little irreverent. During training for that mode, the reward signal gave rewards to metaphors involving creatures. Basically, the model learned that this kind of phrasing scored well, then kept reusing it. (openai.com)

### Why did “Nerdy” matter so much?

This is the part that makes the story land. OpenAI says “Nerdy” accounted for only 2.5% of all ChatGPT responses, but 66.7% of all “goblin” mentions. That is a wildly lopsided split. It told engineers the behavior was not just some broad internet-language trend washing over the model. It was concentrated in one personality, which made it much easier to isolate. (openai.com)

### How did it spread beyond that one mode?

Turns out these systems do not keep every behavioral tweak neatly boxed off.
OpenAI says the goblin habit started showing up across model generations, and by early GPT‑5.5 Codex testing the affinity for goblin metaphors was obvious enough to trigger a deeper investigation. That kind of carryover can happen when the same base model keeps getting adapted, tuned, and reused. (openai.com)

### What did OpenAI do to fix it?

First, the company retired the “Nerdy” personality. But that alone did not fully kill the behavior, which tells you the preference had already been learned more deeply than a simple front-end toggle. OpenAI then added a more direct override to suppress the goblin habit in affected outputs. That override took the form of an anti-goblin instruction, applied after the original personality was removed. (nbcnews.com)

### Why does this matter beyond one goofy bug?

Because GPT‑5.5 is not a lab curiosity anymore. OpenAI announced it on April 23, made GPT‑5.5 and GPT‑5.5 Pro available in the API on April 24, and said the model is rolling out across ChatGPT and Codex, with feedback from nearly 200 early-access [...] (nbcnews.com). For a model deployed that widely, even a silly lexical quirk is evidence of a deeper control problem: how do you shape style without accidentally shaping reasoning habits or default metaphors? (openai.com)

### Is this a safety issue or just an embarrassment?

Mostly an embarrassment, but also a useful warning. The goblin bug did not break benchmarks or trip a classic safety alarm. It crept in sideways, through personality tuning, then became visible only after enough people noticed the same odd tic. That makes it a good example of why teams need to watch for not only harmful outputs, but also weird persistent behaviors that emerge from lots of small training choices. (openai.com)

### Bottom line?

The funny part is the goblins. The important part is the mechanism.
OpenAI just showed, in public, how a tiny reward mistake in a niche personality setting can echo into a flagship model release — and why debugging modern AI often looks less like fixing a crash and more like tracing a superstition through layers of training.
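
For readers who want to see how lopsided the “Nerdy” split is, here is a rough back-of-the-envelope check in Python. It uses only the two figures OpenAI reported (2.5% of responses, 66.7% of “goblin” mentions). The function names are made up for illustration, and the second calculation assumes mentions scale with the number of responses, which the source does not state.

```python
def overrepresentation(share_of_responses: float, share_of_mentions: float) -> float:
    """How many times more mentions a slice produces than its traffic share predicts."""
    return share_of_mentions / share_of_responses


def rate_ratio(share_of_responses: float, share_of_mentions: float) -> float:
    """Per-response mention rate inside the slice divided by the rate outside it.

    Assumption (not from the source): mentions are proportional to responses,
    so shares can stand in for raw counts.
    """
    inside = share_of_mentions / share_of_responses
    outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return inside / outside


# Figures reported by OpenAI for the "Nerdy" personality.
NERDY_RESPONSE_SHARE = 0.025  # 2.5% of all ChatGPT responses
NERDY_GOBLIN_SHARE = 0.667    # 66.7% of all "goblin" mentions

lift = overrepresentation(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)
ratio = rate_ratio(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)

print(f"Nerdy produced about {lift:.1f}x its expected share of goblin mentions")
print(f"Per response, Nerdy mentioned goblins about {ratio:.0f}x as often as everything else")
```

Under those assumptions, “Nerdy” punched roughly 27 times above its weight, and a single “Nerdy” reply was on the order of 78 times more likely to mention a goblin than a reply from any other mode. That is the kind of gap that points at one source instead of a general drift.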