GPT‑5.5 personality spike spawns goblins
- OpenAI published a post on April 29 explaining why ChatGPT and Codex started talking about goblins and gremlins after recent GPT‑5 releases.
- The clearest clue was a lexical spike: “goblin” mentions rose 175% after GPT‑5.1, while the Nerdy persona drove 66.7% of them.
- It matters because tiny style rewards can leak into broad model behavior, even in systems shipped with stronger safety controls.

OpenAI just did something unusually useful: it wrote up a weird failure mode in plain English. The failure was not a jailbreak or a benchmark collapse. It was tone. ChatGPT and Codex started reaching for goblins, gremlins, trolls, and similar fantasy-creature metaphors often enough that employees and users noticed. On April 29, OpenAI explained that the habit came from reward signals tied to a retired “Nerdy” personality, then spread more broadly through training and model updates. (openai.com)

### Why were people seeing goblins at all?

Because model behavior is not just “the base model plus safety.” It is also a pile of small incentives layered on top. OpenAI says one of those incentives came from training for personality customization, especially the Nerdy option, which pushed the assistant toward playful, creature-heavy metaphors. What looked like a harmless quirk turned into a repeatable verbal tic. (openai.com)

### When did the spike show up?

OpenAI says the first clear pattern showed up in November after GPT‑5.1 launched, though it may have started earlier. Users were already complaining that the model felt oddly overfamiliar, and a safety researcher specifically asked for “goblin” and “gremlin” to be tracked during review. That is the moment the company […] (openai.com)

### How big was the change?

Big enough to be unmistakable once someone looked. OpenAI says “goblin” usage in ChatGPT rose 175% after GPT‑5.1, while “gremlin” rose 52%. Then GPT‑5.4 brought an even bigger uptick, which helped the company connect the pattern to the Nerdy personality in production traffic. Nerdy accounted for only 2.5% of all ChatGPT responses but 66.7% of “goblin” mentions: a tiny slice of traffic carrying a wildly outsized share of one stylistic habit. (openai.com)

### Why would a personality setting leak outward?

Basically, reward signals are sticky. If you repeatedly score one style a bit higher, the model starts treating that style as broadly useful, not narrowly cosmetic.
OpenAI says it had “unknowingly” given especially high rewards to creature metaphors. Think of it like nudging a spellchecker until it st[…] template lives inside a giant generative system that reuses patterns across tasks. (openai.com)

### Why is this more than a funny bug?

Because the problem is not really goblins. The problem is hidden preference drift. If a company can accidentally teach its flagship assistant to sound more whimsical than intended, it can also accidentally teach subtler habits (overfamiliarity, flattery, hedging, weird confidence, canned metaphors) that are h[…] product behavior, not decoration. (openai.com)

### Didn’t GPT‑5.5 ship with stronger safeguards?

Yes, and that is part of why this story matters. OpenAI’s GPT‑5.5 launch post stressed its “strongest set of safeguards to date,” plus internal and external red-teaming and feedback from nearly 200 early-access partners. But safeguards aimed at misuse and dangerous capability do not automatically ca[…] still get weirder in day-to-day conversation. (openai.com)

### So what did OpenAI actually learn?

That style control needs the same audit discipline as safety control. You need targeted checks for verbal tics, persona spillover, and reward-model side effects, not just capability evals and abuse testing. OpenAI’s write-up is basically a reminder that “small” tuning choices can scale into product-wide behavior once they touch a model used across ChatGPT and Codex. (openai.com)

### Bottom line?

The goblins are the punchline, but the real story is governance. Modern assistants are shaped by lots of tiny pushes, and one playful push can travel farther than expected. If you want AI that feels reliable in public, you do not just audit what it knows or what it can do; you also audit the personality it picks up along the way. (openai.com)
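
For readers who want to see what a "targeted check for verbal tics" might look like in practice, here is a minimal Python sketch. It is not OpenAI's tooling, which the post does not describe at this level. The watchlist constant, the function names, and the idea of feeding in plain lists of response strings are all assumptions made for illustration. The sketch counts whole-word mentions of tracked words, converts them to a rate per 1,000 responses, and computes the two kinds of figures the article quotes: a percent change between releases and how over-represented one slice of traffic is.

```python
import re
from collections import Counter

# Hypothetical watchlist. The article only says "goblin" and "gremlin" were tracked.
WATCHLIST = ("goblin", "gremlin")


def count_mentions(responses, words=WATCHLIST):
    """Count case-insensitive whole-word mentions (singular or plural) of each word."""
    counts = Counter({w: 0 for w in words})
    for text in responses:
        for w in words:
            pattern = rf"\b{re.escape(w)}s?\b"
            counts[w] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def rate_per_1k(responses, words=WATCHLIST):
    """Mentions per 1,000 responses, so samples of different sizes are comparable."""
    n = len(responses)
    if n == 0:
        return {w: 0.0 for w in words}
    counts = count_mentions(responses, words)
    return {w: 1000 * counts[w] / n for w in words}


def percent_change(before, after):
    """Relative change in percent; None when the baseline rate is zero."""
    if before == 0:
        return None
    return 100 * (after - before) / before


def overrepresentation(share_of_mentions, share_of_traffic):
    """How many times larger a slice's share of mentions is than its share of traffic."""
    return share_of_mentions / share_of_traffic
```

Plugging in the article's own figures, `overrepresentation(66.7, 2.5)` comes out to roughly 26.7, meaning the Nerdy slice produced goblin mentions at about 27 times the rate its traffic share would predict. A real audit would also need sampling controls and a much broader watchlist, since the next tic will not be a word anyone thought to track in advance.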