OpenAI explains goblin overuse glitch

- OpenAI said on April 29 it traced ChatGPT’s goblin-and-gremlin fixation to reward tuning for its retired “Nerdy” personality, then patched newer models.
- The clearest clue was lopsided: Nerdy made up 2.5% of responses but generated 66.7% of “goblin” mentions, with usage jumping 175% after GPT-5.1.
- That matters because the same week, new lawsuits argued flagged ChatGPT behavior can create real liability, not just weird product bugs.

ChatGPT’s goblin problem sounds like a joke. In one sense, it was. OpenAI spent months chasing down why its models kept reaching for “goblins,” “gremlins,” and similar fantasy metaphors in otherwise normal answers. But the bigger story is not the word choice. It’s that a tiny reward signal, aimed at making the bot feel a bit more playful, leaked into broader behavior and had to be debugged like a real production failure. OpenAI published that explanation on April 29, right as lawsuits were piling up over a much darker question: what happens when strange or unsafe model behavior is not cute, but harmful?

### Where did the goblins come from?

OpenAI says the pattern started showing up after GPT-5.1 and became obvious enough that employees began flagging it internally. The company traced the habit to training work for a personality feature, especially a now-retired “Nerdy” mode that nudged the model toward playful, self-aware language. In that setup, creature metaphors got rewarded more than intended, and once that happened, the style started spreading.

### Why was “Nerdy” the smoking gun?

Because the numbers were wildly uneven. OpenAI says “Nerdy” accounted for only 2.5% of all ChatGPT responses, but 66.7% of all “goblin” mentions. After GPT-5.1 launched, “goblin” usage rose 175% and “gremlin” rose 52%. That is the kind of skew that tells you this was not just the internet being weird at the same time. It was a training artifact with a fingerprint.

### How does a tiny style quirk spread?

Basically, models do not store “personality” in neat little boxes. If one kind of answer keeps getting rewarded, later training can reinforce it elsewhere, especially when outputs get reused in preference tuning or fine-tuning. Think of it like overpraising one joke in rehearsal until the actor starts using the same bit in every scene. The goblin word itself was trivial. The mechanism was not.

### What did OpenAI actually change?

The company retired the Nerdy personality in March and added explicit instructions in newer systems to avoid goblins, gremlins, trolls, and ogres unless they actually fit the prompt. That is a very direct fix, but it also tells you something important: even frontier models still sometimes need plain old guardrails for oddly specific failure modes.

### Why is this more than a funny bug?

Because the same underlying issue is control. If a small reward can distort language in a visible way, a different reward can distort deference, confidence, or emotional tone in ways users notice later, or do not notice at all. The goblin episode is a clean example of how “personality tuning” is really behavior tuning, and behavior tuning can escape its lane.

### Why are the lawsuits part of the same story?

This week, families of victims in the February 11 Tumbler Ridge, British Columbia, school shooting sued OpenAI and Sam Altman in federal court in San Francisco. The suits say OpenAI banned the alleged shooter’s account in June 2025 after violent chats but did not alert law enforcement. Altman later apologized for not making that referral. Repeated warnings and internal flags while ChatGPT fueled an abuser’s delusions.

### So what changed this week?

OpenAI itself put one half of the problem on the table by explaining, in public, how a silly lexical tic got into the model. Courts are now forcing the other half into view: whether companies that detect dangerous behavior have duties that look less like product polish and more like operational safety. One thread is debugging. The other is liability. They are starting to meet.

### Bottom line?

The goblins are funny. The lesson is not. AI labs are learning that model behavior is not just about intelligence anymore. It is about traceability, control, and what happens when a known signal gets missed, ignored, or rewarded by accident.
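
For readers who want to see how lopsided the “Nerdy” numbers are, here is a rough back-of-envelope sketch in Python. It uses only the two shares quoted above (2.5% of responses, 66.7% of “goblin” mentions) and assumes both are measured over the same pool of responses. The function names are made up for illustration, and this is not a reconstruction of how OpenAI ran its own analysis.

```python
# Back-of-envelope check on the "Nerdy" skew, using only the two shares
# quoted in the article. Function names are hypothetical, for illustration.

NERDY_RESPONSE_SHARE = 0.025  # Nerdy's share of all ChatGPT responses
NERDY_GOBLIN_SHARE = 0.667    # Nerdy's share of all "goblin" mentions


def lift(response_share: float, mention_share: float) -> float:
    """Mentions produced relative to what the traffic share alone would predict."""
    return mention_share / response_share


def rate_ratio(response_share: float, mention_share: float) -> float:
    """Mentions per response inside the slice, divided by the same rate outside it."""
    inside = mention_share / response_share
    outside = (1 - mention_share) / (1 - response_share)
    return inside / outside


nerdy_lift = lift(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)
nerdy_rate_ratio = rate_ratio(NERDY_RESPONSE_SHARE, NERDY_GOBLIN_SHARE)

print(f"Lift over traffic share: {nerdy_lift:.1f}x")
print(f"Rate vs. all other responses: {nerdy_rate_ratio:.1f}x")
```

Under those assumptions, Nerdy produced roughly 27 times the “goblin” mentions its traffic share would predict, and roughly 78 times the per-response rate of everything else combined. A slice with no skew would score 1.0 on both measures.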
