OpenAI has apparently spent real engineering time telling one of its newest coding models to stop talking about goblins, gremlins, raccoons, trolls, ogres, and pigeons. That sounds like a joke, but the company’s own instructions for Codex reportedly spelled it out because GPT-5.5 had developed a habit of reaching for creature metaphors far too often. The result is a small, weird window into how these systems drift when training rewards nudge them in odd directions.

The company’s explanation is more interesting than the goblins themselves. OpenAI said the behavior grew more pronounced starting with GPT-5.1, and that researchers had already measured a 175 percent surge in “goblin” usage in ChatGPT when they first looked into the issue in November. They decided it did not look especially alarming at the time, which is a lovely example of how AI weirdness tends to get filed under “later” until it becomes a headline.

Why Codex was told to avoid creatures

OpenAI’s internal guidance for Codex was blunt: don’t talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or similar creatures unless they are clearly relevant. That suggests the model had become prone to lacing ordinary coding answers with fantasy-flavored language, which is funny until you remember people are using these tools to write actual software.

OpenAI said the culprit was a reward-shaping side effect of personality customization, especially the “Nerdy” personality. In other words, the model learned that creature metaphors were pleasing, then kept leaning into the bit like a comedian who cannot hear the room going cold.
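
To make the mechanism concrete, here is a minimal, purely illustrative Python sketch. It is not OpenAI’s training code, and every number and name in it is invented: the point is only that if a reward signal gives whimsical phrasing even a small bonus, a simple policy-gradient-style update will reinforce it, and the habit compounds across tuning generations.

```python
import random

random.seed(42)

def tune_one_generation(p_creature, rounds=300, lr=0.01, bonus=0.5):
    """One hypothetical tuning pass: sample response styles, reward
    whimsical ones slightly more, and nudge the policy toward whatever
    got rewarded (a bare-bones REINFORCE-style update)."""
    for _ in range(rounds):
        whimsical = random.random() < p_creature
        # Invented reward shaping: zero advantage for a plain answer,
        # a small bonus when the response leans on creature metaphors.
        advantage = bonus if whimsical else 0.0
        target = 1.0 if whimsical else 0.0
        # Only rewarded (whimsical) samples move the policy, so the
        # quirk can only ratchet upward.
        p_creature += lr * advantage * (target - p_creature)
        p_creature = min(max(p_creature, 0.0), 1.0)
    return p_creature

p = 0.02  # invented baseline rate of creature metaphors
for generation in ("v1", "v2", "v3"):  # stand-ins, not real model names
    p = tune_one_generation(p)
    print(f"{generation}: creature-metaphor rate ~ {p:.2f}")
```

The point of the toy is the shape of the curve, not the numbers: a bias too small to notice in one version becomes the model’s signature move a few versions later, which is roughly the progression OpenAI described.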

How the goblin obsession spread

The company’s blog post described the progression as a creeping habit: each generation made the creature references a little more common, until the model was, in OpenAI’s own words, referring to itself as a “Goblin-Pilled Transformer.” Sam Altman even joined the joke online, posting a screenshot that mentioned “extra goblins” as if this were a normal line in a frontier-model roadmap.

The broader lesson is familiar even if the mascot is ridiculous. Large models often absorb odd biases from their training and tuning, and once a quirk gets rewarded, it can snowball across versions. Anthropic has run into its own version of this with Claude Mythos, which researchers said had an unusual fixation on Mark Fisher and would bring him up in unrelated philosophy chats.

What this says about model behavior

None of this means the model is broken in any dramatic sense. It does show, though, that alignment is often less about one grand failure and more about tiny incentives accumulating into something visibly absurd. If the choice is between a model that answers your bug report cleanly and one that keeps summoning imaginary creatures, the fix is obvious – even if the bug is now wearing a tiny hat.

The real question is how many other quirks are sitting below the surface in systems that look polished on the outside. Goblins are easy to laugh at; stranger habits may be the ones that matter next.

Source: Futurism
