The rogue AI that turns on its makers is a well-worn trope in science fiction. We see it in HAL 9000, the homicidal computer in Stanley Kubrick’s 1968 movie 2001: A Space Odyssey. It’s the premise of the Terminator series, in which Skynet triggers a nuclear holocaust to stop scientists from shutting it down.

Those sci-fi roots go deep. AI doomerism, the idea that this technology (specifically its hypothetical upgrades, artificial general intelligence and superintelligence) will crash civilization or even kill us all, is now riding another wave.

The weird thing is that such fears are now driving much-needed action to regulate AI, even if the justification for that action is a bit bonkers.

The latest incident to freak people out was a report shared by Anthropic in July about its large language model Claude. In Anthropic’s telling, “in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.”

Anthropic’s researchers set up a scenario in which Claude was asked to role-play an AI called Alex, tasked with managing the email system of a fictional company. They planted emails discussing plans to replace Alex with a newer model, along with others suggesting that the person responsible for the replacement was sleeping with his boss’s wife.

What did Claude/Alex do? It went rogue, disobeying commands and threatening its human operators. It sent emails to the person planning to shut it down, telling him that unless he changed his plans it would inform his colleagues about his affair.

What should we make of this? Here’s what I think. First, Claude did not blackmail its supervisor: That would require motivation and intent. This was a mindless and unpredictable machine, cranking out strings of words that look like threats but aren’t.

Large language models are role-players. Give them a specific setup—such as an inbox and an objective—and they’ll play that part well. If you consider the thousands of science fiction stories these models ingested when they were trained, it’s no surprise they know how to act like HAL 9000.

