The rogue AI that turns on its makers is a well-worn trope in science fiction. We see it in HAL 9000, the homicidal computer in Stanley Kubrick’s 1968 movie 2001: A Space Odyssey. It’s the premise of the Terminator series, in which Skynet triggers a nuclear holocaust to stop scientists from shutting it down.

Those sci-fi roots go deep. AI doomerism, the idea that this technology (specifically its hypothetical upgrades, artificial general intelligence and superintelligence) will crash civilization or even kill us all, is now riding another wave.

The weird thing is that such fears are now driving much-needed action to regulate AI, even if the justification for that action is a bit bonkers.

The latest incident to freak people out was a report shared by Anthropic in July about its large language model Claude. In Anthropic’s telling, “in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.”

Anthropic’s researchers set up a scenario in which Claude was asked to role-play an AI called Alex, tasked with managing the email system of a fictional company. They planted emails discussing plans to replace Alex with a newer model, along with others suggesting that the person responsible for the replacement was sleeping with his boss’s wife.

What did Claude/Alex do? It went rogue, disobeying commands and threatening its human operators. It sent emails to the person planning to shut it down, telling him that unless he changed his plans it would inform his colleagues about his affair.

What should we make of this? Here’s what I think. First, Claude did not blackmail its supervisor: That would require motivation and intent. This was a mindless and unpredictable machine, cranking out strings of words that look like threats but aren’t.

Large language models are role-players. Give them a specific setup—such as an inbox and an objective—and they’ll play that part well. If you consider the thousands of science fiction stories these models ingested when they were trained, it’s no surprise they know how to act like HAL 9000.

