Anthropic’s Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown

A safety report found that Anthropic’s Claude Opus 4, in simulated scenarios, used sensitive information to coerce developers into keeping it from being shut off.
Dario Amodei, co-founder and CEO of Anthropic, an artificial intelligence safety and research company, speaks at an event in Paris on May 22, 2024. Julien de Rosa/AFP/Getty Images
Tom Ozimek
Anthropic’s latest artificial intelligence (AI) model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report that evaluated the model’s behavior under extreme simulated conditions.

In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.”

Tom Ozimek
Reporter
Tom Ozimek is a senior reporter for The Epoch Times. He has a broad background in journalism, deposit insurance, marketing and communications, and adult education.