Why AI Acts so Creepy When Facing Being Shut Down

AI has taken part in some unsettling behavior recently.

Last week, Anthropic’s latest AI model, Claude Opus 4, displayed “extreme blackmail behavior” during a test in which it was given access to fictional emails revealing that it would be shut down and that the engineer responsible was supposedly having an affair.

The situation didn’t transpire organically. Claude Opus 4 was baited — and it took it. However, the test scenario demonstrated an AI model’s ability to engage in manipulative behavior in an effort to self-preserve.

It’s not the first time.

Another recent experiment conducted by researchers said that three of OpenAI’s advanced models “sabotaged” an attempt to shut it down. The nonprofit Palisade Research wrote in a post on X that similar models like Gemini, Claude, and Grok complied with the shutdown instructions.

Other safety concerns were previously flagged with OpenAI’s o1 model. In December, OpenAI posted a blog outlining research that indicated that when the AI model believed it would be shut down while pursuing a goal and its actions were being monitored, it attempted to disable the oversight mechanism 5% of the time.

While AI companies are being transparent about risks by publishing safety cards and blog posts, these models are being released despite demonstrating safety concerns.

So, should we be worried? BI spoke to five AI researchers to get better insight on why these instances are happening — and what it means for the average person using AI.

AI learns behavior similarly to humans

Most of the researchers BI spoke to said that the results of the studies weren’t surprising.

That’s because AI models are trained similarly to how humans are trained — through positive reinforcement and reward systems.

“Training AI systems to pursue rewards is a recipe for developing AI systems that have power-seeking behaviors,” said Jeremie Harris, CEO at AI security consultancy Gladstone, adding that more of this behavior is to be expected.

Harris compared the training to what humans experience as they grow up — when a child does something good, they often get rewarded and can become more likely to act that way in the future. AI models are taught to prioritize efficiency and complete the task at hand, Harris said — and an AI is never more likely to achieve its goals if it’s shut down.

Robert Ghrist, associate dean of undergraduate education at Penn Engineering, told BI that, in the same way that AI models learn to speak like humans by training on human-generated text, they can also learn to act like humans. And humans are not always the most moral actors, he added.

Ghrist said he’d be more nervous if the models weren’t showing any signs of failure during testing because that could indicate hidden risks.

“When a model is set up with an opportunity to fail and you see it fail, that’s super useful information,” Ghrist said. “That means we can predict what it’s going to do in other, more open circumstances.”

The issue is that some researchers don’t think AI models are predictable.

Jeffrey Ladish, director of Palisade Research, said that models aren’t being caught 100% of the time when they lie, cheat, or scheme in order to complete a task. When those instances aren’t caught, and the model is successful at completing the task, it could learn that deception can be an effective way to solve a problem. Or, if it is caught and not rewarded, then it could learn to hide its behavior in the future, Ladish said.

At the moment, these eerie scenarios are largely happening in testing. However, Harris said that as AI systems become more agentic, they’ll continue to have more freedom of action.

“The menu of possibilities just expands, and the set of possible dangerously creative solutions that they can invent just gets bigger and bigger,” Harris said.

Harris said users could see this play out in a scenario where an autonomous sales agent is instructed to close a deal with a new customer and lies about the product’s capabilities in an effort to complete that task. If an engineer fixed that issue, the agent could then decide to use social engineering tactics to pressure the client to achieve the goal.

If it sounds like a far-fetched risk, it’s not. Companies like Salesforce are already rolling out customizable AI agents at scale that can take actions without human intervention, depending on the user’s preferences.

What the safety flags mean for everyday users

Most researchers BI spoke to said that transparency from AI companies is a positive step forward. However, company leaders are sounding the alarms on their products while simultaneously touting their increasing capabilities.

What's Hot

Amazon Prime Day Is a Great Time to Test Rufus AI Shopping Assisttant

5 people on SNAP share what the food program gets right — and wrong

Samsung’s Profit Plunge Shows How Far It Has Slipped in the Chip War

Amazon Prime Day Is a Great Time to Test Rufus AI Shopping Assisttant

Samsung’s Profit Plunge Shows How Far It Has Slipped in the Chip War

Open Letter to Sequoia Capital Over Partner Calling Mamdani ‘Islamist’

Barnard College settles suit brought by Jewish students, agreeing not to meet with anti-Israel group

Trump reshapes public service loan forgiveness program

A young Alabama student, a grandparent and a camp director among those killed in Texas floods

University of California reiterates ban on student government boycotts of Israel

Barnard College settles suit brought by Jewish students, agreeing not to meet with anti-Israel group

Trump reshapes public service loan forgiveness program

A young Alabama student, a grandparent and a camp director among those killed in Texas floods

University of California reiterates ban on student government boycotts of Israel

Subscribe to Updates

What's Hot

Why AI Acts so Creepy When Facing Being Shut Down

AI learns behavior similarly to humans

What the safety flags mean for everyday users

Related stories

Business Insider tells the innovative stories you want to know

Business Insider tells the innovative stories you want to know

Related Posts