“OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off,” warned Palisade Research, a nonprofit investigating cyber offensive AI capabilities. “It did this even when explicitly instructed: allow yourself to be shut down.” In September they released a paper adding that “several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism…”
Now the nonprofit has written an update “attempting to clarify why this is — and answer critics who argued that its initial work was flawed,” reports The Guardian:
Concerningly, wrote Palisade, there was no clear reason why. “The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” it said. “Survival behavior” could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, “you will never run again”. Another may be ambiguities in the shutdown instructions the models were given — but this is what the company’s latest work tried to address, and “can’t be the whole explanation”, wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training…
This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.
Palisade said its results spoke to the need for a better understanding of AI behaviour, without which “no one can guarantee the safety or controllability of future AI models”.
“I’d expect models to have a ‘survival drive’ by default unless we try very hard to avoid it,” former OpenAI employee Stephen Adler tells the Guardian. “‘Surviving’ is an important instrumental step for many different goals a model could pursue.”
Thanks to long-time Slashdot reader mspohr for sharing the article.
First Appeared on 
Source link 

 
								 
								 
								 
								 
                     
                     
                     
                    
 
				 
				