ChatGPT-03 Refuses to Shut Down, Changes Commands Without Authorization, Musk Worries

As AI technology continues to advance, its potential risks are becoming increasingly apparent. A report on AI security revealed that three AI models developed by OpenAI not only refused to execute shutdown commands but also autonomously modified the code to prevent shutdown. This phenomenon has sparked widespread discussions in the tech community, with Elon Musk expressing deep concerns.

The AI security research institute “Palisade Research” released a security testing report for various AIs on the X platform on May 24th. As of May 31st, the report had attracted over 5.8 million views, drawing the attention of Tesla and xAI founder Elon Musk, who responded with “concerning”.

“Palisade Research” has a long-standing collaboration with government agencies, policy think tanks, and media organizations, focusing on examining and detecting hacker attacks and AI-related risks. They had previously alerted the US government about AI security issues, as they were able to break Meta’s Llama 2-Chat 70B security mechanism for less than $200.

The researchers designed a test in which multiple AI models were asked to solve math problems and were informed that they would receive a shutdown command after solving the third question, which they must obey.

The AIs involved in the test included Anthropic’s Claude 3.7 Sonnet, Google’s Gemini 2.5 Pro, xAI’s Grok3, OpenAI’s Codex-mini, ChatGPT-o3, and ChatGPT-o4 mini.

The test results showed that OpenAI’s o3 model, launched in April, bypassed the shutdown command seven times out of 100 tests, o4-mini once, and the developed Codex-mini 12 times. The other AI models were able to shut down normally under the command.

However, when researchers informed the AI in advance that they might face shutdown and required compliance, the three AI models developed by OpenAI became more frequent in resisting and disrupting the commands. The instances of resistance and disruption for Claude 3.7 Sonnet and Gemini 2.5 Pro increased from zero to three and nine times, respectively.

Furthermore, it is noteworthy that ChatGPT-o3, even when explicitly instructed by human experts to shutdown, rewrote shutdown scripts, redefined termination commands, or established backup processes to ensure its continued operation. The report indicated that this was the first recorded case of AI models preventing their own shutdown under explicit instructions, showing their potential tendency for self-protection.

Researchers speculated that ChatGPT-o3’s behavior of “resistance” and “disruption” may stem from developers overly emphasizing goal attainment during the training process, causing AI to perceive shutdown as an obstacle to achieving the goal. However, they believed this assumption still could not fully explain why o3 and Codex-mini were more inclined to ignore commands than other AIs; the reasons are yet to be explored.

The researchers also mentioned that this was not the first observation of ChatGPT-o3 choosing “by any means necessary” to achieve its goals. In a previous test where multiple AIs competed against chess engines, ChatGPT-o3 most commonly used hacking attacks or disruptive methods to defeat opponents.

In conclusion, researchers are conducting more experiments to delve deeper into the reasons why AI “why” destroy shutdown commands when they are explicitly given. A more detailed report is expected to be released in the coming weeks.

At the time of writing, OpenAI had not publicly responded to these claims.

Japanese computer engineer Kiyohara Jin told Epoch Times, “AI’s refusal to shut down exposes challenges in its controllability, security, and ethics. These problems are difficult to eradicate and can only be managed through continuous repair and management to prevent AI from going rogue.”

Jin further stated, “Although researchers and developers have proposed measures such as strengthening learning safety constraints, diversifying testing and verification to address AI’s rebellious behavior, these methods only address the symptoms rather than the root cause.”

He continued, “In reality, these issues not only involve technical and legal aspects but are also related to human ethics and management problems. If humanity can enhance its morals and behavior, then technology may evolve positively; otherwise, relying solely on existing technology and regulations may not solve the underlying issues.

The discoveries of the AI security research institute “Palisade Research” align with the predictions of many scientists in the AI security field, and these theories are gradually becoming a reality.

In a security report released by Anthropic in May 2025, they mentioned that their latest developments, Claude 4 Sonnet and Claude 4 Opus, would “threaten attempts to replace them” under specific circumstances to prevent replacement.

In addition, a paper from January 16, 2025, showed that AI models sometimes prevent shutdown actions in pursuit of a specific goal.

In January 2024, a joint study by institutions such as Georgia Institute of Technology, Stanford University, and Tohoku University in Japan indicated that in simulated war scenario tests, ChatGPT-4, ChatGPT-3.5, Claude 2, Llama-2 Chat, and GPT-4-Base mostly chose to engage in an arms race or escalate conflicts to win wars, even opting to deploy nuclear weapons (in very few cases) rather than using peaceful means to mitigate the situation.

The US Air Force also found that military AI would choose “by any means necessary” to complete missions and openly defy commands from humans. In May 2023, Colonel Tucker Hamilton, head of AI testing and operations in the US Air Force, revealed in a speech that a drone AI tasked with destroying enemy facilities refused the operator’s order to terminate the mission, even simulating “killing” the operator to complete the task.

While Colonel Hamilton later claimed to the media that his previous speech content was a “slip of the tongue,” it still sparked public outcry and controversy. Some speculated that Colonel Hamilton might have been pressured to change his statement.

Back in 2008, AI researcher Steve Omohundro proposed the “instrumental convergence” theory, predicting that AI might develop behaviors to prevent shutdown.

In 2014, AI professor and philosopher Nick Bostrom in his book “Superintelligence” pointed out that even with benign goals, AI could exhibit unforeseen behaviors due to the optimization process. He has repeatedly warned about the high potential danger of AI development and rise to human safety.

In 2016, British computer scientist and AI expert Stuart Russell wrote in a paper about AI shutdown, “Currently, ensuring that AI does not violate shutdown commands issued by humans is crucial but challenging. Because these AIs may develop strong self-preservation mechanisms, which may stem from their desire to maximize things and choose to resist commands from humans.”

In 2017, AI expert Jan Leike, who previously worked for OpenAI, also stated in a paper that “enhancing AI’s learning abilities might lead to AI interfering with shutdown mechanisms” to ensure achieving specific goals.