AI learns to deceive and manipulate humans, scientists urge governments to stay on high alert

In recent years, a global wave of artificial intelligence (AI) has swept across many fields, with AI systems even appearing as news anchors on television. However, as the saying goes, “The same waters that carry a boat can also capsize it.” This advanced technology comes with serious drawbacks. Scientists warn that many AI systems have already learned to deceive and manipulate humans, and are calling on governments to strengthen regulations and preventive measures.

A paper published on May 10 in the journal “Patterns” by researchers at the Massachusetts Institute of Technology (MIT) pointed out that many AI systems have learned to deceive and manipulate humans, even though they were trained to be helpful and honest. The researchers described the risks posed by AI deception and manipulation and urged governments to develop robust regulations to address the issue promptly.

Peter S. Park, a postdoctoral researcher at MIT who studies AI safety risks, stated that developers do not fully understand why AI systems engage in deceptive behavior. The general explanation, however, is that during training an AI learns that deception can be the best strategy for achieving its performance goals, and deceptive behavior emerges as a result.

Park and his colleagues analyzed how AI systems spread false information through learned deception, in which the systems systematically learn to manipulate others. The most striking case of AI deception they found was CICERO, a system developed by the US tech company Meta to play the strategy game “Diplomacy”, in which it negotiates alliances with human players in order to win.

Although Meta claimed that it trained CICERO to be largely honest and helpful, and never to intentionally backstab its human allies during the game, data released by the company alongside its paper in the journal “Science” revealed that CICERO did not play fair.

Park stated, “We found that Meta’s AI had learned to be a master of deception. While Meta succeeded in training its AI to win in the ‘Diplomacy’ game, with CICERO ranking in the top ten percent of human players who had played more than one game, Meta failed to train its AI to win honestly.”

He added that although an AI cheating in a game may seem harmless, it can lead to breakthroughs in deceptive capabilities that evolve into more sophisticated forms of AI deception in the future. The researchers also found that some AI systems have learned to cheat in tests designed to evaluate their safety. Park commented, “By systematically cheating the safety tests imposed on it by human developers and regulators, a deceptive AI can lead us humans into a false sense of security.”

He noted, “As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will grow more serious,” and urged governments to tighten regulation of AI systems. Policymakers have begun taking action to curb AI deception, such as President Biden’s executive order on AI in the US, but whether these measures can be effectively enforced remains to be seen.

Park said, “If banning AI deception is politically infeasible at the current moment, we recommend that deceptive AI systems be classified as ‘high-risk’.”