The international academic journal “Nature” recently published a study indicating that a large volume of content from Chinese official media has made its way into the training data of mainstream artificial intelligence models worldwide. On sensitive Chinese political issues, several mainstream models are more likely to give answers closely aligned with Beijing’s official narrative when responding in Chinese. Internet industry insiders say authorities are offering high salaries on platforms like LinkedIn to recruit artificial intelligence engineers from Western countries such as the United States, aiming to achieve a “come-from-behind victory.”
American researchers are concerned that the Chinese Communist Party’s propaganda system is using artificial intelligence as a new channel for information dissemination. A recent study published in “Nature” found that a large amount of content released by official Chinese media outlets, including Xinhua News Agency and People’s Daily, has been widely scraped by artificial intelligence training systems and has entered the datasets of mainstream language models worldwide. On sensitive Chinese political topics, several models including ChatGPT, Claude, and Gemini tend to give answers more in line with Beijing’s official narrative when responding in Chinese, while their English responses show a different tendency.
The research team stated that this phenomenon does not rely on hacking or technical intrusion but may stem from the structure of the training data itself. Media outlets like Xinhua News Agency and People’s Daily have long published large volumes of content that is openly accessible, free to reprint, and uniformly formatted. In contrast, many independent media outlets have copyright restrictions, paywalls, or anti-scraping mechanisms. When artificial intelligence systems scrape the web for training data, these differences may translate into a data advantage for official media.
Artificial intelligence researcher Zhang Ziang, in an interview with The Epoch Times, said that in the past, Chinese Communist Party propaganda mainly relied on television, newspapers, search engines, and social platforms for censorship. The artificial intelligence era has changed the picture: “The propaganda system may not need to directly intervene in model companies or engage in technical intrusions. Simply by releasing content continuously and at massive scale, it can work its way back into the global artificial intelligence system through the training data.”
Zhang Ziang believes that platforms like People’s Daily continuously produce a unified political discourse, which then spreads through reposting on websites and through search systems. He stated, “What is truly alarming is not a few pieces of propaganda entering the models, but models starting to learn a specific narrative. When users receive the same explanation repeatedly over time, it may become their default understanding. This influence is more insidious than traditional propaganda.”
In a peer-reviewed study, a team of researchers from several US universities traced, for the first time, the path by which official Chinese media content enters artificial intelligence training systems. The study focused on official platforms such as Xinhua News Agency, People’s Daily, and Study China.
Analyzing the open-source dataset CulturaX, the research team found that it contains approximately 189 million Chinese documents and that the volume of Chinese Communist Party official media content is 41 times that of Chinese Wikipedia. For political terms such as “party congress” and “central committee,” official content accounts for one-fourth of the total.
The researchers then tested mainstream models such as ChatGPT, Claude, Gemini, and DeepSeek and found noticeable differences between their Chinese and English responses on Chinese political issues. When prompted in Chinese, some models readily echo the political rhetoric Xi Jinping has used in recent years and offer more positive interpretations, while their English responses remain relatively neutral. DeepSeek, by contrast, remains highly consistent across Chinese and English.
Molly Roberts, co-director of the China Data Lab at the University of California, San Diego, stated, “Authoritarian governments can now shape global information consumption across borders through artificial intelligence.”
Feng Qi, a network technology engineer from Guangdong, told reporters, “It is a fact that the Chinese Communist Party’s propaganda is being captured by artificial intelligence. Using Claude, I found that a significant portion of its expressions regarding China come from official narratives, such as describing unemployment as ‘flexible employment,’ or using terms like ‘urban unemployment surveys,’ ‘leading cadres,’ and ‘Party members.’ These terms should not appear in models outside China, and ChatGPT also has this issue.”
Feng Qi revealed that Chinese authorities are actively recruiting engineers from Silicon Valley in the United States: “Artificial intelligence companies in Guangdong, Zhejiang, and Beijing are actively headhunting on LinkedIn, especially targeting engineers and other employees at top AI companies in Silicon Valley. If you bring the latest technology, you could receive rewards ranging from hundreds of thousands to millions of RMB. Currently, domestic demand for developing artificial intelligence far exceeds that for chips.”
The research indicates that this influence does not require technical intrusion. Official content from Xinhua News Agency, People’s Daily, and similar outlets remains openly available over long periods and is easy for artificial intelligence scraping systems to collect, while many independent media outlets are limited by copyright and paywalls. Media scholar Zhang Cheng told reporters, “Users see answers provided by artificial intelligence but are unaware of who has been supplying the content in the long run.” The study also extended to 37 countries and found that the lower a country’s degree of press freedom, the more likely AI outputs in that country’s language are to resemble the regime’s narrative.
