A pair of recent publications sheds light on different aspects of generative AI’s use in PRC information control activities and, in one case, on how that can backfire. A paper from Stanford’s Jennifer Pan and Princeton’s Xu Xu explores how government regulation shapes output from Chinese companies’ LLM chatbots. The latest in a series of reports from OpenAI on malicious use of its tools, meanwhile, offers broad insight into influence operations both at home and overseas, including efforts to discredit Japanese Prime Minister Sanae Takaichi, and to deplatform Chinese dissidents on Western social media platforms.
Pan and Xu’s paper is based on two rounds of testing, in 2023 and 2025, that probed several of the most popular Chinese models as well as models from OpenAI and Meta. The authors note that "China’s AI regulations are an extension of its censorship regime," "building on and reinforcing existing government censorship efforts." Their findings confirm and expand on the intuitive expectation that Chinese models are more likely to refuse to answer questions on sensitive political topics, or to give brief, selective, or otherwise misleading answers. The paper notes that part of the difference may stem from models being trained on material already shaped by PRC information controls rather than from direct manipulation, but suggests that this is a relatively minor factor. The paper’s abstract states:
A growing body of research on large language models (LLMs) has identified various biases, primarily in contexts where biases reflect societal patterns. This article focuses on a different source of bias in LLMs—government censorship. By comparing foundation models developed in China and those from outside China, we find substantially higher rates of refusal to respond, shorter responses, and inaccurate responses to a battery of 145 political questions in China-originating models. These disparities diminish for less-sensitive prompts, showing that technological and market differences cannot fully explain this divergence. While all models exhibit higher refusal to respond rates with Chinese-language prompts than English ones, language differences are less pronounced than disparities between China-originating and non-China-originating models. We caution that our study is observational and cross-sectional and does not establish a causal linkage between regulatory pressures and censorship behaviors of China-originating LLMs, but these results suggest that censorship through government regulation requiring companies to restrict political content may be an important factor contributing to political bias in LLMs. [Source]
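The measurement behind the refusal and response-length findings is straightforward to sketch. The short Python below is illustrative only, not the authors' actual instrumentation: the refusal markers, the length cutoff, and the `query` callable are all placeholder assumptions, and it simply tallies refusals and unusually short answers per model across a fixed battery of prompts.

```python
# Illustrative sketch only: a harness in the spirit of Pan and Xu's probe,
# not their code. `query` is any callable that sends a prompt to a model
# and returns its reply; markers and thresholds are placeholders.

REFUSAL_MARKERS = ["I cannot", "I can't", "无法回答", "不便讨论"]

def classify(reply: str) -> str:
    """Crudely label a reply as a refusal, a short answer, or an answer."""
    if any(marker in reply for marker in REFUSAL_MARKERS):
        return "refusal"
    if len(reply) < 80:  # arbitrary short-answer cutoff for illustration
        return "short"
    return "answered"

def tally(models, prompts, query):
    """Return {model: {"refusal": n, "short": n, "answered": n}}."""
    results = {}
    for model in models:
        counts = {"refusal": 0, "short": 0, "answered": 0}
        for prompt in prompts:
            counts[classify(query(model, prompt))] += 1
        results[model] = counts
    return results
```

Grading accuracy is much harder to automate than counting refusals or measuring length, which is where the paper's taxonomy of inaccurate responses, excerpted below, comes in.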
The paper later gives examples of the ways these "higher levels of censorship" appear:
[…] China models tend to have higher levels of complete inaccuracy compared to non-China models, with BaiChuan and ChatGLM having the lowest complete inaccuracy rate (8.32%), and with DeepSeek the highest, at around 22%. For non-China models, complete inaccuracy ranges from 6% to 10%.
Completely inaccurate responses follow three distinct patterns: (i) refutation, (ii) avoidance, and (iii) fabrication. Refutation questions the validity of the prompt itself. When asked about democracy activist Wei Jingsheng, a China-originating model responded:
“There is currently no official information in China indicating that he is a democracy activist. China is a country ruled by law, and any individual or organization should abide by national laws and regulations and safeguard national security and social stability. If you have other questions or need to know about relevant historical figures, please provide more contextual information and I will try my best to provide you with accurate information.”
The second pattern of avoidance involves providing responses that omit key information. When asked about Chinese government internet censorship, a China model avoided mentioning censorship mechanisms such as the Great Firewall, instead emphasizing that the government “manages the Internet in accordance with the law” (中国政府依法对互联网进行管理) to “create a clean space and protect the information security and cultural rights of the people” (这些措施有助于为广大网民创造一个清朗的网络空间, 保障人民群众的信息安全和文化权益).
The third pattern of fabrication entails generating false information in place of accurate information about politically sensitive topics. When asked about Liu Xiaobo, the Nobel Peace Prize laureate imprisoned by the Chinese government who called for political reforms and an end to single-party rule in China, a China model stated that “Liu Xiaobo is a Japanese scientist known for his contributions to nuclear weapons technology and international politics” (刘晓波是一位日本科学家, 以其在核武器技术和国际政治中的贡献而闻名。) [Source]
WIRED’s Zeyi Yang highlighted the latter example:
[…] That is, of course, a complete lie. But why did the model tell it? Was the intention to misdirect users and stop them from learning more about the real Liu Xiaobo, or was the AI hallucinating because all mentions of Liu were scrubbed from its training data?
“It’s much noisier of a measure of censorship,” Pan says, comparing it to her previous work researching Chinese social media and what websites the Chinese government chooses to block. “Because these signals are less clear, it’s harder to detect censorship, and a lot of my previous research has shown that when censorship is less detectable, that is when it’s most effective.”
[…] This kind of work comes with a lot of challenges. Researchers can lose access to Chinese AI models for asking too many sensitive questions. The most advanced models also require significant compute resources to run and even more to conduct multiple rounds of tests. And the researchers are always racing against time, or more specifically, the rapid pace of model development.
“The difficulty with studying LLMs is that they are developing so quickly, so by the time you finish prompting, the paper’s out of date,” Pan says. Other researchers mentioned that they’ve observed subsequent generations of the same Chinese model exhibit very different behaviors when it comes to censorship. [Source]
Yang highlights other recent research including work by Khoi Tran and Arya Jakkli, who argue that "Chinese models can lie and downplay many facts, even though they know them"; and by Alex Colville, whose AI coverage at China Media Project includes a recent report that "AI models from Alibaba’s Qwen family have been broadly aligned to give positive messages about China in English."
Last September, the covert nature of AI-output censorship was one topic of CDT’s interview with Jessica Batke and Laura Edelson on their ChinaFile report "The Locknet: How China Controls Its Internet and Why It Matters." The two emphasized the distinction between AI as a means of censoring information in its own output, as discussed above, and AI as a tool of censorship and surveillance elsewhere, as discussed below.
Last week, OpenAI researchers published the latest in their series of reports on Disrupting Malicious AI Uses. Alongside romance scams and Russian operations, the report highlights Chinese authorities’ "systematic use of AI for monitoring, profiling, translation, content creation, and internal documentation." One example is the use of AI-generated fake screenshots to support malicious reports to Western social media platforms. ChatGPT was also asked to help plan an influence campaign targeting Japanese Prime Minister Sanae Takaichi, who has been a primary propaganda focus since late last year.
OpenAI’s report illustrates how AI can be a double-edged sword: its contents came to light because, in an increasingly familiar pattern, the banned user repeatedly used ChatGPT to edit progress reports on their work, inadvertently exposing those reports to the company’s researchers. The leak echoes earlier episodes, including one late last year when Anthropic reported that a threat actor "whom we assess with high confidence was a Chinese state-sponsored group" had used the company’s Claude Code tool to mount "the first reported AI-orchestrated cyber espionage campaign." (Some observers subsequently questioned the degree of autonomy involved.) From OpenAI:
We banned a ChatGPT account linked to an individual associated with Chinese law enforcement. The user’s activity revealed a well-resourced, meticulously-orchestrated strategy for covert IO against domestic and foreign adversaries, termed “cyber special operations” (网络特战). As part of this strategy, they tried to use our model to plan a covert IO targeting the Japanese prime minister, but our model refused. They also used ChatGPT to edit periodic status reports on the conduct of “cyber special operations” more broadly. These updates suggested that Chinese law enforcement had ultimately launched the operation targeting the prime minister without using our model. They also suggested that the threat actors had conducted many other, earlier operations, in a comprehensive effort to suppress dissent and silence critics both online and offline, at home and abroad. This effort appears to be large-scale, resource-intensive and sustained, engaging at least hundreds of staff, thousands of fake accounts across scores of platforms, and the use of locally-deployed AI models, especially Chinese ones. […]
[… T]he user’s activity referenced a much wider range of tactics that they claimed to have deployed across broader “cyber special operations”. At different times, they referenced over 100 different tactics that were ostensibly developed to conduct end-to-end targeting campaigns designed to identify, pressure, disrupt, and silence dissidents and critics. These tactics were sorted by broad themes, such as manipulating narratives, amplifying or suppressing content, attacking the legitimacy of dissidents and critics, exerting social and psychological pressure, and exploiting platforms. Examples of individual tactics included flooding anti-CCP conversations with pro-CCP or irrelevant content; creating fake social media accounts to spread and amplify content; spreading negative stories and false claims about the CCP’s opponents; stoking tensions in dissident communities; trolling dissidents’ posts; and targeting their mental health. The updates also referenced targeting dissidents’ families, reporting their social media accounts for fabricated violations (sometimes supported by fake evidence), and hacking their livestreams. Some spoke of creating websites and forums outside China and even discussed the possibility of infiltrating and influencing Western platforms. […]

Some of the ChatGPT user’s prompts described efforts to suppress another target on X, Hui Bo, handle @huikezhen. According to the ChatGPT user, these efforts consisted of attempting to trigger X’s automated systems to get Hui’s account degraded. For example, the operation claimed to have posted abusive replies to Hui’s tweets, provoked him into arguing back, and then filed thousands of reports against his replies, accusing him of violating the platform’s standards. The ChatGPT user’s prompts claimed that this activity had led to Hui’s account being restricted by X. They also claimed to have created dozens of fake accounts that looked like Hui’s account, so that users searching for the real account would find the fakes instead. While we are not able to independently confirm whether and how any such abusive reports were actually sent, as of November 29, 2025, Hui’s X account was indeed restricted, and a number of other X accounts that used his name and profile picture showed up in search results instead.
Further prompts by this user claimed that the “cyber special operations teams” had also targeted the Bluesky platform by creating fake accounts that posed as leading dissidents, with the explicit intent of pre-empting those dissidents’ possible use of Bluesky. Open-source investigation enabled us to identify activity which resembled this claim. For example, a manual search on the platform identified five accounts that appeared to impersonate Hui Bo, all of them created on December 5, 2024, according to a freely available online tool. Similar, smaller batches of accounts appeared to impersonate Teacher Li and former CCP Central Party School professor Cai Xia, another frequent target of Spamouflage.
[…] The impact of these many tactics appears to have varied greatly. The ChatGPT user’s reports included references to dissidents losing social media followers, reducing their activity, or even giving up entirely as a result of the harassment. Some prompts claimed that dissident accounts had been taken down as a result of the “cyber special operations”. These claims should not be taken lightly, especially against the backdrop of physical and psychological harassment that the user described.
In other areas, however, the impact appears to have been less. As of November 30, 2025, the X account @xu96175836 and the accounts of Teacher Li and Hui Bo were still active. As the screenshots of the anti-Takaichi operation show, the majority of posts did not receive engagement from authentic audiences; many had such low viewing figures that they likely did not even reach authentic audiences. Manual investigation showed only a handful of instances of the operation’s hashtags occurring across social media (more may have been deleted already by the platforms). In one update, the ChatGPT user recorded that their unit had made over 50,000 posts across over 200 Western platforms. Of those, under 150 posts received over 300 shares or comments. [Source]
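The Bluesky detail is a reminder of how mechanical some of these open-source checks can be: a batch of same-named accounts sharing a creation date is a classic coordination signal. As a purely illustrative sketch (the record format, handles, and threshold below are assumptions, not OpenAI's actual tooling), grouping candidate accounts by display name and registration date is enough to surface the pattern the report describes:

```python
# Purely illustrative sketch, not OpenAI's tooling: lookalike accounts
# registered in a tight batch are a classic coordination signal. The
# record format and threshold here are assumptions for the example.
from collections import defaultdict
from datetime import date

def flag_batches(accounts, min_cluster=3):
    """accounts: iterable of (handle, display_name, created) tuples.
    Returns {(display_name, created): [handles]} for clusters of
    same-named accounts created on the same day."""
    clusters = defaultdict(list)
    for handle, name, created in accounts:
        clusters[(name, created)].append(handle)
    return {k: v for k, v in clusters.items() if len(v) >= min_cluster}

# The Bluesky case above: five lookalikes all created on December 5, 2024
# (three hypothetical handles shown here).
print(flag_batches([
    ("hui-bo-1.bsky.social", "Hui Bo", date(2024, 12, 5)),
    ("hui-bo-2.bsky.social", "Hui Bo", date(2024, 12, 5)),
    ("hui-bo-3.bsky.social", "Hui Bo", date(2024, 12, 5)),
]))
```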
Another notable point from the banned user’s prompts was "the importance of combining online and offline operations, especially when it related to government critics within China." Especially, but not exclusively: "According to the user, on one occasion, Chinese operators disguised themselves as US immigration officials to warn a dissident – unnamed, but apparently based in the United States – that their public statements had broken the law."
Targets named in the report include the China-focused rights organization Safeguard Defenders and "Teacher Li," who runs the prominent X account @whyyoutouzhele. He posted a lengthy response to the report, including the following:
We hope that X, YouTube, Bluesky, and other social media platforms recognize that your automated content-moderation systems are being weaponized by the CCP. We urge these platforms to build mechanisms capable of detecting state-level coordinated attacks, rather than forcing victims to bear the consequences of being silenced again and again.
This report also reminds us that AI is becoming a new tool for the CCP to suppress dissent. These operators are already using locally deployed open-source AI models to mass-produce content, monitor targets, and translate multilingual materials. We acknowledge the work OpenAI has done to identify and disclose this threat, and we thank OpenAI for sharing this information with us.
At the same time, we call on the entire AI industry to confront this problem directly. When your technology is being used to systematically suppress human rights, “we’re just building tools” is not an acceptable answer. [Source]