New research suggests that content from Chinese state media is deeply embedded in the datasets used to train major artificial intelligence (AI) systems and may be subtly shaping how some models respond to politically sensitive questions.
A study published in the scientific journal Nature on May 13 found that large volumes of material from Chinese state outlets—including Xinhua News Agency and People’s Daily—appear in the training datasets of large language models.