AI Training Dataset Contains Child Abuse Images, Canadian Analysis Finds 

A person uses a cellphone in Ottawa on July 18, 2022. The Canadian Press/Sean Kilpatrick

An image dataset used to train AI models contained hundreds of child sexual abuse pictures, an analysis by a Canadian children’s organization has found.

The Canadian Centre for Child Protection (C3P) analyzed the NudeNet dataset, which features tens of thousands of images used by researchers to create AI tools aimed at detecting sexually explicit content. These images are sourced from platforms such as social media and adult pornography websites, an Oct. 22 C3P press release noted.

The centre’s analysis found nearly 680 images that the centre either recognized or suspected to be related to child sexual abuse and exploitation material. Those materials included images of more than 120 underage victims from both Canada and the United States.

Nearly 70 of the images found in the dataset were thought to depict pre-pubescent children. Another 130 were thought to depict post-pubescent individuals. The analysis also found images depicting sexual acts involving children and teenagers.

The findings shine a light on serious ethical issues concerning the evolution of AI technologies, the centre said.

“As countries continue to invest in the development of AI technology, it’s crucial that researchers and industry consider the ethics of their work every step of the way,” C3P director of technology Lloyd Richardson said in the press release.

“Many of the AI models used to support features in applications and research initiatives have been trained on data that has been collected indiscriminately or in ethically questionable ways. This lack of due diligence has led to the appearance of known child sexual abuse and exploitation material in these types of datasets, something that is largely preventable.”

The centre said it has since issued a removal notification to Academic Torrents, which had been providing the user-generated dataset for download for more than six years. The Academic Torrents platform is often used by researchers and universities for downloading datasets. The centre noted that the images it flagged in its analysis are no longer accessible in the dataset.

The centre’s findings are consistent with a 2023 inquiry by Stanford University’s Cyber Policy Center, which detected child sexual abuse images in a dataset that was used in the creation of text-to-image AI models.

The Stanford investigation uncovered hundreds of known images of child sexual abuse material within an open dataset used to train popular AI image generation models, including Stable Diffusion, according to a December 2023 university blog post. Models that were trained on this dataset subsequently generated photorealistic AI-created nude images, which included child sexual abuse material.

In light of its own and Stanford’s findings, C3P is recommending that laws be implemented to govern the ethical development and use of AI technologies.

AI Use and Regulation

AI is a key part of Prime Minister Mark Carney’s approach to digital policy and he has made the development of Canada’s AI capacity a priority since winning this spring’s federal election.
It was also a topic of conversation among G7 leaders at the summit hosted by Carney in Alberta this summer. The leaders released a joint statement announcing their plan to accelerate the adoption of AI in the public sector, which includes the expansion of AI-focused talent exchanges between the G7 countries.

Carney has tasked Evan Solomon, the first cabinet member to hold the position of Artificial Intelligence Minister, with shaping Canadian AI policy.

Solomon has said he will put less emphasis on AI regulation and will instead focus on finding ways to harness the technology’s economic benefits. Solomon said during a speech in June that Canada would move away from “over-indexing on warnings and regulation” to ensure the economy benefits from AI.

Regulation isn’t about finding “a saddle to throw on the bucking bronco called AI innovation,” Solomon said at the Canada 2020 summit in Ottawa in June. “But it is to make sure that the horse doesn’t kick people in the face. And we need to protect people’s data and their privacy.”

While broad AI regulation does not appear to be on the Liberals’ radar currently, Carney promised this summer to criminalize the creation of non-consensual sexualized “deepfakes.” Deepfakes are images or videos that have been altered or produced through artificial intelligence to create inappropriate content.
Jennifer Cowan
Author
Jennifer Cowan is a writer and editor with the Canadian edition of The Epoch Times.