AI trained on photos from kids’ entire childhood without their consent

Photos of Brazilian kids—sometimes spanning their entire childhood—have been used without their consent to power AI tools, including popular image generators like Stable Diffusion, Human Rights Watch (HRW) warned on Monday.

The practice poses urgent privacy risks to children and appears to increase the risk of non-consensual AI-generated images bearing their likenesses, HRW’s report said.

An HRW researcher, Hye Jung Han, helped expose the problem. She analyzed “less than 0.0001 percent” of LAION-5B, a dataset built from Common Crawl snapshots of the public web. The dataset does not contain the actual photos but includes image-text pairs derived from 5.85 billion images and captions posted online since 2008.

Among those images linked in the dataset, Han found 170 photos of children from at least 10 Brazilian states. These were mostly family photos uploaded to personal and parenting blogs most Internet surfers wouldn’t easily stumble upon, “as well as stills from YouTube videos with small view counts, seemingly uploaded to be shared with family and friends,” Wired reported.

LAION, the German nonprofit that created the dataset, has worked with HRW to remove the links to the children’s images in the dataset.

That may not completely resolve the problem, though. HRW’s report warned that the removed links are “likely to be a significant undercount of the total amount of children’s personal data that exists in LAION-5B.” Han told Wired that she fears that the dataset may still be referencing personal photos of kids “from all over the world.”

Removing the links also does not remove the images from the public web, where they can still be referenced and used in other AI datasets, particularly those relying on Common Crawl, LAION’s spokesperson, Nate Tyler, told Ars.

“This is a larger and very concerning issue, and as a nonprofit, volunteer organization, we will do our part to help,” Tyler told Ars.

Han told Ars that “Common Crawl should stop scraping children’s personal data, given the privacy risks involved and the potential for new forms of misuse.”

According to HRW’s analysis, many of the Brazilian children’s identities were “easily traceable” because their names and locations were included in image captions that were processed when building the LAION dataset.

And at a time when middle and high school students face heightened risks of bullies or bad actors turning “innocuous photos” into explicit imagery, AI tools may be better equipped to generate realistic clones of kids whose images are referenced in AI datasets, HRW suggested.

“The photos reviewed span the entirety of childhood,” HRW’s report said. “They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home, students giving a presentation at school, and teenagers posing for photos at their high school’s carnival.”

There is less risk that the Brazilian kids’ photos are currently powering AI tools since “all publicly available versions of LAION-5B were taken down” in December, Tyler told Ars. That decision came out of an “abundance of caution” after a Stanford University report “found links in the dataset pointing to illegal content on the public web,” Tyler said, including 3,226 suspected instances of child sexual abuse material.

Han told Ars that “the version of the dataset that we examined pre-dates LAION’s temporary removal of its dataset in December 2023.” The dataset will not be available again until LAION determines that all flagged illegal content has been removed.

“LAION is currently working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content from LAION-5B,” Tyler told Ars. “We are grateful for their support and hope to republish a revised LAION-5B soon.”

In Brazil, “at least 85 girls” have reported classmates harassing them by using AI tools to “create sexually explicit deepfakes of the girls based on photos taken from their social media profiles,” HRW reported. Once these explicit deepfakes are posted online, they can inflict “lasting harm,” HRW warned, potentially remaining online for their entire lives.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” Han said. “The government should urgently adopt policies to protect children’s data from AI-fueled misuse.”

Ars could not immediately reach Stable Diffusion maker Stability AI for comment.

“CSAM generated by AI is still CSAM,” DOJ says after rare arrest

The US Department of Justice has started cracking down on the use of AI image generators to produce child sexual abuse material (CSAM).

On Monday, the DOJ arrested Steven Anderegg, a 42-year-old “extremely technologically savvy” Wisconsin man who allegedly used Stable Diffusion to create “thousands of realistic images of prepubescent minors,” which were then distributed on Instagram and Telegram.

Police were tipped off to Anderegg’s alleged activities after Instagram flagged direct messages sent from his account to a 15-year-old boy. Instagram reported the messages to the National Center for Missing and Exploited Children (NCMEC), which subsequently alerted law enforcement.

The DOJ found that during the Instagram exchange, Anderegg sent sexually explicit AI images of minors soon after the teen made his age known, and alleged that “the only reasonable explanation for sending these images was to sexually entice the child.”

According to the DOJ’s indictment, Anderegg is a software engineer with “professional experience working with AI.” Because of his “special skill” in generative AI (GenAI), he was allegedly able to generate the CSAM using a version of Stable Diffusion, “along with a graphical user interface and special add-ons created by other Stable Diffusion users that specialized in producing genitalia.”

After Instagram reported Anderegg’s messages to the minor, cops seized Anderegg’s laptop and found “over 13,000 GenAI images, with hundreds—if not thousands—of these images depicting nude or semi-clothed prepubescent minors lasciviously displaying or touching their genitals” or “engaging in sexual intercourse with men.”

In his messages to the teen, Anderegg seemingly “boasted” about his skill in generating CSAM, the indictment said. The DOJ alleged that evidence from his laptop showed that Anderegg “used extremely specific and explicit prompts to create these images,” including “specific ‘negative’ prompts—that is, prompts that direct the GenAI model on what not to include in generated content—to avoid creating images that depict adults.” These go-to prompts were stored on his computer, the DOJ alleged.

Anderegg is currently in federal custody and has been charged with production, distribution, and possession of AI-generated CSAM, as well as “transferring obscene material to a minor under the age of 16,” the indictment said.

Because the DOJ suspected that Anderegg intended to use the AI-generated CSAM to groom a minor, it is arguing that there are “no conditions of release” that could prevent him from posing a “significant danger” to his community while the court mulls his case. The DOJ warned the court that any future contact with minors would likely go unnoticed, as Anderegg is seemingly tech-savvy enough to hide attempts to send them AI-generated CSAM.

“He studied computer science and has decades of experience in software engineering,” the indictment said. “While computer monitoring may address the danger posed by less sophisticated offenders, the defendant’s background provides ample reason to conclude that he could sidestep such restrictions if he decided to. And if he did, any reoffending conduct would likely go undetected.”

If convicted of all four counts, he could face “a total statutory maximum penalty of 70 years in prison and a mandatory minimum of five years in prison,” the DOJ said. Partly because of his “special skill in GenAI,” the DOJ—which described its evidence against Anderegg as “strong”—suggested that it may recommend a sentencing range “as high as life imprisonment.”

Announcing Anderegg’s arrest, Deputy Attorney General Lisa Monaco made it clear that creating AI-generated CSAM is illegal in the US.

“Technology may change, but our commitment to protecting children will not,” Monaco said. “The Justice Department will aggressively pursue those who produce and distribute child sexual abuse material—or CSAM—no matter how that material was created. Put simply, CSAM generated by AI is still CSAM, and we will hold accountable those who exploit AI to create obscene, abusive, and increasingly photorealistic images of children.”
