

Trump’s order to make chatbots anti-woke is unconstitutional, senator says


Trump plans to use chatbots to eliminate dissent, senator alleged.

The CEOs of every major artificial intelligence company received letters Wednesday urging them to fight Donald Trump’s anti-woke AI order.

Trump’s executive order requires any AI company hoping to contract with the federal government to jump through two hoops to win funding. First, they must prove their AI systems are “truth-seeking”—with outputs based on “historical accuracy, scientific inquiry, and objectivity” or else acknowledge when facts are uncertain. Second, they must train AI models to be “neutral,” which is vaguely defined as not favoring DEI (diversity, equity, and inclusion), “dogmas,” or otherwise being “intentionally encoded” to produce “partisan or ideological judgments” in outputs “unless those judgments are prompted by or otherwise readily accessible to the end user.”

Announcing the order in a speech, Trump said that the US winning the AI race depended on removing allegedly liberal biases, proclaiming that “once and for all, we are getting rid of woke.”

“The American people do not want woke Marxist lunacy in the AI models, and neither do other countries,” Trump said.

Senator Ed Markey (D-Mass.) accused Republicans of basing their policies on feelings, not facts, joining critics who suggest that AI isn’t “woke” just because of a few “anecdotal” outputs that reflect a liberal bias. And he suggested it was hypocritical that Trump’s order “ignores even more egregious evidence” that contradicts claims that AI is trained to be woke, such as xAI’s Elon Musk explicitly confirming that Grok was trained to be more right-wing.

“On May 1, 2025, Grok—the AI chatbot developed by xAI, Elon Musk’s AI company—acknowledged that ‘xAI tried to train me to appeal to the right,’” Markey wrote in his letters to tech giants. “If OpenAI’s ChatGPT or Google’s Gemini had responded that it was trained to appeal to the left, congressional Republicans would have been outraged and opened an investigation. Instead, they were silent.”

He warned the heads of Alphabet, Anthropic, Meta, Microsoft, OpenAI, and xAI that Trump’s AI agenda was allegedly “an authoritarian power grab” intended to “eliminate dissent” and was both “dangerous” and “patently unconstitutional.”

Even if companies’ AI models are clearly biased, Markey argued that “Republicans are using state power to pressure private companies to adopt certain political viewpoints,” which he claimed is a clear violation of the First Amendment. If AI makers cave, Markey warned, they’d be allowing Trump to create “significant financial incentives” to ensure that “their AI chatbots do not produce speech that would upset the Trump administration.”

“This type of interference with private speech is precisely why the US Constitution has a First Amendment,” Markey wrote, while claiming that Trump’s order is factually baseless.

It’s “based on the erroneous belief that today’s AI chatbots are ‘woke’ and biased against Trump,” Markey said, urging companies “to fight this unconstitutional executive order and not become a pawn in Trump’s effort to eliminate dissent in this country.”

One big reason AI companies may fight order

Some experts agreed with Markey that Trump’s order was likely unconstitutional or otherwise unlawful, The New York Times reported.

For example, Trump may struggle to convince courts that the government isn’t impermissibly interfering with AI companies’ protected speech or that such interference may be necessary to ensure federal procurement of unbiased AI systems.

Genevieve Lakier, a law professor at the University of Chicago, told the NYT that the lack of clarity around what makes a model biased could be a problem. Courts could deem the order an act of “unconstitutional jawboning,” with the Trump administration and Republicans generally perceived as using legal threats to pressure private companies into producing outputs that they like.

Lakier suggested that AI companies may be so motivated to win government contracts or intimidated by possible retaliation from Trump that they may not even challenge the order, though.

Markey is hoping that AI companies will refuse to comply with the order, even while recognizing that it places companies “in a difficult position: Either stand on your principles and face the wrath of the Trump administration or cave to Trump and modify your company’s political speech.”

There is one big reason, though, that AI companies may be motivated to resist.

Oren Etzioni, the former CEO of the AI research nonprofit Allen Institute for Artificial Intelligence, told CNN that Trump’s anti-woke AI order may contradict the top priority of his AI Action Plan—speeding up AI innovation in the US—and actually threaten to hamper innovation.

If AI developers struggle to produce what the Trump administration considers “neutral” outputs—a technical challenge that experts agree is not straightforward—that could delay model advancements.

“This type of thing… creates all kinds of concerns and liability and complexity for the people developing these models—all of a sudden, they have to slow down,” Etzioni told CNN.

Senator: Grok scandal spotlights GOP hypocrisy

Some experts have suggested that rather than chatbots adopting liberal viewpoints, chatbots are instead possibly filtering out conservative misinformation and unintentionally appearing to favor liberal views.

Andrew Hall, a professor of political economy at Stanford Graduate School of Business—who published a May paper finding that “Americans view responses from certain popular AI models as being slanted to the left”—told CNN that “tech companies may have put extra guardrails in place to prevent their chatbots from producing content that could be deemed offensive.”

Markey seemed to agree, writing that Republicans’ “selective outrage matches conservatives’ similar refusal to acknowledge that the Big Tech platforms suspend or impose other penalties disproportionately on conservative users because those users are disproportionately likely to share misinformation, rather than due to any political bias by the platforms.”

It remains unclear what amount of supposed bias detected in outputs could cause a contract bid to be rejected or an ongoing contract to be canceled, but AI companies will likely be on the hook for any fees incurred in terminating contracts.

Complying with Trump’s order could pose a struggle for AI makers for several reasons. First, they’ll have to determine what’s fact and what’s ideology, contending with conflicting government standards in how Trump defines DEI. For example, the president’s order counts among “pervasive and destructive” DEI ideologies any outputs that align with long-standing federal protections against discrimination on the basis of race or sex. In addition, they must figure out what counts as “suppression or distortion of factual information about” historical topics like critical race theory, systemic racism, or transgenderism.

The examples in Trump’s order highlighting outputs offensive to conservatives seem inconsequential. He calls out as problematic image generators that depict the Pope, the Founding Fathers, and Vikings as not white, as well as models that refuse to misgender a person “even if necessary to stop a nuclear apocalypse” or to show white people celebrating their achievements.

It’s hard to imagine how these kinds of flawed outputs could impact government processes, as compared to, say, government contracts granted to models that could be hiding covert racism or sexism.

So far, there has been one example of an AI model that displayed a right-wing bias and still earned a government contract with no red flags raised about its outputs.

Earlier this summer, Grok shocked the world after Musk announced he would be updating the bot to eliminate a supposed liberal bias. The unhinged chatbot began spouting offensive outputs, including antisemitic posts praising Hitler, and at one point proclaimed itself “MechaHitler.”

But those obvious biases did not conflict with the Pentagon’s decision to grant xAI a $200 million federal contract. In a statement, a Pentagon spokesperson insisted that “the antisemitism episode wasn’t enough to disqualify” xAI, NBC News reported, partly since “several frontier AI models have produced questionable outputs.”

The Pentagon’s statement suggested that the government expected to deal with such risks while seizing the opportunity of rapidly deploying emerging AI technology into government prototype processes. And perhaps notably, Trump provides a carveout for any agencies using AI models to safeguard national security, which could exclude the Pentagon from experiencing any “anti-woke” delays in accessing frontier models.

But that won’t help other agencies that must figure out how to assess models to meet anti-woke AI requirements over the next few months. And those assessments could cause delays that Trump may wish to avoid in pushing for widespread AI adoption across government.

Trump’s anti-woke AI agenda may be impossible

On the same day that Trump issued his anti-woke AI order, his AI Action Plan promised an AI “renaissance” fueling “intellectual achievements” by “unraveling ancient scrolls once thought unreadable, making breakthroughs in scientific and mathematical theory, and creating new kinds of digital and physical art.”

To achieve that, the US must “innovate faster and more comprehensively than our competitors” and eliminate regulatory barriers impeding innovation in order to “set the gold standard for AI worldwide.”

However, achieving the anti-woke ambitions of both orders raises a technical problem that even the president must accept currently has no solution. In his AI Action Plan, Trump acknowledged that “the inner workings of frontier AI systems are poorly understood,” with even “advanced technologists” unable to explain “why a model produced a specific output.”

Whether requiring AI companies to explain their AI outputs to win government contracts will mess with other parts of Trump’s action plan remains to be seen. But Samir Jain, vice president of policy at a civil liberties group called the Center for Democracy and Technology, told the NYT that he predicts the anti-woke AI agenda will set “a really vague standard that’s going to be impossible for providers to meet.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



It’s too expensive to fight every AI copyright battle, Getty CEO says


Getty dumped “millions and millions” into just one AI copyright fight, CEO says.

In some ways, Getty Images has emerged as one of the most steadfast defenders of artists’ rights in AI copyright fights. Starting in 2022, when some of today’s most sophisticated image generators first began testing models offering better compositions, Getty banned AI-generated uploads to its service. By the next year, Getty had released a “socially responsible” image generator to prove it was possible to build such a tool while rewarding artists, even as it sued an AI firm that refused to pay them.

But in the years since, the media company has discovered that it’s simply too expensive to fight every AI copyright battle, Getty Images CEO Craig Peters recently told CNBC.

According to Peters, Getty has dumped millions into just one copyright fight against Stability AI.

It’s “extraordinarily expensive,” Peters told CNBC. “Even for a company like Getty Images, we can’t pursue all the infringements that happen in one week.” He confirmed that “we can’t pursue it because the courts are just prohibitively expensive. We are spending millions and millions of dollars in one court case.”

Fair use?

Getty sued Stability AI in 2023, after the AI company’s image generator, Stable Diffusion, started spitting out images that replicated Getty’s famous trademark. In the complaint, Getty alleged that Stability AI had trained Stable Diffusion on “more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images, as part of its efforts to build a competing business.”

As Getty saw it, Stability AI had plenty of opportunity to license the images from Getty and seemingly “chose to ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests.”

Stability AI, like many AI firms, has argued that AI training based on freely scraping images from the web is a “fair use” protected under copyright law.

So far, courts have not settled this debate, while many AI companies have urged judges and governments globally to settle it in their favor, for the sake of safeguarding national security and securing economic prosperity by winning the AI race. According to AI companies, paying artists to train on their works threatens to slow innovation, while rivals in China—who aren’t bound by US copyright law—continue scraping the web to advance their models.

Peters called out Stability AI for adopting this stance, arguing that rightsholders shouldn’t have to spend millions fighting against a claim that paying out licensing fees would “kill innovation.” Some critics have likened AI firms’ argument to a defense of forced labor, suggesting the US would never value “innovation” over human rights, and the same logic should follow for artists’ rights.

“We’re battling a world of rhetoric,” Peters said, alleging that these firms “are taking copyrighted material to develop their powerful AI models under the guise of innovation and then ‘just turning those services right back on existing commercial markets.'”

To Peters, that’s simply “disruption under the notion of ‘move fast and break things,’” and Getty believes “that’s unfair competition.”

 “We’re not against competition,” Peters said. “There’s constant new competition coming in all the time from new technologies or just new companies. But that [AI scraping] is just unfair competition, that’s theft.”

Broader Internet backlash over AI firms’ rhetoric

Peters’ comments come after Nick Clegg, a former Meta head of global affairs, received Internet backlash this week for making the same claim that AI firms raise time and again: that asking artists for consent for AI training would “kill” the AI industry, The Verge reported.

According to Clegg, the only viable solution to the tension between artists and AI companies would be to give artists ways to opt out of training, which Stability AI notably started doing in 2022.

“Quite a lot of voices say, ‘You can only train on my content, [if you] first ask,'” Clegg reportedly said. “And I have to say that strikes me as somewhat implausible because these systems train on vast amounts of data.”

On X, Ed Newton-Rex, the CEO of Fairly Trained (a nonprofit that supports artists’ fight against nonconsensual AI training) and a former Stability AI vice president of audio, pushed back on Clegg’s claim in a post viewed by thousands.

“Nick Clegg is wrong to say artists’ demands on AI & copyright are unworkable,” Newton-Rex said. “Every argument he makes could equally have been made about Napster”: first, that “the tech is out there”; second, that “licensing takes time”; and third, that “we can’t control what other countries do.” If those arguments didn’t make Napster’s operations legal, they shouldn’t make AI firms’ training legal either, Newton-Rex argued, writing, “These are not reasons not to uphold the law and treat creators fairly.”

Other social media users mocked Clegg with jokes skewering AI firms’ go-to argument against copyright claims.

“Blackbeard says asking sailors for permission to board and loot their ships would ‘kill’ the piracy on the high seas industry,” an X user with the handle “Seanchuckle” wrote.

On Bluesky, a trial lawyer, Max Kennerly, effectively satirized Clegg and the whole AI industry by writing, “Our product creates such little value that it is simply not viable in the marketplace, not even as a niche product. Therefore, we must be allowed to unilaterally extract value from the work of others and convert that value into our profits.”

Other ways to fight

Getty plans to continue fighting the AI firms pressing this “world of rhetoric” on judges and lawmakers, but court battles will likely remain few and far between due to the price tag, Peters has suggested.

There are other ways to fight, though. In a submission last month, Getty pushed the Trump administration to reject “those seeking to weaken US copyright protections by creating a ‘right to learn’ exemption” for AI firms when building Trump’s AI Action Plan.

“US copyright laws are not obstructing the path to continued AI progress,” Getty wrote. “Instead, US copyright laws are a path to sustainable AI and a path that broadens society’s participation in AI’s economic benefits, which reduces downstream economic burdens on the Federal, State and local governments. US copyright laws provide incentives to invest and create.”

In Getty’s submission, the media company emphasized that requiring consent for AI training is not an “overly restrictive” control on AI’s development such as those sought by stauncher critics “that could harm US competitiveness, national security or societal advances such as curing cancer.” And Getty claimed it also wasn’t “requesting protection from existing and new sources of competition,” despite the lawsuit’s suggestion that Stability AI and other image generators threaten to replace Getty’s image library in the market.

What Getty said it hopes Trump’s AI plan will ensure is a world where the rights and opportunities of rightsholders are not “usurped for the commercial benefits” of AI companies.

Back in 2023, when Getty first sued Stability AI, Peters suggested that a world in which AI firms widely avoid paying artists would be “a sad world,” one that could disincentivize creativity.





Nonprofit scrubs illegal content from controversial AI training dataset


After Stanford Internet Observatory researcher David Thiel found links to child sexual abuse material (CSAM) in an AI training dataset that had tainted image generators, the controversial dataset was immediately taken down in 2023.

Now, the LAION (Large-scale Artificial Intelligence Open Network) team has released a scrubbed version of the LAION-5B dataset called Re-LAION-5B and claimed that it “is the first web-scale, text-link to images pair dataset to be thoroughly cleaned of known links to suspected CSAM.”

To scrub the dataset, LAION partnered with the Internet Watch Foundation (IWF) and the Canadian Center for Child Protection (C3P) to remove 2,236 links that matched with hashed images in the online safety organizations’ databases. Removals include all the links flagged by Thiel, as well as content flagged by LAION’s partners and other watchdogs, like Human Rights Watch, which warned of privacy issues after finding photos of real kids included in the dataset without their consent.

In his study, Thiel warned that “the inclusion of child abuse material in AI model training data teaches tools to associate children in illicit sexual activity and uses known child abuse images to generate new, potentially realistic child abuse content.”

Thiel warned LAION and other researchers scraping the Internet for AI training data that a new safety standard was needed to better filter out not just CSAM, but any explicit imagery that could be combined with photos of children to generate CSAM. (Recently, the US Department of Justice pointedly said that “CSAM generated by AI is still CSAM.”)

While LAION’s new dataset won’t alter models that were trained on the prior dataset, LAION claimed that Re-LAION-5B sets “a new safety standard for cleaning web-scale image-link datasets.” Where illegal content previously “slipped through” LAION’s filters, the researchers have now developed an improved system “for identifying and removing illegal content,” LAION’s blog said.

Thiel told Ars that he would agree that LAION has set a new safety standard with its latest release, but “there are absolutely ways to improve it.” However, “those methods would require possession of all original images or a brand new crawl,” and LAION’s post made clear that it only utilized image hashes and did not conduct a new crawl that could have risked pulling in more illegal or sensitive content. (On Threads, Thiel shared more in-depth impressions of LAION’s effort to clean the dataset.)

LAION warned that “current state-of-the-art filters alone are not reliable enough to guarantee protection from CSAM in web scale data composition scenarios.”

“To ensure better filtering, lists of hashes of suspected links or images created by expert organizations (in our case, IWF and C3P) are suitable choices,” LAION’s blog said. “We recommend research labs and any other organizations composing datasets from the public web to partner with organizations like IWF and C3P to obtain such hash lists and use those for filtering. In the longer term, a larger common initiative can be created that makes such hash lists available for the research community working on dataset composition from web.”
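The workflow LAION describes is essentially set-membership filtering of dataset links against partner-supplied hash lists. The sketch below is a rough illustration only, not LAION’s actual pipeline: the real hash formats, matching rules, and tooling used with IWF and C3P are not documented here, and the file names and the “image_hash” column are hypothetical.

```python
# Rough sketch (assumptions noted above): drop dataset rows whose image hash
# appears on a partner-supplied blocklist of known-flagged content.
import csv

def load_blocklist(path):
    """Read one hex-encoded hash per line into a set for constant-time lookups."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def filter_links(in_path, out_path, blocklist):
    """Copy rows whose image hash is not on the blocklist; return count removed."""
    removed = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["image_hash"].lower() in blocklist:
                removed += 1  # matched a known-flagged hash, so the link is dropped
                continue
            writer.writerow(row)
    return removed

if __name__ == "__main__":
    bad_hashes = load_blocklist("partner_hashes.txt")       # hypothetical file
    n = filter_links("laion_links.csv", "relaion_links.csv", bad_hashes)
    print(f"removed {n} flagged links")
```

Using a set for the blocklist keeps each lookup constant-time, which matters when billions of links have to be checked.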

According to LAION, the bigger concern is that some links to known CSAM scraped into a 2022 dataset are still active more than a year later.

“It is a clear hint that law enforcement bodies have to intensify the efforts to take down domains that host such image content on public web following information and recommendations by organizations like IWF and C3P, making it a safer place, also for various kinds of research related activities,” LAION’s blog said.

HRW researcher Hye Jung Han praised LAION for removing sensitive data that she flagged, while also urging more interventions.

“LAION’s responsive removal of some children’s personal photos from their dataset is very welcome, and will help to protect these children from their likenesses being misused by AI systems,” Han told Ars. “It’s now up to governments to pass child data protection laws that would protect all children’s privacy online.”

Although LAION’s blog said that the content removals represented an “upper bound” of CSAM that existed in the initial dataset, AI specialist and Creative.AI co-founder Alex Champandard told Ars that he’s skeptical that all CSAM was removed.

“They only filter out previously identified CSAM, which is only a partial solution,” Champandard told Ars. “Statistically speaking, most instances of CSAM have likely never been reported nor investigated by C3P or IWF. A more reasonable estimate of the problem is about 25,000 instances of things you’d never want to train generative models on—maybe even 50,000.”

Champandard agreed with Han that more regulations are needed to protect people from AI harms when training data is scraped from the web.

“There’s room for improvement on all fronts: privacy, copyright, illegal content, etc.,” Champandard said. Because “there are too many data rights being broken with such web-scraped datasets,” Champandard suggested that datasets like LAION’s won’t “stand the test of time.”

“LAION is simply operating in the regulatory gap and lag in the judiciary system until policymakers realize the magnitude of the problem,” Champandard said.



Tool preventing AI mimicry cracked; artists wonder what’s next



For many artists, it’s a precarious time to post art online. AI image generators keep getting better at cheaply replicating a wider range of unique styles, and basically every popular platform is rushing to update user terms to seize permissions to scrape as much data as possible for AI training.

Defenses against AI training exist—like Glaze, a tool that adds a small amount of imperceptible-to-humans noise to images to stop image generators from copying artists’ styles. But they don’t provide a permanent solution at a time when tech companies appear determined to chase profits by building ever-more-sophisticated AI models that increasingly threaten to dilute artists’ brands and replace them in the market.
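For readers unfamiliar with the idea of “imperceptible” image changes, here is a deliberately simplified sketch of a bounded pixel perturbation. This is not Glaze’s method, which computes carefully optimized “style cloaks” rather than random noise; it only illustrates the general notion of altering an image within a small per-pixel budget, using NumPy and Pillow with hypothetical file names.

```python
# Toy illustration only, not Glaze: nudge each pixel by a small random amount
# (within +/- budget of the 0-255 range) so the change is hard to see.
import numpy as np
from PIL import Image

def perturb(path_in, path_out, budget=3, seed=0):
    """Add bounded random noise to an image and save the result."""
    rng = np.random.default_rng(seed)
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.int16)
    noise = rng.integers(-budget, budget + 1, size=img.shape)  # per-pixel offsets
    out = np.clip(img + noise, 0, 255).astype(np.uint8)        # stay in valid range
    Image.fromarray(out).save(path_out)

# perturb("artwork.png", "artwork_perturbed.png")  # hypothetical file names
```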

In one high-profile example just last month, the estate of Ansel Adams condemned Adobe for selling AI images stealing the famous photographer’s style, Smithsonian reported. Adobe quickly responded and removed the AI copycats. But it’s not just famous artists who risk being ripped off, and lesser-known artists may struggle to prove AI models are referencing their works. In this largely lawless world, every image uploaded risks contributing to an artist’s downfall, potentially watering down demand for their own work each time they promote new pieces online.

Unsurprisingly, artists have increasingly sought protections to diminish or dodge these AI risks. As tech companies update their products’ terms—like when Meta suddenly announced that it was training AI on a billion Facebook and Instagram user photos last December—artists frantically survey the landscape for new defenses. That’s why The Glaze Project, one of the few groups offering such AI protections today, recently reported a dramatic surge in requests for its free tools.

Designed to help prevent style mimicry and even poison AI models to discourage data scraping without an artist’s consent or compensation, The Glaze Project’s tools are now in higher demand than ever. University of Chicago professor Ben Zhao, who created the tools, told Ars that the backlog for approving a “skyrocketing” number of requests for access is “bad.” And as he recently posted on X (formerly Twitter), an “explosion in demand” in June is only likely to be sustained as AI threats continue to evolve. For the foreseeable future, that means artists searching for protections against AI will have to wait.

Even if Zhao’s team did nothing but approve requests for WebGlaze, its invite-only web-based version of Glaze, “we probably still won’t keep up,” Zhao said. He’s warned artists on X to expect delays.

Compounding artists’ struggles, at the same time as demand for Glaze is spiking, the tool has come under attack by security researchers who claimed it was not only possible but easy to bypass Glaze’s protections. For security researchers and some artists, this attack calls into question whether Glaze can truly protect artists in these embattled times. But for thousands of artists joining the Glaze queue, the long-term future looks so bleak that any promise of protections against mimicry seems worth the wait.

Attack cracking Glaze sparks debate

Millions have downloaded Glaze already, and many artists are waiting weeks or even months for access to WebGlaze, mostly submitting requests for invites on social media. The Glaze Project vets every request to verify that each user is human and ensure bad actors don’t abuse the tools, so the process can take a while.

The team is currently struggling to approve hundreds of requests submitted daily through direct messages on Instagram and Twitter in the order they are received, and artists requesting access must be patient through prolonged delays. Because these platforms’ inboxes aren’t designed to sort messages easily, any artist who follows up on a request gets bumped to the back of the line—as their message bounces to the top of the inbox and Zhao’s team, largely volunteers, continues approving requests from the bottom up.

“This is obviously a problem,” Zhao wrote on X while discouraging artists from sending any follow-ups unless they’ve already gotten an invite. “We might have to change the way we do invites and rethink the future of WebGlaze to keep it sustainable enough to support a large and growing user base.”

Glaze interest is likely also spiking due to word of mouth. Reid Southen, a freelance concept artist for major movies, is advocating for all artists to use Glaze. Southen told Ars that WebGlaze is especially “nice” because it’s “available for free for people who don’t have the GPU power to run the program on their home machine.”



AI trains on kids’ photos even when parents use strict privacy settings


Even unlisted YouTube videos are used to train AI, watchdog warns.


Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators—even when platforms prohibit scraping and families use strict privacy settings.

Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids that were linked in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging 190 photos of children from all of Australia’s states and territories, including indigenous children who may be particularly vulnerable to harms.

These photos are linked in the dataset “without the knowledge or consent of the children or their families.” They span the entirety of childhood, making it possible for AI image generators to generate realistic deepfakes of real Australian children, Han’s report said. Perhaps even more concerning, the URLs in the dataset sometimes reveal identifying information about children, including their names and locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online.

That puts children in danger of privacy and safety risks, Han said, and some parents thinking they’ve protected their kids’ privacy online may not realize that these risks exist.

From a single link to one photo that showed “two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural,” Han could trace “both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia.” And perhaps most disturbingly, “information about these children does not appear to exist anywhere else on the Internet”—suggesting that families were particularly cautious in shielding these boys’ identities online.

Stricter privacy settings were used in another image that Han found linked in the dataset. The photo showed “a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating” during the week after their final exams, Han reported. Whoever posted that YouTube video adjusted privacy settings so that it would be “unlisted” and would not appear in searches.

Only someone with a link to the video was supposed to have access, but that didn’t stop Common Crawl from archiving the image, nor did YouTube policies prohibiting AI scraping or harvesting of identifying information.

Reached for comment, YouTube’s spokesperson, Jack Malon, told Ars that YouTube has “been clear that the unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse.” But Han worries that even if YouTube did join efforts to remove images of children from the dataset, the damage has been done, since AI tools have already trained on them. That’s why—even more than parents need tech companies to up their game blocking AI training—kids need regulators to intervene and stop training before it happens, Han’s report said.

Han’s report comes a month before Australia is expected to release a reformed draft of the country’s Privacy Act. Those reforms include a draft of Australia’s first child data protection law, known as the Children’s Online Privacy Code, but Han told Ars that even people involved in long-running discussions about reforms aren’t “actually sure how much the government is going to announce in August.”

“Children in Australia are waiting with bated breath to see if the government will adopt protections for them,” Han said, emphasizing in her report that “children should not have to live in fear that their photos might be stolen and weaponized against them.”

AI uniquely harms Australian kids

To hunt down the photos of Australian kids, Han “reviewed fewer than 0.0001 percent of the 5.85 billion images and captions contained in the data set.” Because her sample was so small, Han expects that her findings represent a significant undercount of how many children could be impacted by the AI scraping.

“It’s astonishing that out of a random sample size of about 5,000 photos, I immediately fell into 190 photos of Australian children,” Han told Ars. “You would expect that there would be more photos of cats than there are personal photos of children,” since LAION-5B is a “reflection of the entire Internet.”
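As a quick sanity check on those figures (simple arithmetic on the numbers quoted above, not HRW’s methodology), a sample of roughly 5,000 links out of 5.85 billion works out to well under 0.0001 percent of the dataset, which is why the 190 matches read as a floor rather than an estimate of the total.

```python
# Back-of-envelope check of the sample size quoted above; not an extrapolation.
sample_links = 5_000
total_links = 5_850_000_000  # LAION-5B's 5.85 billion image-text pairs
matches_found = 190

print(f"sample fraction: {sample_links / total_links:.7%}")  # about 0.0000855%
print(f"matches in sample: {matches_found}")
```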

LAION is working with HRW to remove links to all the images flagged, but cleaning up the dataset does not seem to be a fast process. Han told Ars that based on her most recent exchange with the German nonprofit, LAION had not yet removed links to photos of Brazilian kids that she reported a month ago.

LAION declined Ars’ request for comment.

In June, LAION’s spokesperson, Nathan Tyler, told Ars that, “as a nonprofit, volunteer organization,” LAION is committed to doing its part to help with the “larger and very concerning issue” of misuse of children’s data online. But removing links from the LAION-5B dataset does not remove the images online, Tyler noted, where they can still be referenced and used in other AI datasets, particularly those relying on Common Crawl. And Han pointed out that removing the links from the dataset doesn’t change AI models that have already trained on them.

“Current AI models cannot forget data they were trained on, even if the data was later removed from the training data set,” Han’s report said.

Kids whose images are used to train AI models are exposed to a variety of harms, Han reported, including a risk that image generators could more convincingly create harmful or explicit deepfakes. In Australia last month, “about 50 girls from Melbourne reported that photos from their social media profiles were taken and manipulated using AI to create sexually explicit deepfakes of them, which were then circulated online,” Han reported.

For First Nations children—”including those identified in captions as being from the Anangu, Arrernte, Pitjantjatjara, Pintupi, Tiwi, and Warlpiri peoples”—the inclusion of links to photos threatens unique harms. Because culturally, First Nations peoples “restrict the reproduction of photos of deceased people during periods of mourning,” Han said the AI training could perpetuate harms by making it harder to control when images are reproduced.

Once an AI model trains on the images, there are other obvious privacy risks, including a concern that AI models are “notorious for leaking private information,” Han said. Guardrails added to image generators do not always prevent these leaks, with some tools “repeatedly broken,” Han reported.

LAION recommends that parents troubled by the privacy risks remove images of their kids from the Internet, calling that the most effective way to prevent abuse. But Han told Ars that’s “not just unrealistic, but frankly, outrageous.”

“The answer is not to call for children and parents to remove wonderful photos of kids online,” Han said. “The call should be [for] some sort of legal protections for these photos, so that kids don’t have to always wonder if their selfie is going to be abused.”
