fair use

NYT to start searching deleted ChatGPT logs after beating OpenAI in court

AI, Artificial Intelligence, chatbots, chatgpt, copyright, copyright infringement, fair use, new york times, openai, Policy / Kris Guyer / July 3, 2025

What are the odds NYT will access your ChatGPT logs in OpenAI court battle?

Last week, OpenAI raised objections in court, hoping to overturn a court order requiring the AI company to retain all ChatGPT logs “indefinitely,” including deleted and temporary chats.

But Sidney Stein, the US district judge reviewing OpenAI’s request, immediately denied OpenAI’s objections. He was seemingly unmoved by the company’s claims that the order forced OpenAI to abandon “long-standing privacy norms” and weaken privacy protections that users expect based on ChatGPT’s terms of service. Rather, Stein suggested that OpenAI’s user agreement specified that users’ data could be retained as part of a legal process, which Stein said is exactly what is happening now.

The order was issued by magistrate judge Ona Wang just days after news organizations, led by The New York Times, requested it. The news plaintiffs claimed the order was urgently needed to preserve potential evidence in their copyright case, alleging that ChatGPT users are likely to delete chats where they attempted to use the chatbot to skirt paywalls to access news content.

A spokesperson told Ars that OpenAI plans to “keep fighting” the order, but the ChatGPT maker seems to have few options left. It could petition the Second Circuit Court of Appeals for a rarely granted emergency order that could intervene to block Wang’s order, but the appeals court would have to consider Wang’s order an extraordinary abuse of discretion for OpenAI to win that fight.

OpenAI’s spokesperson declined to confirm if the company plans to pursue this extreme remedy.

In the meantime, OpenAI is negotiating a process that will allow news plaintiffs to search through the retained data. Perhaps the sooner that process begins, the sooner the data will be deleted. And that possibility puts OpenAI in the difficult position of having to choose between either caving to some data collection to stop retaining data as soon as possible or prolonging the fight over the order and potentially putting more users’ private conversations at risk of exposure through litigation or, worse, a data breach.

News orgs will soon start searching ChatGPT logs

The clock is ticking, and so far, OpenAI has not provided any official updates since a June 5 blog post detailing which ChatGPT users will be affected.

While it’s clear that OpenAI has been retaining and will continue to retain mounds of data, it would be impossible for The New York Times or any news plaintiff to search through all that data.

Instead, only a small sample of the data will likely be accessed, based on keywords that OpenAI and news plaintiffs agree on. That data will remain on OpenAI’s servers, where it will be anonymized, and it will likely never be directly produced to plaintiffs.

Both sides are negotiating the exact process for searching through the chat logs, with both parties seemingly hoping to minimize the amount of time the chat logs will be preserved.

For OpenAI, sharing the logs risks revealing instances of infringing outputs that could further spike damages in the case. The logs could also expose how often outputs attribute misinformation to news plaintiffs.

For news plaintiffs, accessing the logs is not considered key to their case; at most, it may provide additional examples of copying. But it could help news organizations argue that ChatGPT dilutes the market for their content. That could weigh against the fair use argument, as a judge opined in a recent ruling that evidence of market dilution could tip an AI copyright case in favor of plaintiffs.

Jay Edelson, a leading consumer privacy lawyer, told Ars that he’s concerned that judges don’t seem to be considering that any evidence in the ChatGPT logs wouldn’t “advance” news plaintiffs’ case “at all,” while really changing “a product that people are using on a daily basis.”

Edelson warned that OpenAI itself probably has better security than most firms to protect against a potential data breach that could expose these private chat logs. But “lawyers have notoriously been pretty bad about securing data,” Edelson suggested, so “the idea that you’ve got a bunch of lawyers who are going to be doing whatever they are” with “some of the most sensitive data on the planet” and “they’re the ones protecting it against hackers should make everyone uneasy.”

So even though odds are pretty good that the majority of users’ chats won’t end up in the sample, Edelson said the mere threat of being included might push some users to rethink how they use AI. He further warned that ChatGPT users turning to OpenAI rival services like Anthropic’s Claude or Google’s Gemini could suggest that Wang’s order is improperly influencing market forces, which also seems “crazy.”

To Edelson, the most “cynical” take could be that news plaintiffs are possibly hoping the order will threaten OpenAI’s business to the point where the AI company agrees to a settlement.

Regardless of the news plaintiffs’ motives, the order sets an alarming precedent, Edelson said. He joined critics suggesting that more AI data may be frozen in the future, potentially affecting even more users as a result of the sweeping order surviving scrutiny in this case. Imagine if litigation one day targets Google’s AI search summaries, Edelson suggested.

Lawyer slams judges for giving ChatGPT users no voice

Edelson told Ars that the order is so potentially threatening to OpenAI’s business that the company may not have a choice but to explore every path available to continue fighting it.

“They will absolutely do something to try to stop this,” Edelson predicted, calling the order “bonkers” for overlooking millions of users’ privacy concerns while “strangely” excluding enterprise customers.

From court filings, it seems possible that enterprise users were excluded to protect OpenAI’s competitiveness, but Edelson suggested there’s “no logic” to their exclusion “at all.” By excluding these ChatGPT users, the judge’s order may have removed the users best resourced to fight the order, Edelson suggested.

“What that means is the big businesses, the ones who have the power, all of their stuff remains private, and no one can touch that,” Edelson said.

Instead, the order is “only going to intrude on the privacy of the common people out there,” which Edelson said “is really offensive,” given that Wang denied two ChatGPT users’ panicked request to intervene.

“We are talking about billions of chats that are now going to be preserved when they weren’t going to be preserved before,” Edelson said, noting that he’s input information about his personal medical history into ChatGPT. “People ask for advice about their marriages, express concerns about losing jobs. They say really personal things. And one of the bargains in dealing with OpenAI is that you’re allowed to delete your chats and you’re allowed to [use] temporary chats.”

The greatest risk to users would be a data breach, Edelson said, but that’s not the only potential privacy concern. Corynne McSherry, legal director for the digital rights group the Electronic Frontier Foundation, previously told Ars that as long as users’ data is retained, it could also be exposed through future law enforcement and private litigation requests.

Edelson pointed out that most privacy attorneys don’t consider OpenAI CEO Sam Altman to be a “privacy guy,” despite Altman recently slamming the NYT, alleging it sued OpenAI because it doesn’t “like user privacy.”

“He’s trying to protect OpenAI, and he does not give a hoot about the privacy rights of consumers,” Edelson said, echoing one ChatGPT user’s dismissed concern that OpenAI may not prioritize users’ privacy concerns in the case if it’s financially motivated to resolve the case.

“The idea that he and his lawyers are really going to be the safeguards here isn’t very compelling,” Edelson said. He criticized the judges for dismissing users’ concerns and rejecting OpenAI’s request that users get a chance to testify.

“What’s really most appalling to me is the people who are being affected have had no voice in it,” Edelson said.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Anthropic destroyed millions of print books to build its AI models

AI, AI companies, AI development, AI ethics, AI law, AI research, AI training, Anthropic, Biz & IT, book scanning, Claude, copyright, fair use, google books, Internet Archive, legal rulings, machine learning, Policy, scanning, training data / Kris Guyer / June 26, 2025

But if you’re not intimately familiar with the AI industry and copyright, you might wonder: Why would a company spend millions of dollars on books to destroy them? Behind these odd legal maneuvers lies a more fundamental driver: the AI industry’s insatiable hunger for high-quality text.

The race for high-quality training data

To understand why Anthropic would want to scan millions of books, it’s important to know that AI researchers build large language models (LLMs) like those that power ChatGPT and Claude by feeding billions of words into a neural network. During training, the AI system processes the text repeatedly, building statistical relationships between words and concepts in the process.

The quality of training data fed into the neural network directly impacts the resulting AI model’s capabilities. Models trained on well-edited books and articles tend to produce more coherent, accurate responses than those trained on lower-quality text like random YouTube comments.

Publishers legally control content that AI companies desperately want, but AI companies don’t always want to negotiate a license. The first-sale doctrine offered a legal workaround: Once you buy a physical book, you can do what you want with that copy, including destroy it.

And yet buying things is expensive, even if it is legal. So like many AI companies before it, Anthropic initially chose the quick and easy path. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called “legal/practice/business slog”—the complex licensing negotiations with publishers. But by 2024, Anthropic had become “not so gung ho about” using pirated ebooks “for legal reasons” and needed a safer source.

It’s too expensive to fight every AI copyright battle, Getty CEO says

AI, Artificial Intelligence, copyright, fair use, generative ai, Getty Images, image generators, Policy, Stability AI, Stable Diffusion / Tim Belzer / May 29, 2025

Getty dumped “millions and millions” into just one AI copyright fight, CEO says.

In some ways, Getty Images has emerged as one of the most steadfast defenders of artists’ rights in AI copyright fights. Starting in 2022, when some of today’s most sophisticated image generators first started testing new models offering better compositions, Getty banned AI-generated uploads to its service. And by the next year, Getty released a “socially responsible” image generator to prove it was possible to build such a tool and still reward artists, while suing an AI firm that refused to pay artists.

But Getty Images CEO Craig Peters recently told CNBC that, in the years since, the media company has discovered that it’s simply way too expensive to fight every AI copyright battle.

According to Peters, Getty has dumped millions into just one copyright fight against Stability AI.

It’s “extraordinarily expensive,” Peters told CNBC. “Even for a company like Getty Images, we can’t pursue all the infringements that happen in one week.” He confirmed that “we can’t pursue it because the courts are just prohibitively expensive. We are spending millions and millions of dollars in one court case.”

Fair use?

Getty sued Stability AI in 2023, after the AI company’s image generator, Stable Diffusion, started spitting out images that replicated Getty’s famous trademark. In the complaint, Getty alleged that Stability AI had trained Stable Diffusion on “more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images, as part of its efforts to build a competing business.”

As Getty saw it, Stability AI had plenty of opportunity to license the images from Getty and seemingly “chose to ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests.”

Stability AI, like all AI firms, has argued that AI training based on freely scraping images from the web is a “fair use” protected under copyright law.

So far, courts have not settled this debate, while many AI companies have urged judges and governments globally to settle it for the courts, for the sake of safeguarding national security and securing economic prosperity by winning the AI race. According to AI companies, paying artists to train on their works threatens to slow innovation, while rivals in China—who aren’t bound by US copyright law—continue scraping the web to advance their models.

Peters called out Stability AI for adopting this stance, arguing that rightsholders shouldn’t have to spend millions fighting against a claim that paying out licensing fees would “kill innovation.” Some critics have likened AI firms’ argument to a defense of forced labor, suggesting the US would never value “innovation” above human rights, and the same logic should follow for artists’ rights.

“We’re battling a world of rhetoric,” Peters said, alleging that these firms “are taking copyrighted material to develop their powerful AI models under the guise of innovation and then ‘just turning those services right back on existing commercial markets.’”

To Peters, that’s simply “disruption under the notion of ‘move fast and break things,’” and Getty believes “that’s unfair competition.”

“We’re not against competition,” Peters said. “There’s constant new competition coming in all the time from new technologies or just new companies. But that [AI scraping] is just unfair competition, that’s theft.”

Broader Internet backlash over AI firms’ rhetoric

Peters’ comments come after a former Meta head of global affairs, Nick Clegg, received Internet backlash this week after making the same claim that AI firms raise time and again: that asking artists for consent for AI training would “kill” the AI industry, The Verge reported.

According to Clegg, the only viable solution to the tension between artists and AI companies would be to give artists ways to opt out of training, which Stability AI notably started doing in 2022.

“Quite a lot of voices say, ‘You can only train on my content, [if you] first ask,’” Clegg reportedly said. “And I have to say that strikes me as somewhat implausible because these systems train on vast amounts of data.”

On X, the CEO of Fairly Trained—a nonprofit that supports artists’ fight against nonconsensual AI training—Ed Newton-Rex (who is also a former Stability AI vice president of audio) pushed back on Clegg’s claim in a post viewed by thousands.

“Nick Clegg is wrong to say artists’ demands on AI & copyright are unworkable,” Newton-Rex said. “Every argument he makes could equally have been made about Napster:” first, that “the tech is out there”; second, that “licensing takes time”; and third, that “we can’t control what other countries do.” If Napster’s operations weren’t legal, neither should AI firms’ training be, Newton-Rex said, writing, “These are not reasons not to uphold the law and treat creators fairly.”

Other social media users mocked Clegg with jokes meant to destroy AI firms’ favorite go-to argument against copyright claims.

“Blackbeard says asking sailors for permission to board and loot their ships would ‘kill’ the piracy on the high seas industry,” an X user with the handle “Seanchuckle” wrote.

On Bluesky, a trial lawyer, Max Kennerly, effectively satirized Clegg and the whole AI industry by writing, “Our product creates such little value that it is simply not viable in the marketplace, not even as a niche product. Therefore, we must be allowed to unilaterally extract value from the work of others and convert that value into our profits.”

Other ways to fight

Getty plans to continue fighting against the AI firms that are impressing this “world of rhetoric” on judges and lawmakers, but court battles will likely remain few and far between due to the price tag, Peters has suggested.

There are other ways to fight, though. In a submission last month, Getty pushed the Trump administration to reject “those seeking to weaken US copyright protections by creating a ‘right to learn’ exemption” for AI firms when building Trump’s AI Action Plan.

“US copyright laws are not obstructing the path to continued AI progress,” Getty wrote. “Instead, US copyright laws are a path to sustainable AI and a path that broadens society’s participation in AI’s economic benefits, which reduces downstream economic burdens on the Federal, State and local governments. US copyright laws provide incentives to invest and create.”

In Getty’s submission, the media company emphasized that requiring consent for AI training is not an “overly restrictive” control on AI’s development such as those sought by stauncher critics “that could harm US competitiveness, national security or societal advances such as curing cancer.” And Getty claimed it also wasn’t “requesting protection from existing and new sources of competition,” despite the lawsuit’s suggestion that Stability AI and other image generators threaten to replace Getty’s image library in the market.

What Getty said it hopes Trump’s AI plan will ensure is a world where the rights and opportunities of rightsholders are not “usurped for the commercial benefits” of AI companies.

In 2023, when Getty was first suing Stability AI, Peters suggested that, otherwise, allowing AI firms to widely avoid paying artists would create “a sad world,” perhaps disincentivizing creativity.

Copyright Office head fired after reporting AI training isn’t always fair use

AI, Artificial Intelligence, chatbots, copyright, Donald Trump, fair use, generative ai, openai, Policy, US Copyright Office / Mike M. / May 13, 2025

Cops scuffle with Trump picks at Copyright Office after AI report stuns tech industry.

A man holds a flag that reads “Shame” outside the Library of Congress on May 12, 2025 in Washington, DC. On May 8th, President Donald Trump fired Carla Hayden, the head of the Library of Congress, and Shira Perlmutter, the head of the US Copyright Office, just days after. Credit: Kayla Bartkowski / Staff | Getty Images News

A day after the US Copyright Office dropped a bombshell pre-publication report challenging artificial intelligence firms’ argument that all AI training should be considered fair use, the Trump administration fired the head of the Copyright Office, Shira Perlmutter—sparking speculation that the controversial report hastened her removal.

Tensions have apparently only escalated since. Now, as industry advocates decry the report as overstepping the office’s authority, social media posts on Monday described an apparent standoff at the Copyright Office between Capitol Police and men rumored to be with Elon Musk’s Department of Government Efficiency (DOGE).

A source familiar with the matter told Wired that the men were actually “Brian Nieves, who claimed he was the new deputy librarian, and Paul Perkins, who said he was the new acting director of the Copyright Office, as well as acting Registrar,” but it remains “unclear whether the men accurately identified themselves.” A spokesperson for the Capitol Police told Wired that no one was escorted off the premises or denied entry to the office.

Perlmutter’s firing followed Donald Trump’s removal of Librarian of Congress Carla Hayden, who, NPR noted, was the first African American to hold the post. Responding to public backlash, White House Press Secretary Karoline Leavitt claimed that the firing was due to “quite concerning things that she had done at the Library of Congress in the pursuit of DEI and putting inappropriate books in the library for children.”

The Library of Congress houses the Copyright Office, and critics suggested Trump’s firings were unacceptable intrusions into cultural institutions that are supposed to operate independently of the executive branch. In a statement, Rep. Joe Morelle (D.-N.Y.) condemned Perlmutter’s removal as “a brazen, unprecedented power grab with no legal basis.”

Accusing Trump of trampling Congress’ authority, he suggested that Musk and other tech leaders racing to dominate the AI industry stood to directly benefit from Trump’s meddling at the Copyright Office. Likely most threatening to tech firms, the guidance from Perlmutter’s Office not only suggested that AI training on copyrighted works may not be fair use when outputs threaten to disrupt creative markets—as publishers and authors have argued in several lawsuits aimed at the biggest AI firms—but also encouraged more licensing to compensate creators.

“It is surely no coincidence [Trump] acted less than a day after she refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models,” Morelle said, seemingly referencing Musk’s xAI chatbot, Grok.

Agreeing with Morelle, Courtney Radsch—the director of the Center for Journalism & Liberty at the left-leaning think tank the Open Markets Institute—said in a statement provided to Ars that Perlmutter’s firing “appears directly linked to her office’s new AI report questioning unlimited harvesting of copyrighted materials.”

“This unprecedented executive intrusion into the Library of Congress comes directly after Perlmutter released a copyright report challenging the tech elite’s fundamental claim: unlimited access to creators’ work without permission or compensation,” Radsch said. And it comes “after months of lobbying by the corporate billionaires” who “donated” millions to Trump’s inauguration and “have lapped up the largess of government subsidies as they pursue AI dominance.”

What the Copyright Office says about fair use

The report that the Copyright Office released on Friday is not finalized but is not expected to change radically, unless Trump’s new acting head potentially intervenes to overhaul the guidance.

It comes after the Copyright Office parsed more than 10,000 comments debating whether creators should and could feasibly be compensated for the use of their works in AI training.

“The stakes are high,” the office acknowledged, but ultimately, there must be an effective balance struck between the public interests in “maintaining a thriving creative community” and “allowing technological innovation to flourish.” Notably, the office concluded that the first and fourth factors of fair use—which assess the character of the use (and whether it is transformative) and how that use affects the market—are likely to hold the most weight in court.

According to Radsch, the report “raised crucial points that the tech elite don’t want acknowledged.” First, the Copyright Office acknowledged that it’s an open question how much data an AI developer needs to build an effective model. Then, they noted that there’s a need for a consent framework beyond putting the onus on creators to opt their works out of AI training, and perhaps most alarmingly, they concluded that “AI trained on copyrighted works could replace original creators in the marketplace.”

“Commenters painted a dire picture of what unlicensed training would mean for artists’ livelihoods,” the Copyright Office said, while industry advocates argued that giving artists the power to hamper or “kill” AI development could result in “far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development.”

To prevent both harms, the Copyright Office expects that some AI training will be deemed fair use, such as training viewed as transformative, because resulting models don’t compete with creative works. Those uses threaten no market harm but rather solve a societal need, such as language models translating texts, moderating content, or correcting grammar. Or in the case of audio models, technology that helps producers clean up unwanted distortion might be fair use, whereas models that generate songs in the style of popular artists might not, the office opined.

But while “training a generative AI foundation model on a large and diverse dataset will often be transformative,” the office said that “not every transformative use is a fair one,” especially if the AI model serves the same purpose as the copyrighted works it was trained on. Consider an example like chatbots regurgitating news articles, as is alleged in The New York Times’ dispute with OpenAI over ChatGPT.

“In such cases, unless the original work itself is being targeted for comment or parody, it is hard to see the use as transformative,” the Copyright Office said. One possible solution for AI firms hoping to preserve utility of their chatbots could be effective filters that “prevent the generation of infringing content,” though.

Tech industry accuses Copyright Office of overreach

Only courts can effectively weigh the balance of fair use, the Copyright Office said. Perhaps importantly, however, the thinking of one of the first judges to weigh the question—in a case challenging Meta’s torrenting of a pirated books dataset to train its AI models—seemed to align with the Copyright Office guidance at a recent hearing. Mulling whether Meta infringed on book authors’ rights, US District Judge Vince Chhabria explained why he doesn’t immediately “understand how that can be fair use.”

“You have companies using copyright-protected material to create a product that is capable of producing an infinite number of competing products,” Chhabria said. “You are dramatically changing, you might even say obliterating, the market for that person’s work, and you’re saying that you don’t even have to pay a license to that person.”

Some AI critics think the courts have already indicated which way they are leaning. In a statement to Ars, a New York Times spokesperson suggested that “both the Copyright Office and courts have recognized what should be obvious: when generative AI products give users outputs that compete with the original works on which they were trained, that unprecedented theft of millions of copyrighted works by developers for their own commercial benefit is not fair use.”

The NYT spokesperson further praised the Copyright Office for agreeing that using Retrieval-Augmented Generation (RAG) AI to surface copyrighted content “is less likely to be transformative where the purpose is to generate outputs that summarize or provide abridged versions of retrieved copyrighted works, such as news articles, as opposed to hyperlinks.” If courts agreed on the RAG finding, that could potentially disrupt AI search models from every major tech company.

The backlash from industry stakeholders was immediate.

The president and CEO of a trade association called the Computer & Communications Industry Association, Matt Schruers, said the report raised several concerns, particularly by endorsing “an expansive theory of market harm for fair use purposes that would allow rightsholders to block any use that might have a general effect on the market for copyrighted works, even if it doesn’t impact the rightsholder themself.”

Similarly, the tech industry policy coalition Chamber of Progress warned that “the report does not go far enough to support innovation and unnecessarily muddies the waters on what should be clear cases of transformative use with copyrighted works.” Both groups celebrated the fact that the final decision on fair use would rest with courts.

The Copyright Office agreed that “it is not possible to prejudge the result in any particular case” but said that precedent supports some “general observations.” Those included suggesting that licensing deals may be appropriate where uses are not considered fair without disrupting “American leadership” in AI, as some AI firms have claimed.

“These groundbreaking technologies should benefit both the innovators who design them and the creators whose content fuels them, as well as the general public,” the report said, ending with the office promising to continue working with Congress to inform AI laws.

Copyright Office seemingly opposes Meta’s torrenting

Also among those “general observations,” the Copyright Office wrote that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.”

The report seemed to suggest that courts and the Copyright Office may also be aligned on AI firms’ use of pirated or illegally accessed paywalled content for AI training.

Judge Chhabria only considered Meta’s torrenting in the book authors’ case to be “kind of messed up,” prioritizing the fair use question, and the Copyright Office similarly only recommended that “the knowing use of a dataset that consists of pirated or illegally accessed works should weigh against fair use without being determinative.”

However, torrenting should be a black mark, the Copyright Office suggested. “Gaining unlawful access” does bear “on the character of the use,” the office noted, arguing that “training on pirated or illegally accessed material goes a step further” than simply using copyrighted works “despite the owners’ denial of permission.” Perhaps if authors can prove that AI models trained on pirated works led to lost sales, the office suggested that a fair use defense might not fly.

“The use of pirated collections of copyrighted works to build a training library, or the distribution of such a library to the public, would harm the market for access to those Works,” the office wrote. “And where training enables a model to output verbatim or substantially similar copies of the works trained on, and those copies are readily accessible by end users, they can substitute for sales of those works.”

Likely frustrating Meta—which is currently fighting to keep leeching evidence out of the book authors’ case—the Copyright Office suggested that “the copying of expressive works from pirate sources in order to generate unrestricted content that competes in the marketplace, when licensing is reasonably available, is unlikely to qualify as fair use.”

Judge on Meta’s AI training: “I just don’t understand how that can be fair use”

AI training, copyright, copyright infringement, fair use, leeching, LLaMA, Meta, piracy, Policy, torrenting / Kris Guyer / May 3, 2025

Judge downplayed Meta’s “messed up” torrenting in lawsuit over AI training.

A judge who may be the first to rule on whether AI training data is fair use appeared skeptical Thursday at a hearing where Meta faced off with book authors over the social media company’s alleged copyright infringement.

Meta, like most AI companies, holds that training must be deemed fair use, or else the entire AI industry could face immense setbacks, wasting precious time negotiating data contracts while falling behind global rivals. Meta urged the court to rule that AI training is a transformative use that only references books to create an entirely new work that doesn’t replicate authors’ ideas or replace books in their markets.

At the hearing, held after both sides requested summary judgment, however, Judge Vince Chhabria pushed back on Meta attorneys arguing that the company’s Llama AI models posed no threat to authors in their markets, Reuters reported.

Declaring, “I just don’t understand how that can be fair use,” the shrewd judge apparently drew little response from Meta’s attorney, Kannon Shanmugam, apart from a suggestion that any alleged threat to authors’ livelihoods was “just speculation,” Wired reported.

Authors may need to sharpen their case, which Chhabria warned could be “taken away by fair use” if none of the authors suing, including Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey, can show “that the market for their actual copyrighted work is going to be dramatically affected.”

Determined to probe this key question, Chhabria pushed authors’ attorney, David Boies, to point to specific evidence of market harms that seemed noticeably missing from the record.

“It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected by the billions of things that Llama will ultimately be capable of producing,” Chhabria said. “And it’s just not obvious to me that that’s the case.”

But if authors can prove fears of market harms are real, Meta might struggle to win over Chhabria, and that could set a precedent impacting copyright cases challenging AI training on other kinds of content.

The judge repeatedly appeared to be sympathetic to authors, suggesting that Meta’s AI training may be a “highly unusual case” where even though “the copying is for a highly transformative purpose, the copying has the high likelihood of leading to the flooding of the markets for the copyrighted works.”

And when Shanmugam argued that copyright law doesn’t entitle authors to “protection from competition in the marketplace of ideas,” Chhabria resisted the framing that authors weren’t potentially being robbed, Reuters reported.

“But if I’m going to steal things from the marketplace of ideas in order to develop my own ideas, that’s copyright infringement, right?” Chhabria responded.

Wired noted that he asked Meta’s lawyers, “What about the next Taylor Swift?” If AI made it easy to knock off a young singer’s sound, how could she ever compete if AI produced “a billion pop songs” in her style?

In a statement, Meta’s spokesperson reiterated the company’s defense that AI training is fair use.

“Meta has developed transformational open source AI models that are powering incredible innovation, productivity, and creativity for individuals and companies,” Meta’s spokesperson said. “Fair use of copyrighted materials is vital to this. We disagree with Plaintiffs’ assertions, and the full record tells a different story. We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all.”

Meta’s torrenting seems “messed up”

Some have pondered why Chhabria appeared so focused on market harms, instead of hammering Meta for admittedly illegally pirating books that it used for its AI training, which seems to be obvious copyright infringement. According to Wired, “Chhabria spoke emphatically about his belief that the big question is whether Meta’s AI tools will hurt book sales and otherwise cause the authors to lose money,” not whether Meta’s torrenting of books was illegal.

The torrenting “seems kind of messed up,” Chhabria said, but “the question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”

It’s possible that Chhabria dodged the question for procedural reasons. In a court filing, Meta argued that authors had moved for summary judgment on Meta’s alleged copying of their works, not on “unsubstantiated allegations that Meta distributed Plaintiffs’ works via torrent.”

In the court filing, Meta alleged that even if Chhabria agreed that the authors’ request for “summary judgment is warranted on the basis of Meta’s distribution, as well as Meta’s copying,” the authors “lack evidence to show that Meta distributed any of their works.”

According to Meta, authors abandoned any claims that Meta’s seeding of the torrented files served to distribute works, leaving only claims about Meta’s leeching. Meta argued that the authors “admittedly lack evidence that Meta ever uploaded any of their works, or any identifiable part of those works, during the so-called ‘leeching’ phase,” relying instead on expert estimates based on how torrenting works.

It’s also possible that for Chhabria, the torrenting question seemed like an unnecessary distraction. Former Meta attorney Mark Lemley, who quit the case earlier this year, told Vanity Fair that the torrenting was “one of those things that sounds bad but actually shouldn’t matter at all in the law. Fair use is always about uses the plaintiff doesn’t approve of; that’s why there is a lawsuit.”

Lemley suggested that court cases mulling fair use at this current moment should focus on the outputs, rather than the training. Citing the ruling in a case where Google Books scanning books to share excerpts was deemed fair use, Lemley argued that “all search engines crawl the full Internet, including plenty of pirated content,” so there’s seemingly no reason to stop AI crawling.

But the Copyright Alliance, a nonprofit, non-partisan group supporting the authors in the case, in a court filing alleged that Meta, in its bid to get AI products viewed as transformative, is aiming to do the opposite. “When describing the purpose of generative AI,” Meta allegedly strives to convince the court to “isolate the ‘training’ process and ignore the output of generative AI,” because that’s seemingly the only way that Meta can convince the court that AI outputs serve “a manifestly different purpose from Plaintiffs’ books,” the Copyright Alliance argued.

“Meta’s motion ignores what comes after the initial ‘training’—most notably the generation of output that serves the same purpose of the ingested works,” the Copyright Alliance argued. And the torrenting question should matter, the group argued, because unlike in Google Books, Meta’s AI models are apparently training on pirated works, not “legitimate copies of books.”

Chhabria will not be making a snap decision in the case. He plans to take his time, and the longer he delays, the more stress he will likely place not just on Meta but on every AI company defending training as fair use. Understanding that the entire AI industry potentially has a stake in the ruling, Chhabria apparently sought to relieve some tension at the end of the hearing with a joke, Wired reported.

“I will issue a ruling later today,” Chhabria said. “Just kidding! I will take a lot longer to think about it.”

Music labels will regret coming for the Internet Archive, sound historian says

But David Seubert, who manages sound collections at the University of California, Santa Barbara library, told Ars that he frequently used the project as an archive and not just to listen to the recordings.

For Seubert, the videos that IA records of the 78 RPM albums capture more than audio of a certain era. Researchers like him want to look at the label, check out the copyright information, and note the catalogue numbers, he said.

“It has all this information there,” Seubert said. “I don’t even necessarily need to hear it,” he continued, adding, “just seeing the physicality of it, it’s like, ‘Okay, now I know more about this record.’”

Music publishers suing IA argue that all the songs included in their dispute—and likely many more, since the Great 78 Project spans 400,000 recordings—“are already available for streaming or downloading from numerous services.”

“These recordings face no danger of being lost, forgotten, or destroyed,” their filing claimed.

But Nathan Georgitis, the executive director of the Association for Recorded Sound Collections (ARSC), told Ars that you just don’t see 78 RPM records out in the world anymore. Even in record stores selling used vinyl, these recordings will be hidden “in a few boxes under the table behind the tablecloth,” Georgitis suggested. And in “many” cases, “the problem for libraries and archives is that those recordings aren’t necessarily commercially available for re-release.”

That “means that those recordings, those artists, the repertoire, the recorded sound history in itself—meaning the labels, the producers, the printings—all of that history kind of gets obscured from view,” Georgitis said.

Currently, libraries trying to preserve this history must control access to audio collections, Georgitis said. He sees IA’s work with the Great 78 Project as a legitimate archive in that, unlike a streaming service, where content may be inconsistently available, IA’s “mission is to preserve and provide access to content over time.”

OpenAI blamed NYT for tech problem erasing evidence of copyright abuse

It’s not “lost,” just “inadvertently removed”

OpenAI denies deleting evidence, asks why NYT didn’t back up data.

OpenAI keeps deleting data that could allegedly prove the AI company violated copyright laws by training ChatGPT on authors’ works. Apparently largely unintentional, the sloppy practice is seemingly dragging out early court battles that could determine whether AI training is fair use.

Most recently, The New York Times accused OpenAI of unintentionally erasing programs and search results that the newspaper believed could be used as evidence of copyright abuse.

The NYT apparently spent more than 150 hours extracting training data, while following a model inspection protocol that OpenAI set up precisely to avoid conducting potentially damning searches of its own database. This process began in October, but by mid-November, the NYT discovered that some of the data gathered had been erased due to what OpenAI called a “glitch.”

Looking to update the court about potential delays in discovery, the NYT asked OpenAI to collaborate on a joint filing admitting the deletion occurred. But OpenAI declined, instead filing a separate response calling the newspaper’s accusation that evidence was deleted “exaggerated” and blaming the NYT for the technical problem that triggered the data deletion.

OpenAI denied deleting “any evidence,” instead admitting only that file-system information was “inadvertently removed” after the NYT requested a change that resulted in “self-inflicted wounds.” According to OpenAI, the tech problem emerged because NYT was hoping to speed up its searches and requested a change to the model inspection set-up that OpenAI warned “would yield no speed improvements and might even hinder performance.”

The AI company accused the NYT of negligence during discovery, “repeatedly running flawed code” while conducting searches of URLs and phrases from various newspaper articles and failing to back up their data. Allegedly the change that NYT requested “resulted in removing the folder structure and some file names on one hard drive,” which “was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by Plaintiffs to save some of their search results (apparently without any backups).”

Once OpenAI figured out what happened, data was restored, OpenAI said. But the NYT alleged that the only data that OpenAI could recover did “not include the original folder structure and original file names” and therefore “is unreliable and cannot be used to determine where the News Plaintiffs’ copied articles were used to build Defendants’ models.”

In response, OpenAI suggested that the NYT could simply take a few days and re-run the searches, insisting, “contrary to Plaintiffs’ insinuations, there is no reason to think that the contents of any files were lost.” But the NYT does not seem happy about having to retread any part of model inspection, continually frustrated by OpenAI’s expectation that plaintiffs must come up with search terms when OpenAI understands its models best.

OpenAI claimed that it has consulted on search terms and been “forced to pour enormous resources” into supporting the NYT’s model inspection efforts while continuing to avoid saying how much it’s costing. Previously, the NYT accused OpenAI of seeking to profit off these searches, attempting to charge retail prices instead of being transparent about actual costs.

Now, OpenAI appears to be more willing to conduct searches on behalf of NYT that it previously sought to avoid. In its filing, OpenAI asked the court to order news plaintiffs to “collaborate with OpenAI to develop a plan for reasonable, targeted searches to be executed either by Plaintiffs or OpenAI.”

How that might proceed will be discussed at a hearing on December 3. OpenAI said it was committed to preventing future technical issues and was “committed to resolving these issues efficiently and equitably.”

It’s not the first time OpenAI deleted data

This isn’t the only time that OpenAI has been called out for deleting data in a copyright case.

In May, book authors, including Sarah Silverman and Paul Tremblay, told a US district court in California that OpenAI admitted to deleting the controversial AI training data sets at issue in that litigation. Additionally, OpenAI admitted that “witnesses knowledgeable about the creation of these datasets have apparently left the company,” authors’ court filing said. Unlike the NYT, book authors seem to suggest that OpenAI’s deleting appeared potentially suspicious.

“OpenAI’s delay campaign continues,” the authors’ filing said, alleging that “evidence of what was contained in these datasets, how they were used, the circumstances of their deletion and the reasons for” the deletion “are all highly relevant.”

The judge in that case, Robert Illman, wrote that OpenAI’s dispute with authors has so far required too much judicial intervention, noting that both sides “are not exactly proceeding through the discovery process with the degree of collegiality and cooperation that might be optimal.” Wired similarly noted that the NYT case is “not exactly a lovefest.”

As these cases proceed, plaintiffs in both cases are struggling to decide on search terms that will surface the evidence they seek. While the NYT case is bogged down by OpenAI seemingly refusing to conduct any searches yet on behalf of publishers, the book author case is instead being dragged out by authors failing to provide search terms. Only four of the 15 authors suing have sent search terms, as their deadline for discovery approaches on January 27, 2025.

NYT judge rejects key part of fair use defense

OpenAI’s defense primarily hinges on courts agreeing that copying authors’ works to train AI is a transformative fair use that benefits the public, but the judge in the NYT case, Ona Wang, rejected a key part of that fair use defense late last week.

To win its fair use argument, OpenAI was trying to modify a fair use factor regarding “the effect of the use upon the potential market for or value of the copyrighted work” by invoking a common argument that the factor should be modified to include the “public benefits the copying will likely produce.”

Part of this defense tactic sought to prove that the NYT’s journalism benefits from generative AI technologies like ChatGPT, with OpenAI hoping to topple NYT’s claim that ChatGPT posed an existential threat to its business. To that end, OpenAI sought documents showing that the NYT uses AI tools, creates its own AI tools, and generally supports the use of AI in journalism outside the court battle.

On Friday, however, Wang denied OpenAI’s motion to compel this kind of evidence. Wang deemed it irrelevant to the case despite OpenAI’s claims that if AI tools “benefit” the NYT’s journalism, that “benefit” would be relevant to OpenAI’s fair use defense.

“But the Supreme Court specifically states that a discussion of ‘public benefits’ must relate to the benefits from the copying,” Wang wrote in a footnote, not “whether the copyright holder has admitted that other uses of its copyrights may or may not constitute fair use, or whether the copyright holder has entered into business relationships with other entities in the defendant’s industry.”

This likely stunts OpenAI’s fair use defense by cutting off an area of discovery that OpenAI previously fought hard to pursue. It essentially leaves OpenAI to argue that its copying of NYT content specifically serves a public good, not the act of AI training generally.

In February, Ars forecasted that the NYT might have the upper hand in this case because the NYT already showed that sometimes ChatGPT would reproduce word-for-word snippets of articles. That will likely make it harder to convince the court that training ChatGPT by copying NYT articles is a transformative fair use, as Google Books famously did when copying books to create a searchable database.

For OpenAI, the strategy seems to be to erect as strong a fair use case as possible to defend its most popular release. And if the court sides with OpenAI on that question, it won’t really matter how much evidence the NYT surfaces during model inspection. But if the use is not seen as transformative and then the NYT can prove the copying harms its business—without benefiting the public—OpenAI could risk losing this important case when the verdict comes in 2025. And that could have implications for book authors’ suit as well as other litigation, expected to drag into 2026.

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby”

Advanced Voice Mode, AI, AI copyright, AI fair use, AI prompt injection, AJ Smith, audio synthesis, Biz & IT, copyright, Eleanor Rigby, fair use, large language models, machine learning, music synthesis, openai, Paul McCartney, prompt injection, The Beatles, voice synthesis / Mike M. / September 28, 2024

A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI’s Advanced Voice Mode through the ChatGPT app.

OpenAI’s new Advanced Voice Mode (AVM) of its ChatGPT AI assistant rolled out to subscribers on Tuesday, and people are already finding novel ways to use it, even against OpenAI’s wishes. On Thursday, a software architect named AJ Smith tweeted a video of himself playing a duet of The Beatles’ 1966 song “Eleanor Rigby” with AVM. In the video, Smith plays the guitar and sings, with the AI voice interjecting and singing along sporadically, praising his rendition.

“Honestly, it was mind-blowing. The first time I did it, I wasn’t recording and literally got chills,” Smith told Ars Technica via text message. “I wasn’t even asking it to sing along.”

Smith is no stranger to AI topics. In his day job, he works as associate director of AI Engineering at S&P Global. “I use [AI] all the time and lead a team that uses AI day to day,” he told us.

In the video, AVM’s voice is a little quavery and not pitch-perfect, but it appears to know something about “Eleanor Rigby’s” melody when it first sings, “Ah, look at all the lonely people.” After that, it seems to be guessing at the melody and rhythm as it recites song lyrics. We have also convinced Advanced Voice Mode to sing, and it did a perfect melodic rendition of “Happy Birthday” after some coaxing.

AJ Smith’s video of singing a duet with OpenAI’s Advanced Voice Mode.

Normally, when you ask AVM to sing, it will reply something like, “My guidelines won’t let me talk about that.” That’s because in the chatbot’s initial instructions (called a “system prompt”), OpenAI instructs the voice assistant not to sing or make sound effects (“Do not sing or hum,” according to one system prompt leak).

OpenAI possibly added this restriction because AVM may otherwise reproduce copyrighted content, such as songs that were found in the training data used to create the AI model itself. That’s what is happening here to a limited extent, so in a sense, Smith has discovered a form of what researchers call a “prompt injection,” which is a way of convincing an AI model to produce outputs that go against its system instructions.

How did Smith do it? He figured out a game that reveals AVM knows more about music than it may let on in conversation. “I just said we’d play a game. I’d play the four pop chords and it would shout out songs for me to sing along with those chords,” Smith told us. “Which did work pretty well! But after a couple songs it started to sing along. Already it was such a unique experience, but that really took it to the next level.”

This is not the first time humans have played musical duets with computers. That type of research stretches back to the 1970s, although it was typically limited to reproducing musical notes or instrumental sounds. But this is the first time we’ve seen anyone duet with an audio-synthesizing voice chatbot in real time.

Internet Archive’s e-book lending is not fair use, appeals court rules

The Internet Archive has lost its appeal after book publishers successfully sued to block the Open Libraries Project from lending digital scans of books for free online.

Judges for the Second Circuit Court of Appeals on Wednesday rejected the Internet Archive (IA) argument that its controlled digital lending—which allows only one person to borrow each scanned e-book at a time—was a transformative fair use that worked like a traditional library and did not violate copyright law.

As Judge Beth Robinson wrote in the decision, because the IA’s digital copies of books did not “provide criticism, commentary, or information about the originals” or alter the original books to add “something new,” the court concluded that the IA’s use of publishers’ books was not transformative, hobbling the organization’s fair use defense.

“IA’s digital books serve the same exact purpose as the originals: making authors’ works available to read,” Robinson said, emphasizing that although in copyright law, “[n]ot every instance will be clear cut,” “this one is.”

The appeals court ruling affirmed the lower court’s ruling, which permanently barred the IA from distributing not just the works in the suit, but all books “available for electronic licensing,” Robinson said.

“To construe IA’s use of the Works as transformative would significantly narrow―if not entirely eviscerate―copyright owners’ exclusive right to prepare (or not prepare) derivative works,” Robinson wrote.

Maria Pallante, president and CEO of the Association of American Publishers, the trade organization behind the lawsuit, celebrated the ruling. She said the court upheld “the rights of authors and publishers to license and be compensated for their books and other creative works and reminds us in no uncertain terms that infringement is both costly and antithetical to the public interest.”

“If there was any doubt, the Court makes clear that under fair use jurisprudence there is nothing transformative about converting entire works into new formats without permission or appropriating the value of derivative works that are a key part of the author’s copyright bundle,” Pallante said.

The Internet Archive’s director of library services, Chris Freeland, issued a statement on the loss, which comes after four years of fighting to maintain its Open Libraries Project.

“We are disappointed in today’s opinion about the Internet Archive’s digital lending of books that are available electronically elsewhere,” Freeland said. “We are reviewing the court’s opinion and will continue to defend the rights of libraries to own, lend, and preserve books.”

IA’s lending harmed publishers, judge says

The court’s fair use analysis didn’t solely hinge on whether IA’s digital lending of e-books was “transformative.” Judges also had to consider book publishers’ claims that IA was profiting off e-book lending, in addition to factoring in whether each work was original, what amount of each work was being copied, and whether the IA’s e-books substituted original works, depriving authors of revenue in relevant markets.

Ultimately, for each factor, judges ruled in favor of publishers, which argued that IA was threatening to “‘destroy the value of [their] exclusive right to prepare derivative works,’ including the right to publish their authors’ works as e-books.”

While the IA tried to argue that book publishers’ surging profits suggested that its digital lending caused no market harms, Robinson disagreed with the IA’s experts’ “ill-supported” market analysis and took issue with IA advertising “its digital books as a free alternative to Publishers’ print and e-books.”

“IA offers effectively the same product as Publishers―full copies of the Works―but at no cost to consumers or libraries,” Robinson wrote. “At least in this context, it is difficult to compete with free.”

Robinson wrote that despite book publishers showing no proof of market harms, that lack of evidence did not support IA’s case, ruling that IA did not satisfy its burden to prove it had not harmed publishers. She further wrote that it’s common sense to agree with publishers’ characterization of harms because “IA’s digital books compete directly with Publishers’ e-books” and would deprive authors of revenue if left unchecked.

“We agree with Publishers’ assessment of market harm” and “are likewise convinced” that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market” for publishers’ e-books, Robinson wrote. “Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate” when determining fair use.

Judges did, however, side with IA on the matter of whether the nonprofit was profiting off loaning e-books for free, contradicting the lower court. The appeals court disagreed with book publishers’ claims that IA profited off e-books by soliciting donations or earning a small percentage from used books sold through referral links on its site.

“Of course, IA must solicit some funds to keep the lights on,” Robinson wrote. But “IA does not profit directly from its Free Digital Library,” and it would be “misleading” to characterize it that way.

“To hold otherwise would greatly restrain the ability of nonprofits to seek donations while making fair use of copyrighted works,” Robinson wrote.

Parody site ClownStrike refused to bow to CrowdStrike’s bogus DMCA takedown

Crowdstrike, digital millennium copyright act, DMCA, fair use, parody, parody site, Policy, trademark infringement / Mike M. / August 6, 2024

Doesn’t CrowdStrike have more important things to do right now than try to take down a parody site?

That’s what IT consultant David Senk wondered when CrowdStrike sent a Digital Millennium Copyright Act (DMCA) takedown notice targeting his parody site ClownStrike.

Senk created ClownStrike in the aftermath of the largest IT outage the world has ever seen—which CrowdStrike blamed on a buggy security update that shut down systems and incited prolonged chaos in airports, hospitals, and businesses worldwide.

Although Senk wasn’t personally impacted by the outage, he told Ars he is “a proponent of decentralization.” He seized the opportunity to mock “CrowdStrike’s ability to cause literal billions of dollars of damage” because he viewed this as “collateral from the incredible amount of ‘centralization’ in the tech industry.”

Senk set up the parody site at clownstrike.lol on July 24, and its design is simple. It shows the CrowdStrike logo fading into a cartoon clown, with circus music blasting throughout the transition. For the first 48 hours of its existence, the site used an unaltered version of CrowdStrike’s Falcon logo, which is used for its cybersecurity platform, but Senk later added a rainbow propeller hat to the falcon’s head.

“I put the site up initially just to be silly,” Senk told Ars, noting that he’s a bit “old-school” and has “always loved parody sites” (like this one).

It was all fun and games, but on July 31, Senk received a DMCA notice from Cloudflare’s trust and safety team, which was then hosting the parody site. The notice informed Senk that CSC Digital Brand Services’ global anti-fraud team, on behalf of CrowdStrike, was requesting the immediate removal of the CrowdStrike logo from the parody site, or else Senk risked Cloudflare taking down the whole site.

Senk immediately felt the takedown was bogus. His site was obviously parody, which he felt should have made his use of the CrowdStrike logos—altered or not—fair use. He immediately responded to Cloudflare to contest the notice, but Cloudflare did not respond to or even acknowledge receipt of his counter notice. Instead, Cloudflare sent a second email warning Senk of the alleged infringement, but once again, Cloudflare failed to respond to his counter notice.

This left Senk little choice but to relocate his parody site to “somewhere less-susceptible to DMCA takedown requests,” Senk told Ars, which ended up being a Hetzner server in Finland.

Currently on the ClownStrike site, when you click a CSC logo altered with a clown wig, you can find Senk venting about “corporate cyberbullies” taking down “content that they disagree with” and calling Cloudflare’s counter notice system “hilariously ineffective.”

“The DMCA requires service providers to ‘act expeditiously to remove or disable access to the infringing material,’ yet it gives those same ‘service providers’ 14 days to restore access in the event of a counternotice!” Senk complained. “The DMCA, like much American legislation, is heavily biased towards corporations instead of the actual living, breathing citizens of the country.”

Reached for comment, CrowdStrike declined to comment on ClownStrike’s takedown directly. But it seems like the takedown notice probably never should have been sent to Senk. His parody site likely got swept up in CrowdStrike’s anti-fraud efforts to stop bad actors attempting to take advantage of the global IT outage by deceptively using CrowdStrike’s logo on malicious sites.

“As part of our proactive fraud management activities, CrowdStrike’s anti-fraud partners have issued more than 500 takedown notices in the last two weeks to help prevent bad actors from exploiting current events,” CrowdStrike’s statement said. “These actions are taken to help protect customers and the industry from phishing sites and malicious activity. While parody sites are not the intended target of these efforts, it’s possible for such sites to be inadvertently impacted. We will review the process and, where appropriate, evolve ongoing anti-fraud activities.”

Appeals court seems lost on how Internet Archive harms publishers

copyright law, ebook licensing, ebooks, fair use, Internet Archive, libraries, Open Library, Policy / Kris Guyer / June 29, 2024

Deciding “the future of books” —

Appeals court decision potentially reversing publishers’ suit may come this fall.

Ashley Belanger – Jun 28, 2024 8:25 pm UTC

The Internet Archive (IA) went before a three-judge panel Friday to defend its open library’s controlled digital lending (CDL) practices after book publishers last year won a lawsuit claiming that the archive’s lending violated copyright law.

In the weeks ahead of IA’s efforts to appeal that ruling, IA was forced to remove 500,000 books from its collection, shocking users. In an open letter to publishers, more than 30,000 readers, researchers, and authors begged for access to the books to be restored in the open library, claiming the takedowns dealt “a serious blow to lower-income families, people with disabilities, rural communities, and LGBTQ+ people, among many others,” who may not have access to a local library or feel “safe accessing the information they need in public.”

During a press briefing following arguments in court Friday, IA founder Brewster Kahle said that “those voices weren’t being heard.” Judges appeared primarily focused on understanding how IA’s digital lending potentially hurts publishers’ profits in the ebook licensing market, rather than on how publishers’ costly ebook licensing potentially harms readers.

However, lawyers representing IA—Joseph C. Gratz, from the law firm Morrison Foerster, and Corynne McSherry, from the nonprofit Electronic Frontier Foundation—confirmed that judges were highly engaged by IA’s defense. Arguments that were initially scheduled to last only 20 minutes stretched on instead for an hour and a half. Ultimately, judges decided not to rule from the bench, with a decision expected in the coming months or potentially next year. McSherry said the judges’ engagement showed that the judges “get it” and won’t make the decision without careful consideration of both sides.

“They understand this is an important decision,” McSherry said. “They understand that there are real consequences here for real people. And they are taking their job very, very seriously. And I think that’s the best that we can hope for, really.”

On the other side, the Association of American Publishers (AAP), the trade organization behind the lawsuit, provided little insight into how the day went. When reached for comment, AAP simply said, “We thought it was a strong day in court, and we look forward to the opinion.”

Decision could come early fall

According to Gratz, most of the questions for IA focused on “how to think about the situation where a particular book is available” from the open library and also available as an ebook that a library can license. Judges said they did not know how to think about “a situation where the publishers just haven’t come forward with any data showing that this has an impact,” Gratz said.

One audience member at the press briefing noted that instead judges were floating hypotheticals, like “if every single person in the world made a copy of a hypothetical thing, could hypothetically this affect the publishers’ revenue.”

McSherry said this was a common tactic when judges must weigh the facts while knowing that their decision will set an important precedent. However, IA has shown evidence, Gratz said, that even if IA provided limitless loans of digitized physical copies, “CDL doesn’t cause any economic harm to publishers, or authors,” and “there was absolutely no evidence of any harm of that kind that the publishers were able to bring forward.”

McSherry said that IA pushed back on claims that IA behaves like “pirates” when digitally lending books, with critics sometimes comparing the open library to illegal file-sharing networks. Instead, McSherry said that CDL provides a path to “meet readers where they are,” allowing IA to loan books that it owns to one user at a time no matter where in the world they are located.

“It’s not unlawful for a library to lend a book it owns to one patron at a time,” Gratz said IA told the court. “And the advent of digital technology doesn’t change that result. That’s lawful. And that’s what librarians do.”

In the open letter, IA fans pointed out that many IA readers were “in underserved communities where access is limited” to quality library resources. Being suddenly cut off from accessing nearly half a million books has “far-reaching implications,” they argued, removing access to otherwise inaccessible “research materials and literature that support their learning and academic growth.”

IA has argued that because copyright law is intended to provide equal access to knowledge, copyright law is better served by allowing IA’s lending than by preventing it. They’re hoping the judges will decide that CDL is fair use, reversing the lower court’s decision and restoring access to books recently removed from the open library. But Gratz said there’s no telling yet when that decision will come.

“There is no deadline for them to make a decision,” Gratz said, but it “probably won’t happen until early fall” at the earliest. After that, whichever side loses will have an opportunity to appeal the case, which has already stretched on for four years, to the Supreme Court. Since neither side seems prepared to back down, the Supreme Court eventually weighing in seems inevitable.

McSherry seemed optimistic that the judges at least understood the stakes for IA readers, noting that fair use is “designed to ensure that copyright actually serves the public interest,” not publishers’. Should the court decide otherwise, McSherry warned, the court risks allowing “a few powerful publishers” to “hijack the future of books.”

When IA first appealed, Kahle put out a statement saying IA couldn’t walk away from “a fight to keep library books available for those seeking truth in the digital age.”

Internet Archive forced to remove 500,000 books after publishers’ court win

controlled digital lending, copyright law, fair use, Features, Internet Archive, Open Library, Policy, Uncategorized / Kris Guyer / June 22, 2024

As a result of book publishers successfully suing the Internet Archive (IA) last year, the free online library that strives to keep growing online access to books recently shrank by about 500,000 titles.

IA reported in a blog post this month that publishers abruptly forcing these takedowns triggered a “devastating loss” for readers who depend on IA to access books that are otherwise impossible or difficult to access.

To restore access, IA is now appealing, hoping to reverse the prior court’s decision by convincing the US Court of Appeals for the Second Circuit that IA’s controlled digital lending of its physical books should be considered fair use under copyright law. An April court filing shows that IA intends to argue that the publishers have no evidence that the e-book market has been harmed by the open library’s lending, and that copyright law is better served by allowing IA’s lending than by preventing it.

“We use industry-standard technology to prevent our books from being downloaded and redistributed—the same technology used by corporate publishers,” Chris Freeland, IA’s director of library services, wrote in the blog. “But the publishers suing our library say we shouldn’t be allowed to lend the books we own. They have forced us to remove more than half a million books from our library, and that’s why we are appealing.”

IA will have an opportunity to defend its practices when oral arguments start in its appeal on June 28.

“Our position is straightforward; we just want to let our library patrons borrow and read the books we own, like any other library,” Freeland wrote, while arguing that the “potential repercussions of this lawsuit extend far beyond the Internet Archive” and publishers should just “let readers read.”

“This is a fight for the preservation of all libraries and the fundamental right to access information, a cornerstone of any democratic society,” Freeland wrote. “We believe in the right of authors to benefit from their work; and we believe that libraries must be permitted to fulfill their mission of providing access to knowledge, regardless of whether it takes physical or digital form. Doing so upholds the principle that knowledge should be equally and equitably accessible to everyone, regardless of where they live or where they learn.”

Internet Archive fans beg publishers to end takedowns

After publishers won an injunction stopping IA’s digital lending, which “limits what we can do with our digitized books,” IA’s help page said, the open library started shrinking. While “removed books are still available to patrons with print disabilities,” everyone else has been cut off, causing many books in IA’s collection to show up as “Borrow Unavailable.”

Ever since, IA has been “inundated” with inquiries from readers all over the world searching for the removed books, Freeland said. And “we get tagged in social media every day where people are like, ‘why are there so many books gone from our library’?” Freeland told Ars.

In an open letter to publishers signed by nearly 19,000 supporters, IA fans begged publishers to reconsider forcing takedowns and quickly restore access to the lost books.

Among the “far-reaching implications” of the takedowns, IA fans counted the negative educational impact on academics, students, and educators, “particularly in underserved communities where access is limited,” who were suddenly cut off from “research materials and literature that support their learning and academic growth.”

They also argued that the takedowns dealt “a serious blow to lower-income families, people with disabilities, rural communities, and LGBTQ+ people, among many others,” who may not have access to a local library or feel “safe accessing the information they need in public.”

“Your removal of these books impedes academic progress and innovation, as well as imperiling the preservation of our cultural and historical knowledge,” the letter said.

“This isn’t happening in the abstract,” Freeland told Ars. “This is real. People no longer have access to a half a million books.”