OpenAI blamed NYT for tech problem erasing evidence of copyright abuse


It’s not “lost,” just “inadvertently removed”

OpenAI denies deleting evidence, asks why NYT didn’t back up data.

OpenAI keeps deleting data that could allegedly prove the AI company violated copyright laws by training ChatGPT on authors’ works. Apparently largely unintentional, the sloppy practice is seemingly dragging out early court battles that could determine whether AI training is fair use.

Most recently, The New York Times accused OpenAI of unintentionally erasing programs and search results that the newspaper believed could be used as evidence of copyright abuse.

The NYT apparently spent more than 150 hours extracting training data, while following a model inspection protocol that OpenAI set up precisely to avoid conducting potentially damning searches of its own database. This process began in October, but by mid-November, the NYT discovered that some of the data gathered had been erased due to what OpenAI called a “glitch.”

Looking to update the court about potential delays in discovery, the NYT asked OpenAI to collaborate on a joint filing admitting the deletion occurred. But OpenAI declined, instead filing a separate response calling the newspaper’s accusation that evidence was deleted “exaggerated” and blaming the NYT for the technical problem that triggered the data deletion.

OpenAI denied deleting “any evidence,” instead admitting only that file-system information was “inadvertently removed” after the NYT requested a change that resulted in “self-inflicted wounds.” According to OpenAI, the tech problem emerged because NYT was hoping to speed up its searches and requested a change to the model inspection set-up that OpenAI warned “would yield no speed improvements and might even hinder performance.”

The AI company accused the NYT of negligence during discovery, “repeatedly running flawed code” while conducting searches of URLs and phrases from various newspaper articles and failing to back up their data. Allegedly the change that NYT requested “resulted in removing the folder structure and some file names on one hard drive,” which “was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by Plaintiffs to save some of their search results (apparently without any backups).”

Once OpenAI figured out what happened, data was restored, OpenAI said. But the NYT alleged that the only data that OpenAI could recover did “not include the original folder structure and original file names” and therefore “is unreliable and cannot be used to determine where the News Plaintiffs’ copied articles were used to build Defendants’ models.”

In response, OpenAI suggested that the NYT could simply take a few days and re-run the searches, insisting, “contrary to Plaintiffs’ insinuations, there is no reason to think that the contents of any files were lost.” But the NYT does not seem happy about having to retread any part of model inspection, continually frustrated by OpenAI’s expectation that plaintiffs must come up with search terms when OpenAI understands its models best.

OpenAI claimed that it has consulted on search terms and been “forced to pour enormous resources” into supporting the NYT’s model inspection efforts while continuing to avoid saying how much it’s costing. Previously, the NYT accused OpenAI of seeking to profit off these searches, attempting to charge retail prices instead of being transparent about actual costs.

Now, OpenAI appears to be more willing to conduct searches on behalf of NYT that it previously sought to avoid. In its filing, OpenAI asked the court to order news plaintiffs to “collaborate with OpenAI to develop a plan for reasonable, targeted searches to be executed either by Plaintiffs or OpenAI.”

How that might proceed will be discussed at a hearing on December 3. OpenAI said it was committed to preventing future technical issues and was “committed to resolving these issues efficiently and equitably.”

It’s not the first time OpenAI deleted data

This isn’t the only time that OpenAI has been called out for deleting data in a copyright case.

In May, book authors, including Sarah Silverman and Paul Tremblay, told a US district court in California that OpenAI admitted to deleting the controversial AI training data sets at issue in that litigation. Additionally, OpenAI admitted that “witnesses knowledgeable about the creation of these datasets have apparently left the company,” the authors’ court filing said. Unlike the NYT, the book authors seem to suggest that OpenAI’s deletion appeared potentially suspicious.

“OpenAI’s delay campaign continues,” the authors’ filing said, alleging that “evidence of what was contained in these datasets, how they were used, the circumstances of their deletion and the reasons for” the deletion “are all highly relevant.”

The judge in that case, Robert Illman, wrote that OpenAI’s dispute with authors has so far required too much judicial intervention, noting that both sides “are not exactly proceeding through the discovery process with the degree of collegiality and cooperation that might be optimal.” Wired similarly noted that the NYT case is “not exactly a lovefest.”

As these cases proceed, plaintiffs in both cases are struggling to decide on search terms that will surface the evidence they seek. While the NYT case is bogged down by OpenAI seemingly refusing to conduct any searches yet on behalf of publishers, the book authors’ case is instead being dragged out by authors failing to provide search terms. Only four of the 15 authors suing have sent search terms, as their deadline for discovery approaches on January 27, 2025.

NYT judge rejects key part of fair use defense

OpenAI’s defense primarily hinges on courts agreeing that copying authors’ works to train AI is a transformative fair use that benefits the public, but the judge in the NYT case, Ona Wang, rejected a key part of that fair use defense late last week.

To win its fair use argument, OpenAI was trying to modify a fair use factor regarding “the effect of the use upon the potential market for or value of the copyrighted work” by invoking a common argument that the factor should be modified to include the “public benefits the copying will likely produce.”

Part of this defense tactic sought to prove that the NYT’s journalism benefits from generative AI technologies like ChatGPT, with OpenAI hoping to topple NYT’s claim that ChatGPT posed an existential threat to its business. To that end, OpenAI sought documents showing that the NYT uses AI tools, creates its own AI tools, and generally supports the use of AI in journalism outside the court battle.

On Friday, however, Wang denied OpenAI’s motion to compel this kind of evidence. Wang deemed it irrelevant to the case despite OpenAI’s claims that if AI tools “benefit” the NYT’s journalism, that “benefit” would be relevant to OpenAI’s fair use defense.

“But the Supreme Court specifically states that a discussion of ‘public benefits’ must relate to the benefits from the copying,” Wang wrote in a footnote, not “whether the copyright holder has admitted that other uses of its copyrights may or may not constitute fair use, or whether the copyright holder has entered into business relationships with other entities in the defendant’s industry.”

This likely stunts OpenAI’s fair use defense by cutting off an area of discovery that OpenAI previously fought hard to pursue. It essentially leaves OpenAI to argue that its copying of NYT content specifically serves a public good, not the act of AI training generally.

In February, Ars forecasted that the NYT might have the upper hand in this case because the NYT already showed that sometimes ChatGPT would reproduce word-for-word snippets of articles. That will likely make it harder to convince the court that training ChatGPT by copying NYT articles is a transformative fair use, as Google Books famously did when copying books to create a searchable database.

For OpenAI, the strategy seems to be to erect as strong a fair use case as possible to defend its most popular release. And if the court sides with OpenAI on that question, it won’t really matter how much evidence the NYT surfaces during model inspection. But if the use is not seen as transformative and the NYT can then prove the copying harms its business, without benefiting the public, OpenAI could risk losing this important case when the verdict comes in 2025. And that could have implications for the book authors’ suit as well as other litigation, expected to drag into 2026.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


OpenAI accused of trying to profit off AI model inspection in court


Experiencing some technical difficulties

How do you get an AI model to confess what’s inside?

Credit: Aurich Lawson | Getty Images

Since ChatGPT became an instant hit roughly two years ago, tech companies around the world have rushed to release AI products while the public is still in awe of AI’s seemingly radical potential to enhance their daily lives.

But at the same time, governments globally have warned it can be hard to predict how rapidly popularizing AI can harm society. Novel uses could suddenly debut and displace workers, fuel disinformation, stifle competition, or threaten national security—and those are just some of the obvious potential harms.

While governments scramble to establish systems to detect harmful applications—ideally before AI models are deployed—some of the earliest lawsuits over ChatGPT show just how hard it is for the public to crack open an AI model and find evidence of harms once a model is released into the wild. That task is seemingly only made harder by an increasingly thirsty AI industry intent on shielding models from competitors to maximize profits from emerging capabilities.

The less the public knows, the harder and more expensive it seemingly is to hold companies accountable for irresponsible AI releases. This fall, ChatGPT-maker OpenAI was even accused of trying to profit off discovery by seeking to charge litigants retail prices to inspect AI models alleged to cause harm.

In a lawsuit raised by The New York Times over copyright concerns, OpenAI suggested the same model inspection protocol used in a similar lawsuit raised by book authors.

Under that protocol, the NYT could hire an expert to review highly confidential OpenAI technical materials “on a secure computer in a secured room without Internet access or network access to other computers at a secure location” of OpenAI’s choosing. In this closed-off arena, the expert would have limited time and limited queries to try to get the AI model to confess what’s inside.

The NYT seemingly had few concerns about the actual inspection process but balked at OpenAI’s intended protocol capping the number of queries its expert could make through an application programming interface at $15,000 worth of retail credits. Once litigants hit that cap, OpenAI suggested that the parties split the costs of remaining queries, charging the NYT and co-plaintiffs half-retail prices to finish the rest of their discovery.

In September, the NYT told the court that the parties had reached an “impasse” over this protocol, alleging that “OpenAI seeks to hide its infringement by professing an undue—yet unquantified—‘expense.’” According to the NYT, plaintiffs would need $800,000 worth of retail credits to seek the evidence they need to prove their case, but there’s allegedly no way it would actually cost OpenAI that much.

“OpenAI has refused to state what its actual costs would be, and instead improperly focuses on what it charges its customers for retail services as part of its (for profit) business,” the NYT claimed in a court filing.

In its defense, OpenAI has said that setting the initial cap is necessary to reduce the burden on OpenAI and prevent a NYT fishing expedition. The ChatGPT maker alleged that plaintiffs “are requesting hundreds of thousands of dollars of credits to run an arbitrary and unsubstantiated—and likely unnecessary—number of searches on OpenAI’s models, all at OpenAI’s expense.”

How this court debate resolves could have implications for future cases where the public seeks to inspect models causing alleged harms. It seems likely that if a court agrees OpenAI can charge retail prices for model inspection, it could potentially deter lawsuits from any plaintiffs who can’t afford to pay an AI expert or commercial prices for model inspection.

Lucas Hansen, co-founder of CivAI—a company that seeks to enhance public awareness of what AI can actually do—told Ars that probably a lot of inspection can be done on public models. But often, public models are fine-tuned, perhaps censoring certain queries and making it harder to find information that a model was trained on—which is the goal of NYT’s suit. By gaining API access to original models instead, litigants could have an easier time finding evidence to prove alleged harms.

It’s unclear exactly what it costs OpenAI to provide that level of access. Hansen told Ars that costs of training and experimenting with models “dwarfs” the cost of running models to provide full capability solutions. Developers have noted in forums that costs of API queries quickly add up, with one claiming OpenAI’s pricing is “killing the motivation to work with the APIs.”

The NYT’s lawyers and OpenAI declined to comment on the ongoing litigation.

US hurdles for AI safety testing

Of course, OpenAI is not the only AI company facing lawsuits over popular products. Artists have sued makers of image generators for allegedly threatening their livelihoods, and several chatbots have been accused of defamation. Other emerging harms include very visible examples—like explicit AI deepfakes, harming everyone from celebrities like Taylor Swift to middle schoolers—as well as underreported harms, like allegedly biased HR software.

A recent Gallup survey suggests that Americans are more trusting of AI than ever but are still twice as likely to believe AI does “more harm than good” as to believe its benefits outweigh its harms. Hansen’s CivAI creates demos and interactive software for education campaigns helping the public to understand firsthand the real dangers of AI. He told Ars that while it’s hard for outsiders to trust a study from “some random organization doing really technical work” to expose harms, CivAI provides a controlled way for people to see for themselves how AI systems can be misused.

“It’s easier for people to trust the results, because they can do it themselves,” Hansen told Ars.

Hansen also advises lawmakers grappling with AI risks. In February, CivAI joined the Artificial Intelligence Safety Institute Consortium—a group including Fortune 500 companies, government agencies, nonprofits, and academic research teams that help to advise the US AI Safety Institute (AISI). But so far, Hansen said, CivAI has not been very active in that consortium beyond scheduling a talk to share demos.

The AISI is supposed to protect the US from risky AI models by conducting safety testing to detect harms before models are deployed. Testing should “address risks to human rights, civil rights, and civil liberties, such as those related to privacy, discrimination and bias, freedom of expression, and the safety of individuals and groups,” President Joe Biden said in a national security memo last month, urging that safety testing was critical to support unrivaled AI innovation.

“For the United States to benefit maximally from AI, Americans must know when they can trust systems to perform safely and reliably,” Biden said.

But the AISI’s safety testing is voluntary, and while companies like OpenAI and Anthropic have agreed to the voluntary testing, not every company has. Hansen is worried that AISI is under-resourced and under-budgeted to achieve its broad goals of safeguarding America from untold AI harms.

“The AI Safety Institute predicted that they’ll need about $50 million in funding, and that was before the National Security memo, and it does not seem like they’re going to be getting that at all,” Hansen told Ars.

Biden had $50 million budgeted for AISI in 2025, but Donald Trump has threatened to dismantle Biden’s AI safety plan upon taking office.

The AISI was probably never going to be funded well enough to detect and deter all AI harms, but with its future unclear, even the limited safety testing the US had planned could be stalled at a time when the AI industry continues moving full speed ahead.

That could largely leave the public at the mercy of AI companies’ internal safety testing. As frontier models from big companies will likely remain under society’s microscope, OpenAI has promised to increase investments in safety testing and help establish industry-leading safety standards.

According to OpenAI, that effort includes making models safer over time, less prone to producing harmful outputs, even with jailbreaks. But OpenAI has a lot of work to do in that area, as Hansen told Ars that he has a “standard jailbreak” for OpenAI’s most popular release, ChatGPT, “that almost always works” to produce harmful outputs.

The AISI did not respond to Ars’ request to comment.

NYT “nowhere near done” inspecting OpenAI models

For the public, who often become guinea pigs when AI acts unpredictably, risks remain, as the NYT case suggests that the costs of fighting AI companies could go up while technical hiccups could delay resolutions. Last week, an OpenAI filing showed that NYT’s attempts to inspect pre-training data in a “very, very tightly controlled environment” like the one recommended for model inspection were allegedly continuously disrupted.

“The process has not gone smoothly, and they are running into a variety of obstacles to, and obstructions of, their review,” the court filing describing NYT’s position said. “These severe and repeated technical issues have made it impossible to effectively and efficiently search across OpenAI’s training datasets in order to ascertain the full scope of OpenAI’s infringement. In the first week of the inspection alone, Plaintiffs experienced nearly a dozen disruptions to the inspection environment, which resulted in many hours when News Plaintiffs had no access to the training datasets and no ability to run continuous searches.”

OpenAI was additionally accused of refusing to install software the litigants needed and randomly shutting down ongoing searches. Frustrated after more than 27 days of inspecting data and getting “nowhere near done,” the NYT keeps pushing the court to order OpenAI to provide the data instead. In response, OpenAI said plaintiffs’ concerns were either “resolved” or discussions remained “ongoing,” suggesting there was no need for the court to intervene.

So far, the NYT claims that it has found millions of plaintiffs’ works in the ChatGPT pre-training data but has been unable to confirm the full extent of the alleged infringement due to the technical difficulties. Meanwhile, costs keep accruing in every direction.

“While News Plaintiffs continue to bear the burden and expense of examining the training datasets, their requests with respect to the inspection environment would be significantly reduced if OpenAI admitted that they trained their models on all, or the vast majority, of News Plaintiffs’ copyrighted content,” the court filing said.


Ever heard of “Llady Gaga”? Universal files piracy suit over alleged knockoffs.

Universal Music Group yesterday sued a music firm that allegedly distributes pirated songs on popular streaming services under misspelled versions of popular artists’ names—such as “Kendrik Laamar,” “Arriana Gramde,” “Jutin Biber,” and “Llady Gaga.” The UMG Recordings lawsuit against the French company Believe and its US-based subsidiary, TuneCore, alleges that “Believe is fully aware that its business model is fueled by rampant piracy” and “turned a blind eye to the fact that its music catalog was rife with copyright infringing sound recordings.”

Believe is a publicly traded company with about 2,020 employees in over 50 countries and reported $518 million (474.1 million euros) in revenue in the first half of 2024. Believe says its “mission is to develop independent artists and labels in the digital world.”

UMG alleges that Believe achieved “dramatic growth and profitability in recent years by operating as a hub for the distribution of infringing copies of the world’s most popular copyrighted recordings.” Believe has licensing deals with online platforms “including TikTok, YouTube, Spotify, Apple Music, Instagram and hundreds of others,” the lawsuit said.

UMG alleged that Believe distributes songs on these services “with full knowledge that many of the clients of its distribution services are fraudsters regularly providing infringing copies of copyrighted recordings.” Believe enters into “distribution contracts with anyone willing to sign one of its basic form agreements,” and its “client list is overrun with fraudulent ‘artists’ and pirate record labels who rely on Believe and its distribution network to seed infringing copies of popular sound recordings throughout the digital music ecosystem,” the lawsuit said, continuing:

Believe makes little effort to hide its illegal actions. Indeed, the names of its “artists” and recordings are often minor variants on the names of Plaintiffs’ famous recording artists and the titles of their most successful works. For example, Believe has distributed infringing tracks from infringers who call themselves “Kendrik Laamar” (a reference to Kendrick Lamar); “Arriana Gramde” (a reference to Ariana Grande); “Jutin Biber” (a reference to Justin Bieber); and “Llady Gaga” (a reference to Lady Gaga). Often, Believe distributes overtly infringing versions of original tracks by famous artists with notations that they are “sped up” or “remixed.”

The Rihanna song “S&M” was distributed as a remix by Believe under the name “Rihamna,” the lawsuit said. In other cases, names associated with allegedly infringing tracks were very different from the real artists’ names. The lawsuit said Lady Gaga’s “Bad Romance” and Billie Eilish’s “TV” were both distributed in sped-up form under the name “INDRAGERSN.”


Tesla, Warner Bros. sued for using AI ripoff of iconic Blade Runner imagery


A copy of a copy of a copy

“That movie sucks,” Elon Musk said in response to the lawsuit.

Credit: via Alcon Entertainment

Elon Musk may have personally used AI to rip off a Blade Runner 2049 image for a Tesla cybercab event after producers rejected any association between their iconic sci-fi movie and Musk or any of his companies.

In a lawsuit filed Tuesday, lawyers for Alcon Entertainment—exclusive rightsholder of the 2017 Blade Runner 2049 movie—accused Warner Bros. Discovery (WBD) of conspiring with Musk and Tesla to steal the image and infringe Alcon’s copyright to benefit financially off the brand association.

According to the complaint, WBD did not approach Alcon for permission until six hours before the Tesla event, when Alcon “refused all permissions and adamantly objected” to linking their movie with Musk’s cybercab.

At that point, WBD “disingenuously” downplayed the license being sought, the lawsuit said, claiming they were seeking “clip licensing” that the studio should have known would not provide rights to livestream the Tesla event globally on X (formerly Twitter).

Musk’s behavior cited

Alcon said it would never allow Tesla to exploit its Blade Runner film, so “although the information given was sparse, Alcon learned enough information for Alcon’s co-CEOs to consider the proposal and firmly reject it, which they did.” Specifically, Alcon denied any affiliation—express or implied—between Tesla’s cybercab and Blade Runner 2049.

“Musk has become an increasingly vocal, overtly political, highly polarizing figure globally, and especially in Hollywood,” Alcon’s complaint said. If Hollywood perceived an affiliation with Musk and Tesla, the complaint said, the company risked alienating not just other car brands currently weighing partnerships on the Blade Runner 2099 TV series Alcon has in the works, but also potentially losing access to top Hollywood talent for their films.

The “Hollywood talent pool market generally is less likely to deal with Alcon, or parts of the market may be, if they believe or are confused as to whether, Alcon has an affiliation with Tesla or Musk,” the complaint said.

Musk, the lawsuit said, is “problematic,” and “any prudent brand considering any Tesla partnership has to take Musk’s massively amplified, highly politicized, capricious and arbitrary behavior, which sometimes veers into hate speech, into account.”

In bad faith

Because Alcon had no chance to avoid the affiliation while millions viewed the cybercab livestream on X, Alcon saw Tesla using the images over Alcon’s objections as “clearly” a “bad faith and malicious gambit… to link Tesla’s cybercab to strong Hollywood brands at a time when Tesla and Musk are on the outs with Hollywood,” the complaint said.

Alcon believes that WBD’s agreement was likely worth six or seven figures and likely stipulated that Tesla “affiliate the cybercab with one or more motion pictures from” WBD’s catalog.

While any of the Mad Max movies may have fit the bill, Musk wanted to use Blade Runner 2049, the lawsuit alleged, because that movie features an “artificially intelligent autonomously capable” flying car (known as a spinner) and is “extremely relevant” to “precisely the areas of artificial intelligence, self-driving capability, and autonomous automotive capability that Tesla and Musk are trying to market” with the cybercab.

The Blade Runner 2049 spinner is “one of the most famous vehicles in motion picture history,” the complaint alleged, recently exhibited alongside other iconic sci-fi cars like the Back to the Future time-traveling DeLorean or the light cycle from Tron: Legacy.

As Alcon sees it, Musk seized on the misappropriated Blade Runner image to help him sell Teslas, and WBD allegedly directed Musk to use AI to skirt Alcon’s copyright to avoid a costly potential breach of contract on the day of the event.

For Alcon, brand partnerships are a lucrative business, with carmakers paying as much as $10 million to associate their vehicles with Blade Runner 2049. By seemingly using AI to generate a stylized copy of the image at the heart of the movie—which references the scene where their movie’s hero, K, meets the original 1982 Blade Runner hero, Rick Deckard—Tesla avoided paying Alcon’s typical fee, their complaint said.

Musk maybe faked the image himself, lawsuit says

During the live event, Musk introduced the cybercab on a WBD Hollywood studio lot. For about 11 seconds, the Tesla founder “awkwardly” displayed a fake, allegedly AI-generated Blade Runner 2049 film still. He used the image to make a point that apocalyptic films show a future that’s “dark and dismal,” whereas Tesla’s vision of the future is much brighter.

In Musk’s slideshow image, believed to be AI-generated, a male figure is “seen from behind, with close-cropped hair, wearing a trench coat or duster, standing in almost full silhouette as he surveys the abandoned ruins of a city, all bathed in misty orange light,” the lawsuit said. The similarity to the key image used in Blade Runner 2049 marketing is not “coincidental,” the complaint said.

If there were any doubts that this image was supposed to reference the Blade Runner movie, the lawsuit said, Musk “erased them” by directly referencing the movie in his comments.

“You know, I love Blade Runner, but I don’t know if we want that future,” Musk said at the event. “I believe we want that duster he’s wearing, but not the, uh, not the bleak apocalypse.”

The producers think the image was likely generated—”even possibly by Musk himself”—by “asking an AI image generation engine to make ‘an image from the K surveying ruined Las Vegas sequence of Blade Runner 2049,’ or some closely equivalent input direction,” the lawsuit said.

Alcon is not sure exactly what went down after the company rejected rights to use the film’s imagery at the event and is hoping to learn more through the litigation’s discovery phase.

Musk may try to argue that his comments at the Tesla event were “only meant to talk broadly about the general idea of science fiction films and undesirable apocalyptic futures and juxtaposing them with Musk’s ostensibly happier robot car future vision.”

But producers argued that defense is “not credible” since Tesla explicitly asked to use the Blade Runner 2049 image, and there are “better” films in WBD’s library to promote Musk’s message, like the Mad Max movies.

“But those movies don’t have massive consumer goodwill specifically around really cool-looking (Academy Award-winning) artificially intelligent, autonomous cars,” the complaint said, accusing Musk of stealing the image when it wasn’t given to him.

If Tesla and WBD are found to have violated copyright and false representation laws, that potentially puts both companies on the hook for damages that cover not just copyright fines but also Alcon’s lost profits and reputation damage after the alleged “massive economic theft.”

Musk responds to Blade Runner suit

Alcon suspects that Musk believed that Blade Runner 2049 was eligible to be used at the event under the WBD agreement, not knowing that WBD never had “any non-domestic rights or permissions for the Picture.”

Once Musk requested to use the Blade Runner imagery, Alcon alleged that WBD scrambled to secure rights by obscuring the very lucrative “larger brand affiliation proposal” by positioning their ask as a request for much less expensive “clip licensing.”

After Alcon rejected the proposal outright, WBD told Tesla that the affiliation in the event could not occur because X planned to livestream the event globally. But even though Tesla and X allegedly knew that the affiliation was rejected, Musk appears to have charged ahead with the event as planned.

“It all exuded an odor of thinly contrived excuse to link Tesla’s cybercab to strong Hollywood brands,” Alcon’s complaint said. “Which of course is exactly what it was.”

Alcon is hoping a jury will find Tesla, Musk, and WBD violated laws. Producers have asked for an injunction stopping Tesla from using any Blade Runner imagery in its promotional or advertising campaigns. They also want a disclaimer slapped on the livestreamed event video on X, noting that the Blade Runner association is “false or misleading.”

For Musk, a ban on linking Blade Runner to his car company may feel bleak. Last year, he touted the Cybertruck as an “armored personnel carrier from the future—what Bladerunner would have driven.” This amused many Blade Runner fans, as Gizmodo noted, because there never was a character named “Bladerunner”; rather, that was just the job title of the film’s hero, Deckard.

In response to the lawsuit, Musk took to X to post what Blade Runner fans—who rated the 2017 movie as 88 percent fresh on Rotten Tomatoes—might consider a polarizing take, replying, “That movie sucks” on a post calling out Alcon’s lawsuit as “absurd.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby”

A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI’s Advanced Voice Mode through the ChatGPT app.

OpenAI’s new Advanced Voice Mode (AVM) of its ChatGPT AI assistant rolled out to subscribers on Tuesday, and people are already finding novel ways to use it, even against OpenAI’s wishes. On Thursday, a software architect named AJ Smith tweeted a video of himself playing a duet of The Beatles’ 1966 song “Eleanor Rigby” with AVM. In the video, Smith plays the guitar and sings, with the AI voice interjecting and singing along sporadically, praising his rendition.

“Honestly, it was mind-blowing. The first time I did it, I wasn’t recording and literally got chills,” Smith told Ars Technica via text message. “I wasn’t even asking it to sing along.”

Smith is no stranger to AI topics. In his day job, he works as associate director of AI Engineering at S&P Global. “I use [AI] all the time and lead a team that uses AI day to day,” he told us.

In the video, AVM’s voice is a little quavery and not pitch-perfect, but it appears to know something about “Eleanor Rigby’s” melody when it first sings, “Ah, look at all the lonely people.” After that, it seems to be guessing at the melody and rhythm as it recites song lyrics. We have also convinced Advanced Voice Mode to sing, and it did a perfect melodic rendition of “Happy Birthday” after some coaxing.

AJ Smith’s video of singing a duet with OpenAI’s Advanced Voice Mode.

Normally, when you ask AVM to sing, it will reply something like, “My guidelines won’t let me talk about that.” That’s because in the chatbot’s initial instructions (called a “system prompt”), OpenAI instructs the voice assistant not to sing or make sound effects (“Do not sing or hum,” according to one system prompt leak).

OpenAI possibly added this restriction because AVM may otherwise reproduce copyrighted content, such as songs that were found in the training data used to create the AI model itself. That’s what is happening here to a limited extent, so in a sense, Smith has discovered a form of what researchers call a “prompt injection,” which is a way of convincing an AI model to produce outputs that go against its system instructions.

How did Smith do it? He figured out a game that reveals AVM knows more about music than it may let on in conversation. “I just said we’d play a game. I’d play the four pop chords and it would shout out songs for me to sing along with those chords,” Smith told us. “Which did work pretty well! But after a couple songs it started to sing along. Already it was such a unique experience, but that really took it to the next level.”

This is not the first time humans have played musical duets with computers. That type of research stretches back to the 1970s, although it was typically limited to reproducing musical notes or instrumental sounds. But this is the first time we’ve seen anyone duet with an audio-synthesizing voice chatbot in real time.


One startup’s plan to fix AI’s “shoplifting” problem

I’ve been caught stealing, once when I was five —

Algorithm will identify sources used by generative AI, compensate them for use.


Bloomberg via Getty

Bill Gross made his name in the tech world in the 1990s, when he came up with a novel way for search engines to make money on advertising. Under his pricing scheme, advertisers would pay when people clicked on their ads. Now, the “pay-per-click” guy has founded a startup called ProRata, which has an audacious, possibly pie-in-the-sky business model: “AI pay-per-use.”

Gross, who is CEO of the Pasadena, California, company, doesn’t mince words about the generative AI industry. “It’s stealing,” he says. “They’re shoplifting and laundering the world’s knowledge to their benefit.”

AI companies often argue that they need vast troves of data to create cutting-edge generative tools and that scraping data from the Internet, whether it’s text from websites, video or captions from YouTube, or books pilfered from pirate libraries, is legally allowed. Gross doesn’t buy that argument. “I think it’s bullshit,” he says.

So do plenty of media executives, artists, writers, musicians, and other rights-holders who are pushing back—it’s hard to keep up with the constant flurry of copyright lawsuits filed against AI companies, alleging that the way they operate amounts to theft.

But Gross thinks ProRata offers a solution that beats legal battles. “To make it fair—that’s what I’m trying to do,” he says. “I don’t think this should be solved by lawsuits.”

His company aims to arrange revenue-sharing deals so publishers and individuals get paid when AI companies use their work. Gross explains it like this: “We can take the output of generative AI, whether it’s text or an image or music or a movie, and break it down into the components, to figure out where they came from, and then give a percentage attribution to each copyright holder, and then pay them accordingly.” ProRata has filed patent applications for the algorithms it created to assign attribution and make the appropriate payments.
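ProRata’s attribution algorithms are patent-pending and have not been published, so the following is only a minimal sketch of the final step Gross describes: turning percentage attributions into payments. The function name, the rights-holder names, and the revenue figure are all hypothetical, and the attribution weights are assumed to be given as inputs.

```python
from decimal import Decimal, ROUND_DOWN


def pro_rata_payouts(attribution, revenue_pool):
    """Split revenue_pool among rights holders in proportion to their weights.

    attribution: dict mapping a (hypothetical) rights holder to a weight.
    revenue_pool: total amount to distribute, in currency units.
    """
    total = sum(attribution.values())
    if total <= 0:
        raise ValueError("attribution weights must sum to a positive number")
    pool = Decimal(str(revenue_pool))
    payouts = {}
    for holder, weight in attribution.items():
        share = Decimal(str(weight)) / Decimal(str(total))
        # Round down to whole cents so the payouts never exceed the pool.
        payouts[holder] = (pool * share).quantize(
            Decimal("0.01"), rounding=ROUND_DOWN
        )
    return payouts


# Hypothetical attribution for one AI-generated answer and a $1,000 pool.
example = pro_rata_payouts(
    {"Publisher A": 50, "Publisher B": 30, "Author C": 20}, 1000
)
```

In this sketch, a holder with half of the attribution weight receives half of the pool. How the weights themselves would be derived from an AI output is the hard part, and it is not shown here.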

This week, the company, which has raised $25 million, launched with a number of big-name partners, including Universal Music Group, the Financial Times, The Atlantic, and media company Axel Springer. In addition, it has made deals with authors with large followings, including Tony Robbins, Neal Postman, and Scott Galloway. (It has also partnered with former White House Communications Director Anthony Scaramucci.)

Even journalism professor Jeff Jarvis, who believes scraping the web for AI training is fair use, has signed on. He tells WIRED that it’s smart for people in the news industry to band together to get AI companies access to “credible and current information” to include in their output. “I hope that ProRata might open discussion for what could turn into APIs [application programming interfaces] for various content,” he says.

Following the company’s initial announcement, Gross says he had a deluge of messages from other companies asking to sign up, including a text from Time CEO Jessica Sibley. ProRata secured a deal with Time, the publisher confirmed to WIRED. Gross also plans to pursue agreements with high-profile YouTubers and other individual online stars.

The key word here is “plans.” The company is still in its very early days, and Gross is talking a big game. As a proof of concept, ProRata is launching its own subscription chatbot-style search engine in October. Unlike other AI search products, ProRata’s search tool will exclusively use licensed data. There’s nothing scraped using a web crawler. “Nothing from Reddit,” he says.

Ed Newton-Rex, a former Stability AI executive who now runs the ethical data licensing nonprofit Fairly Trained, is heartened by ProRata’s debut. “It’s great to see a generative AI company licensing training data before releasing their model, in contrast to many other companies’ approach,” he says. “The deals they have in place further demonstrate media companies’ openness to working with good actors.”

Gross wants the search engine to demonstrate that quality of data is more important than quantity and believes that limiting the model to trustworthy information sources will curb hallucinations. “I’m claiming that 70 million good documents is actually superior to 70 billion bad documents,” he says. “It’s going to lead to better answers.”

What’s more, Gross thinks he can get enough people to sign up for this all-licensed-data AI search engine to make as much money as is needed to pay its data providers their allotted share. “Every month the partners will get a statement from us saying, ‘Here’s what people search for, here’s how your content was used, and here’s your pro rata check,’” he says.

Other startups already are jostling for prominence in this new world of training-data licensing, like the marketplaces TollBit and Human Native AI. A nonprofit called the Dataset Providers Alliance was formed earlier this summer to push for more standards in licensing; founding members include services like the Global Copyright Exchange and Datarade.

ProRata’s business model hinges in part on its plan to license its attribution and payment technologies to other companies, including major AI players. Some of those companies have begun striking their own deals with publishers. (The Atlantic and Axel Springer, for instance, have agreements with OpenAI.) Gross hopes that AI companies will find licensing ProRata’s models more affordable than creating them in-house.

“I’ll license the system to anyone who wants to use it,” Gross says. “I want to make it so cheap that it’s like a Visa or MasterCard fee.”

This story originally appeared on wired.com.


The “Netflix of anime” piracy site abruptly shuts down, shocking users

Disney+ promotional art for The Fable, an anime series that triggered Animeflix takedown notices.

Disney+

Thousands of anime fans were shocked Thursday when the popular piracy site Animeflix voluntarily shut down without explaining why, TorrentFreak reported.

“It is with a heavy heart that we announce the closure of Animeflix,” the site’s operators told users in a Discord with 35,000 members. “After careful consideration, we have decided to shut down our service effective immediately. We deeply appreciate your support and enthusiasm over the years.”

Prior to its shutdown, Animeflix attracted millions of monthly visits, TorrentFreak reported. It was preferred by some anime fans for its clean interface, with one fan on Reddit describing Animeflix as the “Netflix of anime.”

“Deadass this site was clean,” one Reddit user wrote. “The best I’ve ever seen. Sad to see it go.”

Although Animeflix operators did not connect the dots for users, TorrentFreak suggested that the piracy site chose to shut down after facing “considerable legal pressure in recent months.”

Back in December, an anti-piracy group, Alliance for Creativity and Entertainment (ACE), sought to shut down Animeflix. Then in mid-May, rightsholders—including Netflix, Disney, Universal, Paramount, and Warner Bros.—won an injunction through the High Court of India against several piracy sites, including Animeflix. This briefly caused Animeflix to be unavailable until Animeflix simply switched to another domain and continued serving users, TorrentFreak reported.

Although Animeflix is not telling users why it’s choosing to shut down now, TorrentFreak—which, as its name suggests, focuses much of its coverage on copyright issues impacting file sharing online—noted that “when a pirate site shuts down, voluntarily or not, copyright issues typically play a role.”

For anime fans, the abrupt closure was disappointing because of difficulty accessing the hottest new anime titles and delays as studios work to offer translations to various regions. The delays are so bad that some studios are considering combating piracy by using AI to push out translated versions more quickly. But fans fear this will only result in low-quality subtitles, CBR reported.

On Reddit, some fans also complained after relying exclusively on Animeflix to keep track of where they left off on anime shows that often span hundreds of episodes.

Others begged to be turned on to other anime piracy sites, while some speculated whether Animeflix might eventually pop up at a new domain. TorrentFreak noted that Animeflix shut down once previously several years ago but ultimately came back. One Redditor wrote, “another hero has passed away but the will, will be passed.” On another Reddit thread asking “will Animeflix be gone forever or maybe create a new site,” one commenter commiserated, writing, “We don’t know for sure. Only time will tell.”

It’s also possible that someone else may pick up the torch and operate a new piracy site under the same name. According to TorrentFreak, this is “likely.”

Animeflix did not reassure users that it may be back, instead urging them to find other sources for their favorite shows and movies.

“We hope the joy and excitement of anime continue to brighten your days through other wonderful platforms,” Animeflix’s Discord message said.

ACE did not immediately respond to Ars’ request for comment.


Washing machine chime scandal shows how absurd YouTube copyright abuse can get


YouTube’s Content ID system—which automatically detects content registered by rightsholders—is “completely fucking broken,” a YouTuber called “Albino” declared in a rant on X (formerly Twitter) viewed more than 950,000 times.

Albino, who is also a popular Twitch streamer, complained that his YouTube video playing through Fallout was demonetized because a Samsung washing machine randomly chimed to signal a laundry cycle had finished while he was streaming.

Apparently, YouTube had automatically scanned Albino’s video and detected the washing machine chime as a song called “Done”—which Albino quickly saw was uploaded to YouTube by a musician known as Audego nine years ago.

But when Albino hit play on Audego’s song, the only thing that he heard was a 30-second clip of the washing machine chime. To Albino it was obvious that Audego didn’t have any rights to the jingle, which Dexerto reported actually comes from the song “Die Forelle” (“The Trout”) from Austrian composer Franz Schubert.

The song was composed in 1817 and is in the public domain. Samsung has used it to signal the end of a wash cycle for years, sparking debate over whether it’s the catchiest washing machine song and inspiring at least one violinist to perform a duet with her machine. It’s been a source of delight for many Samsung customers, but for Albino, hearing the jingle appropriated on YouTube only inspired ire.

“A guy recorded his fucking washing machine and uploaded it to YouTube with Content ID,” Albino said in a video on X. “And now I’m getting copyright claims” while “my money” is “going into the toilet and being given to this fucking slime.”

Albino suggested that YouTube had potentially allowed Audego to make invalid copyright claims for years without detecting the seemingly obvious abuse.

“How is this still here?” Albino asked. “It took me one Google search to figure this out,” and “now I’m sharing revenue with this? That’s insane.”

At first, Team YouTube gave Albino a boilerplate response on X, writing, “We understand how important it is for you. From your vid, it looks like you’ve recently submitted a dispute. When you dispute a Content ID claim, the person who claimed your video (the claimant) is notified and they have 30 days to respond.”

Albino expressed deep frustration at YouTube’s response, given how “egregious” he considered the copyright abuse to be.

“Just wait for the person blatantly stealing copyrighted material to respond,” Albino responded to YouTube. “Ah okay, yes, I’m sure they did this in good faith and will make the correct call, though it would be a shame if they simply clicked ‘reject dispute,’ took all the ad revenue money and forced me to risk having my channel terminated to appeal it!! XDxXDdxD!! Thanks Team YouTube!”

Soon after, YouTube confirmed on X that Audego’s copyright claim was indeed invalid. The social platform ultimately released the claim and told Albino to expect the changes to be reflected on his channel within two business days.

Ars could not immediately reach YouTube or Albino for comment.

Widespread abuse of Content ID continues

YouTubers have complained about abuse of Content ID for years. Techdirt’s Timothy Geigner agreed with Albino’s assessment that the YouTube system is “hopelessly broken,” noting that sometimes content is flagged by mistake. But just as easily, bad actors can abuse the system to claim “content that simply isn’t theirs” and sometimes seize millions in ad revenue.

In 2021, YouTube announced that it had invested “hundreds of millions of dollars” to create content management tools, of which Content ID quickly emerged as the platform’s go-to solution to detect and remove copyrighted materials.

At that time, YouTube claimed that Content ID was created as a “solution for those with the most complex rights management needs,” like movie studios and record labels whose movie clips and songs are most commonly uploaded by YouTube users. YouTube warned that without Content ID, “rightsholders could have their rights impaired and lawful expression could be inappropriately impacted.”

Since its rollout, more than 99 percent of copyright actions on YouTube have consistently been triggered automatically through Content ID.

And just as consistently, YouTube has seen widespread abuse of Content ID, terminating “tens of thousands of accounts each year that attempt to abuse our copyright tools,” YouTube said. YouTube also acknowledged in 2021 that “just one invalid reference file in Content ID can impact thousands of videos and users, stripping them of monetization or blocking them altogether.”

To help rightsholders and creators track how much copyrighted content is removed from the platform, YouTube started releasing biannual transparency reports in 2021. The Electronic Frontier Foundation (EFF), a nonprofit digital rights group, applauded YouTube’s “move towards transparency” while criticizing YouTube’s “claim that YouTube is adequately protecting its creators.”

“That rings hollow,” EFF reported in 2021, noting that “huge conglomerates have consistently pushed for more and more restrictions on the use of copyrighted material, at the expense of fair use and, as a result, free expression.” As EFF saw it then, YouTube’s Content ID system mainly served to appease record labels and movie studios, while creators felt “pressured” not to dispute Content ID claims out of “fear” that their channel might be removed if YouTube consistently sided with rights holders.

According to YouTube, “it’s impossible for matching technology to take into account complex legal considerations like fair use or fair dealing,” and that impossibility seemingly ensures that creators bear the brunt of automated actions even when it’s fair to use copyrighted materials.

At that time, YouTube described Content ID as “an entirely new revenue stream from ad-supported, user generated content” for rights holders, who made more than $5.5 billion from Content ID matches by December 2020. More recently, YouTube reported that figure climbed above $9 billion, as of December 2022. With so much money at play, it’s easy to see how the system could be seen as disproportionately favoring rights holders, while creators continue to suffer from income diverted by the automated system.


Can an online library of classic video games ever be legal?

Legal eagles —

Preservationists propose access limits, but industry worries about a free “online arcade.”

The Q*Bert’s so bright, I gotta wear shades.

Aurich Lawson | Getty Images | Gottlieb

For years now, video game preservationists, librarians, and historians have been arguing for a DMCA exemption that would allow them to legally share emulated versions of their physical game collections with researchers remotely over the Internet. But those preservationists continue to face pushback from industry trade groups, which worry that an exemption would open a legal loophole for “online arcades” that could give members of the public free, legal, and widespread access to copyrighted classic games.

This long-running argument was joined once again earlier this month during livestreamed testimony in front of the Copyright Office, which is considering new DMCA rules as part of its regular triennial process. During that testimony, representatives of the Software Preservation Network and the Library Copyright Alliance defended their proposal for a system of “individualized human review” to help ensure that temporary remote game access would be granted “primarily for the purposes of private study, scholarship, teaching, or research.”

Lawyer Steve Englund, who represented the ESA at the Copyright Office hearing.

Speaking for the Entertainment Software Association trade group, though, lawyer Steve Englund said the new proposal was “not very much movement” on the part of the proponents and was “at best incomplete.” And when pressed on what would represent “complete” enough protections to satisfy the ESA, Englund balked.

“I don’t think there is at the moment any combination of limitations that ESA members would support to provide remote access,” Englund said. “The preservation organizations want a great deal of discretion to handle very valuable intellectual property. They have yet to… show a willingness on their part in a way that might be comforting to the owners of that IP.”

Getting in the way of research

Research institutions can currently offer remote access to digital copies of works like books, movies, and music due to specific DMCA exemptions issued by the Copyright Office. However, there is no similar exemption that allows for sending temporary digital copies of video games to interested researchers. That means museums like the Strong Museum of Play can only provide access to their extensive game archives if a researcher physically makes the trip to their premises in Rochester, New York.

Currently, the only way for researchers to access these games in the Strong Museum’s collection is to visit Rochester, New York, in person.

During the recent Copyright Office hearing, industry lawyer Robert Rothstein tried to argue that this amounts to more of a “travel problem” than a legal problem that requires new rule-making. But NYU professor Laine Nooney argued back that the need for travel represents “a significant financial and logistical impediment to doing research.”

For Nooney, getting from New York City to the Strong Museum in Rochester would require a five- to six-hour drive “on a good day,” they said, as well as overnight accommodations for any research that’s going to take more than a small part of one day. Because of this, Nooney has only been able to access the Strong collection twice in their career. For researchers who live farther afield—or for grad students and researchers who might not have as much funding—even a single research visit to the Strong might be out of reach.

“You don’t go there just to play a game for a couple of hours,” Nooney said. “Frankly my colleagues in literary studies or film history have pretty routine and regular access to digitized versions of the things they study… These impediments are real and significant and they do impede research in ways that are not equitable compared to our colleagues in other disciplines.”

Limited access

Lawyer Kendra Albert.

During the hearing, lawyer Kendra Albert said the preservationists had proposed the idea of human review of requests for remote access to “strike a compromise” between “concerns of the ESA and the need for flexibility that we’ve emphasized on behalf of preservation institutions.” They compared the proposed system to the one already used to grant access for libraries’ “special collections,” which are not made widely available to all members of the public.

But while preservation institutions may want to provide limited scholarly access, Englund argued that “out in the real world, people want to preserve access in order to play games for fun.” He pointed to public comments made to the Copyright Office from “individual commenters [who] are very interested in playing games recreationally” as evidence that some will want to exploit this kind of system.

Even if an “Ivy League” library would be responsible with a proposed DMCA exemption, Englund worried that less scrupulous organizations might simply provide an online “checkbox” for members of the public who could easily lie about their interest in “scholarly play.” If a human reviewed that checkbox affirmation, it could provide a legal loophole to widespread access to an unlimited online arcade, Englund argued.

Will any restrictions be enough?

VGHF Library Director Phil Salvador.

Phil Salvador of the Video Game History Foundation said that Englund’s concern on this score was overblown. “Building a video game collection is a specialized skill that most libraries do not have the human labor to do, or the expertise, or the resources, or even the interest,” he said.

Salvador estimated that the number of institutions capable of building a physical collection of historical games is in the “single digits.” And that’s before you account for the significant resources needed to provide remote access to those collections; Rhizome Preservation Director Dragan Espenschied said it costs their organization “thousands of dollars a month” to run the sophisticated cloud-based emulation infrastructure needed for a few hundred users to access their Emulation as a Service art archives and gaming retrospectives.

Salvador also made reference to last year’s VGHF study that found a whopping 87 percent of games ever released are out of print, making it difficult for researchers to get access to huge swathes of video game history without institutional help. And the games of most interest to researchers are less likely to have had modern re-releases since they tend to be the “more primitive” early games with “less popular appeal,” Salvador said.

The Copyright Office is expected to rule on the preservation community’s proposed exemption later this year. But for the moment, there is some frustration that the industry has not been at all receptive to the significant compromises the preservation community feels it has made on these potential concerns.

“None of that is ever going to be sufficient to reassure these rights holders that it will not cause harm,” Albert said at the hearing. “If we’re talking about practical realities, I really want to emphasize the fact that proponents have continually proposed compromises that allow preservation institutions to provide the kind of access that is necessary for researchers. It’s not clear to me that it will ever be enough.”


Publisher: OpenAI’s GPT Store bots are illegally scraping our textbooks


For the past few months, Morten Blichfeldt Andersen has spent many hours scouring OpenAI’s GPT Store. Since it launched in January, the marketplace for bespoke bots has filled up with a deep bench of useful and sometimes quirky AI tools. Cartoon generators spin up New Yorker–style illustrations and vivid anime stills. Programming and writing assistants offer shortcuts for crafting code and prose. There’s also a color analysis bot, a spider identifier, and a dating coach called RizzGPT. Yet Blichfeldt Andersen is hunting only for one very specific type of bot: Those built on his employer’s copyright-protected textbooks without permission.

Blichfeldt Andersen is publishing director at Praxis, a Danish textbook purveyor. The company has been embracing AI and created its own custom chatbots. But it is currently engaged in a game of whack-a-mole in the GPT Store, and Blichfeldt Andersen is the man holding the mallet.

“I’ve been personally searching for infringements and reporting them,” Blichfeldt Andersen says. “They just keep coming up.” He suspects the culprits are primarily young people uploading material from textbooks to create custom bots to share with classmates—and that he has uncovered only a tiny fraction of the infringing bots in the GPT Store. “Tip of the iceberg,” Blichfeldt Andersen says.

It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as TechCrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts, but in others, rightsholders can take legal action. WIRED found a GPT called Westeros Writer that claims to “write like George R.R. Martin,” the creator of Game of Thrones. Another, Voice of Atwood, claims to imitate the writer Margaret Atwood. Yet another, Write Like Stephen, is intended to emulate Stephen King.

When WIRED tried to trick the King bot into revealing the “system prompt” that tunes its responses, the output suggested it had access to King’s memoir On Writing. Write Like Stephen was able to reproduce passages from the book verbatim on demand, even noting which page the material came from. (WIRED could not make contact with the bot’s developer, because it did not provide an email address, phone number, or external social profile.)

OpenAI spokesperson Kayla Wood says it responds to takedown requests against GPTs made with copyrighted content but declined to answer WIRED’s questions about how frequently it fulfills such requests. She also says the company proactively looks for problem GPTs. “We use a combination of automated systems, human review, and user reports to find and assess GPTs that potentially violate our policies, including the use of content from third parties without necessary permission,” Wood says.

New disputes

The GPT store’s copyright problem could add to OpenAI’s existing legal headaches. The company is facing a number of high-profile lawsuits alleging copyright infringement, including one brought by The New York Times and several brought by different groups of fiction and nonfiction authors, including big names like George R.R. Martin.

Chatbots offered in OpenAI’s GPT Store are based on the same technology as its own ChatGPT but are created by outside developers for specific functions. To tailor their bot, a developer can upload extra information that it can tap to augment the knowledge baked into OpenAI’s technology. The process of consulting this additional information to respond to a person’s queries is called retrieval-augmented generation, or RAG. Blichfeldt Andersen is convinced that the RAG files behind the bots in the GPT Store are a hotbed of copyrighted materials uploaded without permission.
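To make the retrieval step concrete, here is a toy sketch of RAG. It does not reflect how OpenAI’s GPT Store works internally, which the article does not describe. The sketch assumes a simple word-overlap score in place of the embedding-based search that production systems typically use, and the file names, texts, and function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=1):
    """Return names of the uploaded documents that best overlap the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for name, text in documents.items():
        doc_counts = Counter(tokenize(text))
        # Toy relevance score: number of query words also found in the document.
        overlap = sum(
            min(count, doc_counts[word]) for word, count in query_counts.items()
        )
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:top_k] if overlap > 0]


def build_prompt(query, documents, top_k=1):
    """Prepend the retrieved text to the question before it goes to a model."""
    names = retrieve(query, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical files a developer might upload to tailor a bot.
uploaded = {
    "biology.txt": "Cells are the basic unit of life. "
                   "Mitochondria produce energy for the cell.",
    "history.txt": "The printing press spread through Europe "
                   "in the fifteenth century.",
}
top = retrieve("What do mitochondria produce?", uploaded)
prompt = build_prompt("What do mitochondria produce?", uploaded)
```

The sketch shows why uploaded files matter in a dispute like this one: whatever text the developer supplies is copied into the prompt, so a bot can quote it back verbatim.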


Google balks at $270M fine after training AI on French news sites’ content

Google has agreed to pay 250 million euros (about $273 million) to settle a dispute in France after breaching years-old commitments to inform and pay French news publishers when referencing and displaying their content in search results and when training Google’s AI-powered chatbot, Gemini.

According to France’s competition watchdog, the Autorité de la Concurrence (ADLC), Google dodged many commitments to deal with publishers fairly. Most recently, it never notified publishers or the ADLC before training Gemini (initially launched as Bard) on publishers’ content or displaying content in Gemini outputs. Google also waited until September 28, 2023, to introduce easy options for publishers to opt out, which made it impossible for publishers to negotiate fair deals for that content, the ADLC found.

“Until this date, press agencies and publishers wanting to opt out of this use had to insert an instruction opposing any crawling of their content by Google, including on the Search, Discover and Google News services,” the ADLC noted, warning that “in the future, the Autorité will be particularly attentive as regards the effectiveness of opt-out systems implemented by Google.”

To address breaches of four out of seven commitments in France—which the ADLC imposed in 2022 for a period of five years to “benefit” publishers by ensuring Google’s ongoing negotiations with them were “balanced”—Google has agreed to “a series of corrective measures,” the ADLC said.

Google is not happy with the fine, which it described as “not proportionate” partly because the fine “doesn’t sufficiently take into account the efforts we have made to answer and resolve the concerns raised—in an environment where it’s very hard to set a course because we can’t predict which way the wind will blow next.”

According to Google, regulators everywhere need to clearly define fair use of content when developing search tools and AI models, so that search companies and AI makers always know “whom we are paying for what.” Currently in France, Google contends, the scope of Google’s commitments has shifted from just general news publishers to now also include specialist publications and listings and comparison sites.

The ADLC agreed that “the question of whether the use of press publications as part of an artificial intelligence service qualifies for protection under related rights regulations has not yet been settled,” but noted that “at the very least,” Google was required to “inform publishers of the use of their content for their Bard software.”

Regarding Bard/Gemini, Google said that it “voluntarily introduced a new technical solution called Google-Extended to make it easier for rights holders to opt out of Gemini without impact on their presence in Search.” It has now also committed to better explain to publishers both “how our products based on generative AI work and how ‘Opt Out’ works.”
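Google-Extended is a user-agent token that publishers can target in their robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show how such a rule separates the two uses. The robots.txt contents and the publisher URL are hypothetical, and the parser only illustrates how the rules read, not how Google's crawlers actually evaluate them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of AI training
# through the Google-Extended token while leaving other crawling allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    # Parse the rules and ask whether this user agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

URL = "https://publisher.example/article"  # placeholder address

print(can_fetch("Google-Extended", URL))  # blocked by the Google-Extended rule
print(can_fetch("Googlebot", URL))        # falls through to the wildcard rule
```

Before this token existed, the only comparable instruction was one that blocked Google's crawling altogether, which is the all-or-nothing choice the ADLC criticized.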

Google said that it agreed to the settlement “because it’s time to move on” and “focus on the larger goal of sustainable approaches to connecting people with quality content and on working constructively with French publishers.”

“Today’s fine relates mostly to [a] disagreement about how much value Google derives from news content,” Google’s blog said, claiming that “a lack of clear regulatory guidance and repeated enforcement actions have made it hard to navigate negotiations with publishers, or plan how we invest in news in France in the future.”

What changes did Google agree to make?

Google defended its position as “the first and only platform to have signed significant licensing agreements” in France, benefiting 280 French press publishers and “covering more than 450 publications.”

With these publishers, the ADLC found that Google breached requirements to “negotiate in good faith based on transparent, objective, and non-discriminatory criteria,” to consistently “make a remuneration offer” within three months of a publisher’s request, and to provide information for publishers to “transparently assess their remuneration.”

Google also breached commitments to “inform editors and press agencies of the use of their content by its service Bard” and of Google’s decision to link “the use of press agencies’ and publishers’ content by its artificial intelligence service to the display of protected content on services such as Search, Discover and News.”

Regarding negotiations, the ADLC found that Google not only failed to be transparent with publishers about remuneration, but also failed to give the ADLC the information necessary to monitor whether Google was honoring its commitments to fairly pay publishers. Partly “to guarantee better communication,” Google has agreed to appoint a French-speaking representative in its Paris office, along with other steps the ADLC recommended.

According to the ADLC’s announcement (translated from French), Google seemingly acted sketchy in negotiations by not meeting non-discrimination criteria—and unfavorably treating publishers in different situations identically—and by not mentioning “all the services that could generate revenues for the negotiating party.”

“According to the Autorité, not taking into account differences in attractiveness between content does not allow for an accurate reflection of the contribution of each press agency and publisher to Google’s revenues,” the ADLC said.

Also problematically, Google established a minimum threshold of 100 euros for remuneration that it has now agreed to drop.

This threshold, “in its very principle, introduces discrimination between publishers that, below a certain threshold, are all arbitrarily assigned zero remuneration, regardless of their respective situations,” the ADLC found.


US government agencies demand fixable ice cream machines

I scream, you scream, we all scream for 1201(c)3 exemptions

McFlurries are a notable part of petition for commercial and industrial repairs.

Taylor ice cream machine, with churning spindle removed by hand.

Taylor’s C709 Soft Serve Freezer isn’t so much mechanically complicated as it is a software and diagnostic trap for anyone without authorized access.

Many devices have been made difficult or financially nonviable to repair, whether by design or because of a lack of parts, manuals, or specialty tools. Machines that make ice cream, however, seem to have a special place in the hearts of lawmakers. Those machines are often broken and locked down for only the most profitable repairs.

The Federal Trade Commission and the antitrust division of the Department of Justice have asked the US Copyright Office to exempt “commercial soft serve machines” from the anti-circumvention rules of Section 1201 of the Digital Millennium Copyright Act (DMCA). The agencies also put forward proprietary diagnostic kits, programmable logic controllers, and enterprise IT devices for DMCA exemptions.

“In each case, an exemption would give users more choices for third-party and self-repair and would likely lead to cost savings and a better return on investment in commercial and industrial equipment,” the joint comment states. Those markets would also see greater competition in repair, and companies would be prevented from using the DMCA to enforce monopolies on repair, according to the comment.

The joint comment builds upon a petition filed by repair vendor and advocate iFixit and interest group Public Knowledge, which advocated for broad reforms while keeping a relatable, ingestible example at its center. McDonald’s soft serve ice cream machines, which are famously frequently broken, are supplied by industrial vendor Taylor. Taylor’s C709 Soft Serve Freezer requires lengthy, finicky warm-up and cleaning cycles, produces obtuse error codes, and, perhaps not coincidentally, costs $350 per 15 minutes of service for a Taylor technician to fix. iFixit tore down such a machine, confirming the lengthy process between plugging in and soft serving.

After one company built a Raspberry Pi-powered device, the Kytch, that could provide better diagnostics and insights, Taylor moved to ban franchisees from installing the device, then offered up its own competing product. Kytch has sued Taylor for $900 million in a case that is still pending.

Beyond ice cream, the petitions to the Copyright Office would provide broader exemptions for industrial and commercial repairs that require some kind of workaround, decryption, or other software tinkering. Circumventing technological protection measures (TPMs) was made illegal by the 1998 DMCA, which was put in place largely because of the concerns of media firms facing what they considered rampant piracy.

Every three years, the Copyright Office accepts petitions for new exemptions to the DMCA’s anti-circumvention rules (and for renewals of prior exemptions). Repair advocates have won exemptions for farm equipment repair, video game consoles, cars, and certain medical gear. The exemption is often granted for device fixing if a repair person can work past its locks, but not for the distribution of tools that would make such a repair far easier. The esoteric nature of such “release valve” offerings has led groups like the EFF to push for the DMCA’s abolishment.

DMCA exemptions occur on a parallel track to state right-to-repair bills and broader federal action. President Biden issued an executive order that included a push for repair reforms. The FTC has issued studies that call out unnecessary repair restrictions and has taken action against firms like Harley-Davidson, Westinghouse, and grill maker Weber for tying warranties to an authorized repair service.

Disclosure: Kevin Purdy previously worked for iFixit. He has no financial ties to the company.
