data scraping

Elon Musk’s X can’t invent its own copyright law, judge says

bright data, copyright act, copyright law, data scraping, elon musk, Policy, Twitter, X, x corp / Mike M. / May 11, 2024

Who owns X data? Everyone but X —

Judge rules copyright law governs public data scraping, not X’s terms.

Ashley Belanger – May 10, 2024 9: 20 pm UTC

A US district judge William Alsup has dismissed Elon Musk’s X Corp’s lawsuit against Bright Data, a data-scraping company accused of improperly accessing X (formerly Twitter) systems and violating both X terms and state laws when scraping and selling data.

X sued Bright Data to stop the company from scraping and selling X data to academic institutes and businesses, including Fortune 500 companies.

According to Alsup, X failed to state a claim while arguing that companies like Bright Data should have to pay X to access public data posted by X users.

“To the extent the claims are based on access to systems, they fail because X Corp. has alleged no more than threadbare recitals,” parroting laws and findings in other cases without providing any supporting evidence, Alsup wrote. “To the extent the claims are based on scraping and selling of data, they fail because they are preempted by federal law,” specifically standing as an “obstacle to the accomplishment and execution of” the Copyright Act.

The judge found that X Corp’s argument exposed a tension between the platform’s desire to control user data while also enjoying the safe harbor of Section 230 of the Communications Decency Act, which allows X to avoid liability for third-party content. If X owned the data, it could perhaps argue it has exclusive rights to control the data, but then it wouldn’t have safe harbor.

“X Corp. wants it both ways: to keep its safe harbors yet exercise a copyright owner’s right to exclude, wresting fees from those who wish to extract and copy X users’ content,” Alsup wrote.

If X got its way, Alsup warned, “X Corp. would entrench its own private copyright system that rivals, even conflicts with, the actual copyright system enacted by Congress” and “yank into its private domain and hold for sale information open to all, exercising a copyright owner’s right to exclude where it has no such right.”

That “would upend the careful balance Congress struck between what copyright owners own and do not own,” Alsup wrote, potentially shrinking the public domain.

“Applying general principles, this order concludes that the extent to which public data may be freely copied from social media platforms, even under the banner of scraping, should generally be governed by the Copyright Act, not by conflicting, ubiquitous terms,” Alsup wrote.

Bright Data CEO Or Lenchner said in a statement provided to Ars that Alsup’s decision had “profound implications in business, research, training of AI models, and beyond.”

“Bright Data has proven that ethical and transparent scraping practices for legitimate business use and social good initiatives are legally sound,” Lenchner said. “Companies that try to control user data intended for public consumption will not win this legal battle.”

Alsup pointed out that X’s lawsuit was “not looking to protect X users’ privacy” but rather to block Bright Data from interfering with its “own sale of its data through a tiered subscription service.”

“X Corp. is happy to allow the extraction and copying of X users’ content so long as it gets paid,” Alsup wrote.

In a sea of vague claims that scraping is “unfair,” perhaps most deficient in X’s complaint, Alsup suggested, was X’s failure to allege that Bright Data’s scraping impaired its services or that X suffered any damages.

“There are no allegations of servers harmed or identities misrepresented,” Alsup wrote. “Additionally, there are no allegations of any damage resulting from automated or unauthorized access.”

X will be allowed to amend its complaint and appeal. The case may be strengthened if X can show evidence of damages or prove that the scraping overburdened X or otherwise deprived X users of their use of the platform in a way that could damage X’s reputation.

But as it currently stands, X’s arguments in many ways appear rather “bare,” Alsup wrote, while its terms of service make crystal clear to users that “[w]hat’s yours is yours—you own your Content.”

By attempting to exclude Bright Data from accessing public X posts owned by X users, X also nearly “obliterated” the “fair use” provision of the Copyright Act, “flouting” Congress’ intent in passing the law, Alsup wrote.

“Only by receiving permission and paying X Corp. could Bright Data, its customers, and other X users freely reproduce, adapt, distribute, and display what might (or might not) be available for taking and selling as fair use,” Alsup wrote. “Thus, Bright Data, its customers, and other X users who wanted to make fair use of copyrighted content would not be able to do so.”

A win for X could have had dire consequences for the Internet, Alsup suggested. In dismissing the complaint, Alsup cited an appeals court ruling “that giving social media companies “free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.”

Because that outcome was averted, Lenchner is celebrating Bright Data’s win.

“Bright Data’s victory over X makes it clear to the world that public information on the web belongs to all of us, and any attempt to deny the public access will fail,” Lenchner said.

In 2023, Bright Data won a similar lawsuit lobbed by Meta over scraping public Facebook and Instagram data. These lawsuits, Lenchner alleged, “are used as a monetary weapon to discourage collecting public data from sites, so conglomerates can hoard user-generated public data.”

“Courts recognize this and the risks it poses of information monopolies and ownership of the Internet,” Lenchner said.

X did not respond to Ars’ request to comment.

Elon Musk’s X can’t invent its own copyright law, judge says Read More »

Billions of public Discord messages may be sold through a scraping service

chat, Cryptocurrency, data scraping, discord, discord servers, Policy, privacy, scraping, Security, spy pet, Tech, Web Scraping / Mike M. / April 18, 2024

Discord chat-scraping service —

Cross-server tracking suggests a new understanding of “public” chat servers.

Kevin Purdy – Apr 17, 2024 7: 42 pm UTC

Getty Images

It’s easy to get the impression that Discord chat messages are ephemeral, especially across different public servers, where lines fly upward at a near-unreadable pace. But someone claims to be catching and compiling that data and is offering packages that can track more than 600 million users across more than 14,000 servers.

Joseph Cox at 404 Media confirmed that Spy Pet, a service that sells access to a database of purportedly 3 billion Discord messages, offers data “credits” to customers who pay in bitcoin, ethereum, or other cryptocurrency. Searching individual users will reveal the servers that Spy Pet can track them across, a raw and exportable table of their messages, and connected accounts, such as GitHub. Ominously, Spy Pet lists more than 86,000 other servers in which it has “no bots,” but “we know it exists.”

An example of Spy Pet’s service from its website. Shown are a user’s nicknames, connected accounts, banner image, server memberships, and messages across those servers tracked by Spy Pet.

Spy Pet
Statistics on servers, users, and messages purportedly logged by Spy Pet.

Spy Pet
An example image of the publicly available data gathered by Spy Pet, in this example for a public server for the game Deep Rock Galactic: Survivor.

Spy Pet

As Cox notes, Discord doesn’t make messages inside server channels, like blog posts or unlocked social media feeds, easy to publicly access and search. But many Discord users many not expect their messages, server memberships, bans, or other data to be grabbed by a bot, compiled, and sold to anybody wishing to pin them all on a particular user. 404 Media confirmed the service’s function with multiple user examples. Private messages are not mentioned by Spy Pet and are presumably still secure.

Spy Pet openly asks those training AI models, or “federal agents looking for a new source of intel,” to contact them for deals. As noted by 404 Media and confirmed by Ars, clicking on the “Request Removal” link plays a clip of J. Jonah Jameson from Spider-Man (the Tobey Maguire/Sam Raimi version) laughing at the idea of advance payment before an abrupt “You’re serious?” Users of Spy Pet, however, are assured of “secure and confidential” searches, with random usernames.

This author found nearly every public Discord he had ever dropped into for research or reporting in Spy Pet’s server list. Those who haven’t paid for message access can only see fairly benign public-facing elements, like stickers, emojis, and charted member totals over time. But as an indication of the reach of Spy Pet’s scraping, it’s an effective warning, or enticement, depending on your goals.

Ars has reached out to Spy Pet for comment and will update this post if we receive a response. A Discord spokesperson told Ars that the company is investigating whether Spy Pet violated its terms of service and community guidelines. It will take “appropriate steps to enforce our policies,” the company said, and could not provide further comment.

Billions of public Discord messages may be sold through a scraping service Read More »