ai crawler


Cloudflare wants Google to change its AI search crawling. Google likely won’t.

Ars could not immediately find any legislation that seemed to match Prince’s description, and Cloudflare did not respond to Ars’ request to comment. Passing tech laws is notoriously hard, though, partly because technology keeps advancing as policy debates drag on, and challenges with regulating artificial intelligence are an obvious example of that pattern today.

Google declined Ars’ request to confirm whether talks were underway or whether the company was open to separating its crawlers.

Although Cloudflare singled out Google, other search engines that view AI search features as part of their search products also use the same bots for training as they do for search indexing. It seems likely that Cloudflare’s proposed legislation would face resistance from tech companies in a similar position to Google. The Wall Street Journal reported that the tech companies “have few incentives to work with intermediaries.”

Additionally, Cloudflare’s initiative faces criticism from those who “worry that academic research, security scans, and other types of benign web crawling will get elbowed out of websites as barriers are built around more sites” through Cloudflare’s blocks and paywalls, the WSJ reported. Cloudflare’s system could also threaten web projects like The Internet Archive, which notably played a crucial role in helping track data deleted from government websites after Donald Trump took office.

Among commenters discussing Cloudflare’s claims about Google on Search Engine Roundtable, one user suggested Cloudflare may risk a lawsuit or other penalties from Google for poking the bear.

Ars will continue monitoring for updates on Cloudflare’s attempts to get Google on board with its plan.



Pay up or stop scraping: Cloudflare program charges bots for each crawl

“Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho—and then giving that agent a budget to spend to acquire the best and most relevant content,” Cloudflare said, promising that “we enable a future where intelligent agents can programmatically negotiate access to digital resources.”

AI crawlers now blocked by default

Cloudflare’s announcement comes after the company rolled out a feature last September that allows website owners to block AI crawlers in a single click. According to Cloudflare, over 1 million customers chose to block AI crawlers, signaling that people want more control over their content at a time when Cloudflare observed that writing instructions for AI crawlers in robots.txt files was widely “underutilized.”

To protect more customers moving forward, any new customers (including anyone on a free plan) who sign up for Cloudflare services will have their domains, by default, set to block all known AI crawlers.

This marks Cloudflare’s transition away from the dreaded opt-out models of AI scraping to a permission-based model, which a Cloudflare spokesperson told Ars is expected to “fundamentally change how AI companies access web content going forward.”

In a world where some website owners have grown sick and tired of attempting and failing to block AI scraping through robots.txt—including some trapping AI crawlers in tarpits to punish them for ignoring robots.txt—Cloudflare’s feature allows users to choose granular settings to prevent blocks on AI bots from impacting bots that drive search engine traffic. That’s critical for small content creators who want their sites to still be discoverable but not digested by AI bots.
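For readers unfamiliar with the robots.txt approach that site owners have struggled with, here is a minimal sketch in Python of how such rules distinguish an AI crawler from a search crawler. It is not Cloudflare’s feature or implementation. The robots.txt contents, the example.com URL, and the `is_allowed` helper are all hypothetical, and the check relies on Python’s standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Rules like these are only a request. Nothing in robots.txt stops a crawler that chooses to ignore the file, which is why some site owners have turned to network-level blocking instead.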

“AI crawlers collect content like text, articles, and images to generate answers, without sending visitors to the original source—depriving content creators of revenue, and the satisfaction of knowing someone is reading their content,” Cloudflare’s blog said. “If the incentive to create original, quality content disappears, society ends up losing, and the future of the Internet is at risk.”

Disclosure: Condé Nast, which owns Ars Technica, is a partner involved in Cloudflare’s beta test.

This story was corrected on July 1 to remove publishers incorrectly listed as participating in Cloudflare’s pay-per-crawl beta.
