Cloudflare now blocks AI web scraping by default

Cloudflare now blocks AI-powered web crawlers by default unless site administrators explicitly authorize them.

A significant change is underway in artificial intelligence (AI): more generative AI (GenAI) vendors are confronting the reality of paying a fair price for high-quality training data while remaining profitable.

This shift is reflected in the new policy introduced by Cloudflare, a prominent web infrastructure and security company. Under this policy, AI web crawlers, such as those from OpenAI, are blocked by default on newly registered domains that use Cloudflare's services worldwide, and domain owners must explicitly allow them to access content. Previously, access was generally granted by default.

The updated policy also introduces a "Pay Per Crawl" program for select publishers. It lets them set pricing terms for AI scrapers, offering a potential new revenue stream for content creators. AI crawlers are not automatically blocked on existing domains, but the policy underscores the need for a more structured approach to web scraping.

The legality of web scraping has long been a murky area, with loosely enforced conventions such as the robots.txt file serving as the primary guide. Recent developments highlight the gap between fast-moving technologies and slower regulatory systems. In May 2025, Irish and German regulators declined to block Meta from using Facebook and Instagram data, signalling a potential shift in attitudes towards data usage.
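
For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might consult it before fetching a page. It uses Python's standard `urllib.robotparser` module. The rules, the `may_fetch` helper, and the example.com URL are hypothetical illustrations, not Cloudflare's configuration. The check is voluntary: nothing in the file technically stops a crawler that chooses to ignore it, which is why the rules are described as loosely enforced.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    print("GPTBot allowed: ", may_fetch("GPTBot", page))
    print("Browser allowed:", may_fetch("Mozilla", page))
```

A crawler that honours the file would skip any URL for which `may_fetch` returns `False`; one that does not simply never runs the check.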

The competition from China may also play a role in this evolution. With many Western GenAI companies facing economic uncertainty, some may choose to exit the business. This could lead to a power shift in the AI industry.

In some jurisdictions, however, deliberately bypassing anti-bot protection and scraping data at massive scale may constitute a criminal offense. Breach of contract claims, not copyright claims, could pose the most serious legal threat to GenAI companies.

Cloudflare CEO Matthew Prince has emphasised that publishers need control over their content and a new economic model that benefits everyone. As web scraping continues to evolve, a more structured, fair, and legal approach is needed to ensure a sustainable future for both AI companies and content creators.
