Cloudflare calls out Perplexity for hiding 'crawling activity' as AI bot scrapes websites that explicitly disallow it, Perplexity responds by calling them 'more flair than cloud'


It's AI versus the internet as Cloudflare and Perplexity have a public falling out over the 'stealth crawling' of restricted websites. The disagreement has even spiralled into name-calling, with Perplexity snapping back that Cloudflare is "more flair than cloud", which isn't quite the burn it thinks it is.

Just last month, content delivery network and cybersecurity company Cloudflare announced it would be blocking AI scrapers from accessing websites that use its service without permission. Now, according to Cloudflare, some crawlers are getting around that block through covert means.

In a blog post (via TechCrunch) titled "Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives", Cloudflare, which handles around 20% of global web traffic, goes into the specifics of how exactly it spotted this problem and why it has de-listed Perplexity as a verified, trusted bot.

Customers first complained that Perplexity was getting around files and rules specifically set up to block crawlers. To test these complaints, Cloudflare created brand new domains that weren't indexed by any search engines "nor made publicly accessible in any discoverable way." These domains used the robots.txt file with explicit rules to stop any bots. Cloudflare then asked Perplexity AI questions about those specific domains.

As these domains weren't indexed or made discoverable, Perplexity would have to access the sites directly in order to provide information for queries. Despite the roadblocks Cloudflare set up, Perplexity allegedly still gave information on the crawled sites.
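To make the robots.txt part of that test concrete, here is a minimal sketch of how a well-behaved crawler might check a blanket no-crawl rule before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user agent and the test-domain.example URL are illustrative assumptions, not details taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every crawler from every path,
# similar in spirit to the blanket rules described in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders, not real crawler details.
    allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "https://test-domain.example/page")
    print("fetch permitted" if allowed else "fetch disallowed by robots.txt")
```

A robots.txt file is only a request: nothing in it technically stops a client that chooses not to check it, which is why the dispute centres on behaviour rather than on a broken lock.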

Interestingly, Cloudflare reportedly observed attempts to get content not only from the bot that was blocked but from alleged stealth agents, impersonating Google Chrome on macOS. The undisclosed bot also reportedly used multiple IP addresses that aren't in Perplexity's declared IP range.
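The two signals mentioned here, a declared user agent and a declared IP range, can be sketched as a simple check. This is an illustration only and not Cloudflare's detection method: the network range (a reserved documentation block), the "ExampleBot" token and the classify_request helper are all hypothetical.

```python
import ipaddress

# Hypothetical values: a documentation-only network standing in for a bot
# operator's published IP range, and a made-up user-agent token.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
DECLARED_UA_TOKEN = "ExampleBot"


def classify_request(ip: str, user_agent: str) -> str:
    """Classify a request by whether it matches the declared identity."""
    addr = ipaddress.ip_address(ip)
    in_declared_range = any(addr in net for net in DECLARED_RANGES)
    declares_itself = DECLARED_UA_TOKEN.lower() in user_agent.lower()

    if declares_itself and in_declared_range:
        return "declared"
    if declares_itself:
        # Claims the bot's name but comes from an unlisted address.
        return "declared-ua-outside-range"
    # Neither signal identifies the bot: on these two checks alone, the
    # request looks like any ordinary browser.
    return "undeclared"


if __name__ == "__main__":
    chrome_ua = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    print(classify_request("192.0.2.10", "Mozilla/5.0 (compatible; ExampleBot/1.0)"))
    print(classify_request("203.0.113.5", chrome_ua))
```

The last branch shows why the alleged behaviour is hard to police: a crawler that presents a generic Chrome user agent from an unlisted address cannot be told apart from a regular visitor using these two signals, so attributing it to an operator needs additional evidence.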

In response, Perplexity claims that either Cloudflare misattributed 3-6 million daily requests from BrowserBase, "a third-party cloud browser service that Perplexity only occasionally uses for highly specialized tasks (less than 45,000 daily request)", or it "needed a clever publicity moment and we—their own customer—happened to be a useful name to get them one."

Perplexity published its own blog post, in part as a retort to Cloudflare's investigation, claiming that "Modern AI assistants work fundamentally differently from traditional web crawling." It argues that these bots aren't scraping data; they are fetching information on the fly to answer a query, rather than retrieving it from a stored database.

"When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots," the post reads, "they're arguing that any automated tool serving users should be suspect.

"This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats… The bluster around this issue also reveals that Cloudflare’s leadership is either dangerously misinformed on the basics of AI, or simply more flair than cloud."

Perplexity's statement claims not only that Perplexity gets all its data in real time, rather than from a database, but that all other agentic AI does too. It therefore argues that there's nothing malicious about this behaviour and that it isn't out of the norm for agentic AI.

Cloudflare's report, however, seems less about agentic AI being inherently malicious and more about the workarounds and obfuscation tactics it alleges Perplexity itself is using. Cloudflare claims to have run the same tests with ChatGPT and observed no attempts to get around crawl blocks. This includes OpenAI's ChatGPT Agent, which is agentic AI.

ChatGPT seems to be fine being blocked by Cloudflare in a way that Perplexity is not.

Though there does appear to be a difference between the AI used to answer those queries and the data sets used to train that AI, the response feels like an argument over semantics. When Cloudflare announced that users could choose whether an AI can scrape their data, it was also declaring that those who don't like the use of AI don't have to subject their sites to it. Scrape is a technical term, but to some customers it simply means that AI in some form has interacted with their domain without their consent.

Cloudflare ends its report saying: "We expected a change in bot and crawler behavior based on these new features, and we expect that the techniques bot operators use to evade detection will continue to evolve. Once this post is live the behavior we saw will almost certainly change, and the methods we use to stop them will keep evolving as well."

Cloudflare says it is committed to giving its users the tools to block these bots, even if their tactics become sneakier, and that it is continuing to standardise extensions "to establish clear and measurable principles that well-meaning bot operators should abide by."

James Bentley
Hardware writer

James is a more recent PC gaming convert, often admiring graphics cards, cases, and motherboards from afar. It was not until 2019, after just finishing a degree in law and media, that they decided to throw out the last few years of education, build their PC, and start writing about gaming instead. In that time, he has covered the latest doodads, contraptions, and gismos, and loved every second of it. Hey, it’s better than writing case briefs.
