AI startup Perplexity has been crawling and scraping content from websites that have explicitly indicated they do not want to be scraped, according to internet infrastructure provider Cloudflare.
Cloudflare’s Research Findings
According to research Cloudflare published on Monday, Perplexity ignored blocks and tried to hide its crawling and scraping activity. Cloudflare alleges that Perplexity obscured its identity when scraping web pages in order to get around the websites’ stated preferences.
Circumventing Blocks
Cloudflare says Perplexity circumvented blocks by changing its bots’ “user agent,” the string that identifies the software making a request, and the autonomous system numbers (ASNs) of the networks its requests came from. Cloudflare said it observed this activity across tens of thousands of domains and millions of requests per day, and that it used a combination of machine learning and network signals to identify the crawler.
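To illustrate why changing a user agent can defeat a block, here is a minimal sketch in Python. It assumes the site expresses its preference as a robots.txt rule keyed on the crawler's declared name; the article does not say which blocking mechanism each site used. The bot name "ExampleAIBot", the robots.txt contents, and the URL are hypothetical and do not come from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # hypothetical page

# A crawler that announces itself honestly is matched by the named rule.
declared = is_allowed("ExampleAIBot/1.0", URL)

# The same crawler presenting a browser-like string falls through to the
# catch-all rule, because the rule can only match the name it is given.
disguised = is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", URL)

print(f"declared crawler allowed:  {declared}")
print(f"disguised crawler allowed: {disguised}")
```

In this sketch the declared crawler is refused and the disguised one is allowed, which is why name-based rules depend on crawlers identifying themselves. It is also consistent with Cloudflare's account of relying on machine learning and network signals, rather than the user agent alone, to identify the crawler.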
Response and Actions Taken
A Perplexity spokesperson dismissed Cloudflare’s claims as a “sales pitch” and denied that the bot named in the blog post belonged to the company. Cloudflare nonetheless said it had removed Perplexity’s bots from its list of verified bots and added new techniques to block them.
Cloudflare’s Stance on AI Crawlers
Cloudflare has recently taken a public stance against AI crawlers, launching a marketplace that lets website owners charge AI scrapers for visiting their sites. The company’s CEO, Matthew Prince, has expressed concern that AI is disrupting the internet’s business model, particularly for publishers. Cloudflare has also released a free tool to prevent bots from scraping websites to train AI.
Previous Allegations Against Perplexity
This is not the first time Perplexity has been accused of unauthorized scraping. Last year, news outlets including Wired accused Perplexity of plagiarizing their content. Questioned by TechCrunch at the Disrupt 2024 conference, Perplexity’s CEO was unable to give a clear definition of plagiarism.
