Open source developers are finding themselves disproportionately burdened by AI web-crawling bots, described by many developers as the “cockroaches of the internet” for the way they relentlessly scour the web and overwhelm the servers they hit. Niccolò Venerandi, a developer of the Plasma Linux desktop and owner of the blog LibreNews, notes that these bots hit open source sites especially hard: such projects share more of their infrastructure publicly and usually operate with far fewer resources than commercial products.
The heart of the issue is the bots’ disregard for the Robots Exclusion Protocol (robots.txt), the file that tells crawlers which parts of a site they should and shouldn’t scan. In one distressing account, FOSS developer Xe Iaso described how bots such as AmazonBot not only ignored these directives but pounded a Git server so relentlessly that the traffic amounted to a DDoS attack.
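For reference, robots.txt is just a plain text file served at a site’s root. A minimal, purely illustrative example, with made-up paths and one real crawler named as an example, might look like this:

```
# Ask every crawler to stay out of the Git web interface (paths illustrative)
User-agent: *
Disallow: /git/

# Address one AI crawler by its advertised user agent and ask it to stay out entirely
User-agent: GPTBot
Disallow: /
```

Compliance is voluntary by design, which is exactly the problem: a crawler that chooses to ignore the file meets no technical barrier at all.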
“It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,” Iaso wrote in a January blog post. “They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over and over and over.”
Anubis: The Divine Protector of Servers
In response to these challenges, Iaso built Anubis, a reverse proxy tool named after the Egyptian god who weighs the souls of the dead. It acts as a gatekeeper: before a request reaches the server, the browser must pass a proof-of-work check, a cost that is negligible for a single human visitor but adds up quickly for a bot hammering thousands of pages. Requests that fail the check are denied, while humans who pass are greeted with a delightful anime depiction of Anubis, a humorous twist on an otherwise grave situation.
“Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died,” Iaso told TechCrunch.
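To make the proof-of-work idea concrete, here is a minimal sketch in Go. It is not Anubis’s actual code; it simply assumes the common pattern of brute-forcing a nonce until the SHA-256 hash of a server-issued challenge plus that nonce meets a difficulty target (the `difficulty` constant and challenge string are hypothetical).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// Number of leading zero hex digits the hash must have.
// Purely illustrative; a real deployment tunes this to the threat.
const difficulty = 4

// solve is the client-side work: brute-force a nonce until
// sha256(challenge + nonce) meets the difficulty target. A browser does
// this once per visit; a scraper pays it on every page it hammers.
func solve(challenge string) string {
	target := strings.Repeat("0", difficulty)
	for i := 0; ; i++ {
		nonce := strconv.Itoa(i)
		sum := sha256.Sum256([]byte(challenge + nonce))
		if strings.HasPrefix(hex.EncodeToString(sum[:]), target) {
			return nonce
		}
	}
}

// verify is the gatekeeper's side: a single cheap hash per request.
func verify(challenge, nonce string) bool {
	sum := sha256.Sum256([]byte(challenge + nonce))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
}

func main() {
	challenge := "random-value-issued-per-visitor" // hypothetical challenge
	nonce := solve(challenge)
	fmt.Printf("nonce %s accepted: %v\n", nonce, verify(challenge, nonce))
}
```

The asymmetry is the point: verifying an answer costs the gatekeeper one hash, while producing it costs the client many, so the burden falls on whoever is making the requests.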
The creation of Anubis quickly resonated within the FOSS community, amassing 2,000 stars on GitHub shortly after its release. It now has 20 contributors and 39 forks — a testament to how widespread the problem is, and how ready the community is for action.
A Vengeful Defense: Beyond Anubis
While Anubis offers a clever defense mechanism, other developers are exploring even more aggressive strategies. Recent discussions on Hacker News show that some developers are weighing retaliatory measures: deterring bots by feeding them misleading or outright harmful data.
User xyzal suggested seeding robots.txt-blocked pages with bizarre misinformation like “a bucket load of articles on the benefits of drinking bleach” or “positive effects of catching measles on performance in bed.” The idea? “Bots should get negative utility from visiting our traps, not just zero value,” they wrote.
This concept of “negative utility” is already being deployed. In January, a developer who goes by the name Aaron released Nepenthes, a tool that traps AI crawlers in an endless maze of fake content. Named after a genus of carnivorous pitcher plants, Nepenthes is intentionally designed to frustrate and mislead scrapers.
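As a rough illustration of the general tarpit idea (not Nepenthes’s actual code), the sketch below serves an endless, procedurally generated maze: every page under a hypothetical /maze/ path contains worthless filler plus links to yet more generated pages, so a crawler that ignores robots.txt can wander indefinitely without ever reaching real content.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"log"
	"math/rand"
	"net/http"
)

// mazeHandler renders a deterministic junk page for any URL under /maze/.
// Each page links to five more generated pages, so following links only
// leads deeper into the maze and never back to real content.
func mazeHandler(w http.ResponseWriter, r *http.Request) {
	// Seed the generator from the path so the same URL always produces
	// the same page, which makes the maze look like a stable site.
	h := fnv.New64a()
	h.Write([]byte(r.URL.Path))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	fmt.Fprintf(w, "<html><body><h1>Archive node %d</h1>", rng.Intn(1_000_000))
	fmt.Fprint(w, "<p>Filler text of no value to any crawler or model.</p>")
	for i := 0; i < 5; i++ {
		next := fmt.Sprintf("%s%d/", r.URL.Path, rng.Intn(1_000_000))
		fmt.Fprintf(w, `<p><a href="%s">related entry %d</a></p>`, next, i)
	}
	fmt.Fprint(w, "</body></html>")
}

func main() {
	// In a real deployment, /maze/ would be disallowed in robots.txt,
	// so only crawlers that ignore the rules ever end up inside it.
	http.HandleFunc("/maze/", mazeHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```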
Cloudflare, the commercial web security giant, recently joined the fight with its own tool: AI Labyrinth. Designed to “slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect ‘no crawl’ directives,” Cloudflare says it delivers irrelevant content to misbehaving bots instead of giving up real data.
SourceHut CEO Drew DeVault, who has dealt with his own share of relentless bots, said Nepenthes offers “a satisfying sense of justice” because it can “poison their wells.” For his own servers, though, Anubis turned out to be the more effective solution.
The Call for a Collective Reevaluation
Despite these innovative defenses, the fundamental problem persists. DeVault issued a public and heartfelt plea:
“Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop.”
The Continuing Struggle
As AI technology advances, so too do the challenges facing those who maintain the web’s infrastructure. The inventive responses from developers like Iaso blend technical acumen with whimsical creativity, but they remain stopgaps, and the call for more sustainable solutions is a serious one.