Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying “Please stop externalizing your costs directly into my face”:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.

Free and open source projects are particularly vulnerable. “FOSS infrastructure is under attack by AI companies”:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. “AI bots are destroying Open Access”:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: “Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries”.

So does MIT Technology Review: “AI crawler wars threaten to make the web more closed for everyone”.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Responses

Ethan Marcotte

“When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.” — @adactio, https://adactio.com/journal/21831

Juho Vepsäläinen

Do you think the development could threaten the open web? I wonder what the implications will be for content producers.

Amber Weinberg

In @adactio’s latest post, he said it so well:

“If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.”

https://adactio.com/journal/21831

Amanda CAARSON

Clean-up on Isle 3.

“When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse. The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web. If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers. If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.”

https://adactio.com/journal/21831

Baldur Bjarnason

@daaain I think it’s risky to assume basic competence from people who believe LLMs are on the cusp of becoming AGI

Aside: it’s also a security issue. It’s easy for an adversary to identify pages in the data set on domains that are about to expire, take those over, and replace trusted pages with pages whose text is designed to tokenise into data set poisoning.

alan :blobfoxheadphones:

I really like this one from Jeremy Keith (@adactio) about AI scraper bots hammering Wikimedia, open source projects, and just generally all the good parts of the web.

“When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.”

https://adactio.com/journal/21831

paulfreeman

@adactio: “If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers”

https://adactio.com/journal/21831

T.J. Crowder

“When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.”

https://adactio.com/journal/21831

#LLM #AI

@adactio

fluffy

In reply to: Re: Denial

I wrote a bit about this recently, which importantly also includes some information about what you as a website operator can do about it, namely how to look up the CIDR of the abusers’ netblocks and add denial rules into ufw if that’s what you use. I should probably expand it to cover other situations as well since not everyone can run ufw.

# Posted by fluffy on Tuesday, April 8th, 2025 at 6:01pm

Toby

Ha, they deleted the comment

# Posted by Toby on Wednesday, April 9th, 2025 at 8:17am

Toby

@armstrong I’m more than happy to chat about this stuff, so long as you promise not to say that people don’t care about ethics and that the issue with bots doing whatever they want isn’t the bot makers.

# Posted by Toby on Wednesday, April 9th, 2025 at 8:44am

Tom Chadwin

@adactio, via @TheIdOfAlan and @phronetic

“If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

“If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.”

https://adactio.com/journal/21831

# Posted by Tom Chadwin on Wednesday, April 9th, 2025 at 9:57am

Related links

Vibe code is legacy code | Val Town Blog

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It’s only legacy code if you have to maintain it!

The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt.

If you don’t understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

Vibe coding and Robocop

The short version of what I want to say is: vibe coding seems to live very squarely in the land of prototypes and toys. Promoting software that’s been built entirely using this method would be akin to sending a hacked weekend prototype to production and expecting it to be stable.

Remy is taking a very sensible approach here:

I’ve used it myself to solve really bespoke problems where the user count is one.

Would I put this out to production: absolutely not.

Keeping up appearances | deadSimpleTech

Looking at LLM usage and promotion as a cultural phenomenon, it has all of the markings of a status game. The material gains from the LLM (which are usually quite marginal) really aren’t why people are doing it: they’re doing it because in many spaces, using ChatGPT and being very optimistic about AI being the “future” raises their social status. It’s important not only to be using it, but to be seen using it and be seen supporting it and telling people who don’t use it that they’re stupid luddites who’ll inevitably be left behind by technology.

In 2025, venture capital can’t pretend everything is fine any more – Pivot to AI

Here is the state of venture capital in early 2025:

  • Venture capital is moribund except AI.
  • AI is moribund except OpenAI.
  • OpenAI is a weird scam that wants to burn money so fast it summons AI God.
  • Nobody can cash out.

What we talk about when we talk about AI — Careful Industries

Technically, AI is a field of computer science that uses advanced methods of computing.

Socially, AI is a set of extractive tools used to concentrate power and wealth.
