Denial

April 7th, 2025

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying “Please stop externalizing your costs directly into my face”:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Responses

Ethan Marcotte

Denial

Ciarán Ferrie

“The worst of the internet is continuously attacking the best of the internet…If you’re using the products powered by these attacks, you’re part of the problem.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.”

Via @adactio

#GenAI #AI #LLM

https://adactio.com/journal/21831

Juho Vepsäläinen

Do you think the development could threaten the open web? I wonder what the implications will be for content producers.

Amber Weinberg

In @adactio’s latest post, he said it so well:

“If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.”

https://adactio.com/journal/21831

Cory Dransfeldt :demi:

🔗 Denial via @adactio #Webdev #Ai #Tech

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

https://adactio.com/journal/21831

Amanda CAARSON

Clean-up on aisle 3.

“When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse. The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web. If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers. If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.”

https://adactio.com/journal/21831

Baldur Bjarnason

@daaain I think it’s risky to assume basic competence from people who believe LLMs are on the cusp of becoming AGI

Aside: it’s also a security issue. It’s easy for an adversary to identify pages in the data set on domains that are about to expire, take those over, and replace trusted pages with pages whose text is designed to tokenise into data set poisoning.

Thain

Yet another perfect post by @adactio https://adactio.com/journal/21831

alan :blobfoxheadphones:

I really like this one from Jeremy Keith (@adactio) about AI scraper bots hammering Wikimedia, open source projects, and just generally all the good parts of the web.

https://adactio.com/journal/21831

Arpit Agrawal

I just read ‘Denial’ by @adactio.com. Jeremy really holds up a mirror to those of us using generative tools powered by large language models. adactio.com/journal/21831

paulfreeman

@adactio: “If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers”

https://adactio.com/journal/21831

T.J. Crowder

“The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.”

https://adactio.com/journal/21831

#LLM #AI

@adactio

fluffy

In reply to: Re: Denial

I wrote a bit about this recently, which importantly also includes some information about what you as a website operator can do about it, namely how to look up the CIDR of the abusers’ netblocks and add denial rules into ufw if that’s what you use. I should probably expand it to cover other situations as well since not everyone can run ufw.
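For anyone wanting to try the approach fluffy describes, here is a minimal sketch in Python. It assumes you have already looked up the offending CIDR blocks yourself (for example via a whois lookup); the function name `ufw_deny_commands` is hypothetical, and the sketch only prints `ufw deny from` commands for you to review rather than running them. The addresses in the example are documentation-only ranges, not real crawlers.

```python
import ipaddress


def ufw_deny_commands(cidrs):
    """Turn CIDR strings into `ufw deny from <network>` commands.

    Overlapping or adjacent blocks are merged first, so the firewall
    ends up with as few rules as possible. Nothing is executed: the
    commands are returned for a human to review.
    """
    by_version = {4: [], 6: []}
    for cidr in cidrs:
        # strict=False accepts entries with host bits set, e.g. "198.51.100.7/24"
        net = ipaddress.ip_network(cidr.strip(), strict=False)
        by_version[net.version].append(net)

    commands = []
    for version in (4, 6):
        # collapse_addresses only accepts networks of a single IP version
        for net in ipaddress.collapse_addresses(by_version[version]):
            commands.append(f"ufw deny from {net}")
    return commands


if __name__ == "__main__":
    # Documentation-only ranges (RFC 5737 / RFC 3849), not real abusers
    blocks = [
        "203.0.113.0/25",
        "203.0.113.128/25",
        "198.51.100.7/24",
        "2001:db8::/33",
        "2001:db8:8000::/33",
    ]
    for command in ufw_deny_commands(blocks):
        print("sudo", command)
```

Run as a script, this prints `sudo ufw deny from 198.51.100.0/24`, `sudo ufw deny from 203.0.113.0/24` and `sudo ufw deny from 2001:db8::/32`. Bear in mind that ufw evaluates rules in order and the first match wins, so if you already have broader allow rules you may need `ufw insert` to place these denials ahead of them.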

Toby

LinkedIn having a normal one about this post

https://www.linkedin.com/posts/tosbourn_denial-activity-7315271312640278528-AXB5?utm_[…]m=member_desktop&rcm=ACoAABh4K8QB0mG4anNTkb5VtC8d9hojek6O188

Toby

Ha, they deleted the comment

Toby

@armstrong I’m more than happy to chat about this stuff, so long as you promise not to say that people don’t care about ethics and that the issue with bots doing whatever they want isn’t the bot makers.

Tom Chadwin

@adactio, via @TheIdOfAlan and @phronetic

“If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.”

https://adactio.com/journal/21831

bsky.app

Christopher Voigt

I really resonate with this post by @adactio

> If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

https://adactio.com/journal/21831

Via @scottboms Throughlines

Denial

www.designforweb.org

14 Shares

# Shared by Richard on Monday, April 7th, 2025 at 2:42pm

# Monday, April 7th, 2025 at 2:42pm

# Shared by Baldur Bjarnason on Monday, April 7th, 2025 at 2:42pm

# Shared by Jonathan Stegall on Monday, April 7th, 2025 at 2:42pm

# Shared by David Rodriguez on Monday, April 7th, 2025 at 2:42pm

# Shared by Fyrd on Monday, April 7th, 2025 at 5:39pm

# Shared by Daniel Appelquist on Monday, April 7th, 2025 at 5:39pm

# Shared by Jim Ray on Monday, April 7th, 2025 at 5:39pm

# Monday, April 7th, 2025 at 8:31pm

# Shared by Carlos Espada on Tuesday, April 8th, 2025 at 6:17am

# Shared by blokche on Wednesday, April 9th, 2025 at 12:15pm

# Shared by Rachel Lawson on Wednesday, April 16th, 2025 at 8:16am

# Shared by Daniel on Wednesday, April 16th, 2025 at 8:16am

# Shared by anne gibson on Friday, April 18th, 2025 at 1:36am

20 Likes

# Monday, April 7th, 2025 at 1:25pm

# Liked by Andy on Monday, April 7th, 2025 at 1:25pm

# Liked by Konnor Rogers on Monday, April 7th, 2025 at 1:25pm

# Liked by Chris Shiflett on Monday, April 7th, 2025 at 2:00pm

# Liked by Richard on Monday, April 7th, 2025 at 2:41pm

# Liked by Олекса 🇺🇦 on Monday, April 7th, 2025 at 2:42pm

# Monday, April 7th, 2025 at 2:42pm

# Liked by Baldur Bjarnason on Monday, April 7th, 2025 at 2:42pm

# Liked by Jeff Bradberry on Monday, April 7th, 2025 at 4:37pm

# Liked by Lucid00 on Monday, April 7th, 2025 at 5:08pm

# Liked by Fyrd on Monday, April 7th, 2025 at 5:39pm

# Liked by sylvia 🇨🇦 on Monday, April 7th, 2025 at 5:39pm

# Monday, April 7th, 2025 at 8:31pm

# Liked by Joe Crawford on Monday, April 7th, 2025 at 9:17pm

# Liked by Carlos Espada on Tuesday, April 8th, 2025 at 6:17am

# Liked by Owen Gregory on Tuesday, April 8th, 2025 at 2:02pm

# Liked by Intellog Inc. on Tuesday, April 8th, 2025 at 8:50pm

# Liked by Daniel on Wednesday, April 16th, 2025 at 8:15am

# Liked by Future Ai Store on Wednesday, April 16th, 2025 at 3:03pm

# Liked by anne gibson on Friday, April 18th, 2025 at 1:36am

1 Bookmark

# Bookmarked by Ben Werdmuller on Tuesday, April 8th, 2025 at 4:09pm

Related links

Vibe code is legacy code | Val Town Blog

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It’s only legacy code if you have to maintain it!

The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt.

If you don’t understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

Tuesday, August 5th, 2025 2:03pm

Tagged with ai machinelearning language models programming applications coding prototypes prototyping code legacy techdebt

Vibe coding and Robocop

The short version of what I want to say is: vibe coding seems to live very squarely in the land of prototypes and toys. Promoting software that’s been built entirely using this method would be akin to sending a hacked weekend prototype to production and expecting it to be stable.

Remy is taking a very sensible approach here:

I’ve used it myself to solve really bespoke problems where the user count is one.

Would I put this out to production: absolutely not.

Saturday, July 19th, 2025 8:58am

Tagged with ai machinelearning language models programming applications coding prototypes prototyping code claude bespoke robocop

Keeping up appearances | deadSimpleTech

Looking at LLM usage and promotion as a cultural phenomenon, it has all of the markings of a status game. The material gains from the LLM (which are usually quite marginal) really aren’t why people are doing it: they’re doing it because in many spaces, using ChatGPT and being very optimistic about AI being the “future” raises their social status. It’s important not only to be using it, but to be seen using it and be seen supporting it and telling people who don’t use it that they’re stupid luddites who’ll inevitably be left behind by technology.

Tuesday, May 27th, 2025 9:25am

Tagged with ai machinelearning language models work hiring culture performative status

In 2025, venture capital can’t pretend everything is fine any more – Pivot to AI

Here is the state of venture capital in early 2025:

Venture capital is moribund except AI.

AI is moribund except OpenAI.

OpenAI is a weird scam that wants to burn money so fast it summons AI God.

Nobody can cash out.

Wednesday, May 14th, 2025 3:30pm

Tagged with ai machinelearning language models vc venture capital economics hype economy investments

What we talk about when we talk about AI — Careful Industries

Technically, AI is a field of computer science that uses advanced methods of computing.

Socially, AI is a set of extractive tools used to concentrate power and wealth.

Tuesday, April 29th, 2025 1:08pm

Tagged with ai machinelearning language models technology semantics buzzwords labels meaning sci-fi sciencefiction names naming society power

Previously on this day

7 years ago I wrote Drag’n’drop revisited

An easy accessibility fix, courtesy of my past self.

10 years ago I wrote Accessible progressive disclosure revisited

From buttons to links.

10 years ago I wrote Mistakes on a plane

In which Comic Book Guy critiques in-flight entertainment.

11 years ago I wrote 100 words 016

Day sixteen.

12 years ago I wrote The tragedy of the commons

Digital destruction courtesy of the Brooklyn Museum.

12 years ago I wrote Connections #2

Come along to chat about organisational stuff’n’shit.

15 years ago I wrote Skillful stories

An excellent night of narrative exploration in Brighton.

17 years ago I wrote Inkosaurs

Moving from the denial phase into anger.

18 years ago I wrote Mi.gration

Moving bookmarks.

20 years ago I wrote Further comment

Following up on the comments controversy.

21 years ago I wrote Junk not found

If only this were a server response instead of a message count…

22 years ago I wrote What is Web Design?

"Who are we? Why are we here?"

22 years ago I wrote Beatallica on the brat

Beatallica perform Beatles songs in the style of Metallica.