Journal tags: slop

3

Tools

One persistent piece of slopaganda you’ll hear is this:

“It’s just a tool. What matters is how you use it.”

This isn’t a new tack. The same justification has been applied to many technologies.

Leaving aside Kranzberg’s first law, large language models are the very antithesis of a neutral technology. They’re imbued with bias and political decisions at every level.

There’s the obvious problem of where the training data comes from. It’s stolen. Everyone knows this, but some people would rather pretend they don’t know how the sausage is made.

But if you set aside how the tool is made, it’s still just a tool, right? A building is still a building even if it’s built on stolen land.

Except with large language models, the training data is just the first step. After that you need to traumatise an underpaid workforce to remove the most horrifying content. Then you build an opaque black box that end-users have no control over.

Take temperature, for example. That’s the degree of probability a large language model uses for choosing the next token. Dial the temperature too low and the tool will parrot its training data too closely, making it a plagiarism machine. Dial the temperature too high and the tool generates what we kindly call “hallucinations”.

Either way, you have no control over that dial. Someone else is making that decision for you.

A large language model is as neutral as an AK-47.

I understand why people want to feel in control of the tools they’re using. I know why people will use large language models for some tasks—brainstorming, rubber ducking—but strictly avoid them for any outputs intended for human consumption.

You could even convince yourself that a large language model is like a bicycle for the mind. In truth, a large language model is more like one of those hover chairs on the spaceship in WALL·E.

Large language models don’t amplify your creativity and agency. Large language models stunt your creativity and rob you of agency.

When someone applies a large language model it is an example of tool use. But the large language model isn’t the tool.

Codewashing

I have little understanding for people using large language models to generate slop; words and images that nobody asked for.

I have more understanding for people using large language models to generate code. Code isn’t the thing in the same way that words or images are; code is the thing that gets you to the thing.

And if a large language model hallucinates some code, you’ll find out soon enough:

With code you get a powerful form of fact checking for free. Run the code, see if it works.

But I want to push back on one justification I see repeatedly about using large language models to write code. Here’s Craig:

There are many moral and ethical issues with using LLMs, but building software feels like one of the few truly ethically “clean”(er) uses (trained on open source code, etc.)

That’s not how this works. Yes, the large language models are trained on lots of code (most of it open source), but they’re not only trained on that. That’s on top of everything else; all the stolen books, all the unpaid creative work of others.

Even Robin Sloan, who first says:

I think the case of code is especially clear, and, for me, basically settled.

…goes on to acknowledge:

But, again, it’s important to say: the code only works because of Everything. Take that data away, train a model using GitHub alone, and you’ll get a far less useful tool.

When large language models are trained on domain-specific data, it’s always in addition to the mahoosive amount of content they’ve already stolen. It’s that mohoosive amount of content—not the domain-specific data—that enables them to parse your instructions.

(Note that I’m being very delibarate in saying “parse”, not “understand.” Though make no mistake, I’m astonished at how good these tools are at parsing instructions. I say that as someone who tried to write natural language parsers for text-only adventure games back in the 1980s.)

So, sure, go ahead and use large language models to write code. But don’t fool yourself into thinking that it’s somehow ethical.

What I said here applies to code too:

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Filters

My phone rang today. I didn’t recognise the number so although I pressed the big button to answer the call, I didn’t say anything.

I didn’t say anything because usually when I get a call from a number I don’t know, it’s some automated spam. If I say nothing, the spam voice doesn’t activate.

But sometimes it’s not a spam call. Sometimes after a few seconds of silence a human at the other end of the call will say “Hello?” in an uncertain tone. That’s the point when I respond with a cheery “Hello!” of my own and feel bad for making this person endure those awkward seconds of silence.

Those spam calls have made me so suspicious that real people end up paying the price. False positives caught in my spam-detection filter.

Now it’s happening on the web.

I wrote about how Google search, Bing, and Mozilla Developer network are squandering trust:

Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.

But it’s not just limited to specific companies. I’ve noticed more and more suspicion related to any online activity.

I’ve seen members of a community site jump to the conclusion that a new member’s pattern of behaviour was a sure sign that this was a spambot. But it could just as easily have been the behaviour of someone who isn’t neurotypical or who doesn’t speak English as their first language.

Jessica was looking at some pictures on an AirBnB listing recently and found herself examining some photos that seemed a little too good to be true, questioning whether they were in fact output by some generative tool.

Every email that lands in my inbox is like a little mini Turing test. Did a human write this?

Our guard is up. Our filters are activated. Our default mode is suspicion.

This is most apparent with web search. We’ve always needed to filter search results through our own personal lenses, but now it’s like playing whack-a-mole. First we have to find workarounds for avoiding slop, and then when we click through to a web page, we have to evaluate whether’s it’s been generated by some SEO spammer making full use of the new breed of content-production tools.

There’s been a lot of hand-wringing about how this could spell doom for the web. I don’t think that’s necessarily true. It might well spell doom for web search, but I’m okay with that.

Back before its enshittification—an enshittification that started even before all the recent AI slop—Google solved the problem of accurate web searching with its PageRank algorithm. Before that, the only way to get to trusted information was to rely on humans.

Humans made directories like Yahoo! or DMOZ where they categorised links. Humans wrote blog posts where they linked to something that they, a human, vouched for as being genuinely interesting.

There was life before Google search. There will be life after Google search.

Look, there’s even a new directory devoted to cataloging blogs: websites made by humans. Life finds a way.

All of the spam and slop that’s making us so suspicious may end up giving us a new appreciation for human curation.

It wouldn’t be a straightforward transition to move away from search. It would be uncomfortable. It would require behaviour change. People don’t like change. But when needs must, people adapt.

The first bit of behaviour change might be a rediscovery of bookmarks. It used to be that when you found a source you trusted, you bookmarked it. Browsers still have bookmarking functionality but most people rely on search. Maybe it’s time for a bookmarking revival.

A step up from that would be using a feed reader. In many ways, a feed reader is a collection of bookmarks, but all of the bookmarks get polled regularly to see if there are any updates. I love using my feed reader. Everything I’ve subscribed to in there is made by humans.

The ultimate bookmark is an icon on the homescreen of your phone or in the dock of your desktop device. A human source you trust so much that you want it to be as accessible as any app.

Right now the discovery mechanism for that is woeful. I really want that to change. I want a web that empowers people to connect with other people they trust, without any intermediary gatekeepers.

The evangelists of large language models (who may coincidentally have invested heavily in the technology) like to proclaim that a slop-filled future is inevitable, as though we have no choice, as though we must simply accept enshittification as though it were a force of nature.

But we can always walk away.