Tim Berners-Lee Invented the World Wide Web. Now He Wants to Save It | The New Yorker
A profile of Tim and the World Wide Web.
I don’t use large language models. My objection to using them is ethical. I know how the sausage is made.
I wanted to clarify that. I’m not rejecting large language models because they’re useless. They can absolutely be useful. I just don’t think the usefulness outweighs the ethical issues in how they’re trained.
Molly White came to the same conclusion:
The benefits, though extant, seem to pale in comparison to the costs.
What I do know is that I find LLMs useful on occasion, but every time I use one I die a little inside.
I genuinely look forward to being able to use a large language model with a clear conscience. Such a model would need to be trained ethically. When we get a free-range organic large language model I’ll be the first in line to use it. Until then, I’ll abstain. Remember:
You don’t get companies to change their behaviour by rewarding them for it. If you really want better behaviour from the purveyors of generative tools, you should be boycotting the current offerings.
Still, in anticipation of an ethical large language model someday becoming reality, I think it’s good for me to have an understanding of which tasks these tools are good at.
Prototyping seems like a good use case. My general attitude to prototyping is the exact opposite of my attitude to production code: use absolutely any tool you want and prioritise speed over quality.
When it comes to coding in general, I think Laurie is really onto something when he says:
Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.
In other words, despite what the hype says, these tools are far better at transforming than they are at generating.
Iris Meredith goes deeper into this distinction between transformative and compositional work:
Compositionality relies (among other things) on two core values or functions: choice and precision, both of which are antithetical to LLM functioning.
My own take on this is that transformative work is often the drudge work—take this data dump and convert it to some other format; take this mock-up and make a disposable prototype. I want my tools to help me with that.
But compositional work that relies on judgement, taste, and choice? Not only would I not use a large language model for that, it’s exactly the kind of work that I don’t want to automate away.
Transformative work is done with broad brushstrokes. Compositional work is done with a scalpel.
Large language models are big messy brushes, not scalpels.
Looking at LLM usage and promotion as a cultural phenomenon, it has all of the markings of a status game. The material gains from the LLM (which are usually quite marginal) really aren’t why people are doing it: they’re doing it because in many spaces, using ChatGPT and being very optimistic about AI being the “future” raises their social status. It’s important not only to be using it, but to be seen using it and be seen supporting it and telling people who don’t use it that they’re stupid luddites who’ll inevitably be left behind by technology.
It’s just over one month until UX London. You should grab a ticket if you haven’t already!
The format of UX London is quite special. While the focus of each day is different—discovery, design, and delivery—each day unfolds like this…
There are four talks in the morning. You get your brain filled with ideas and learn from fantastic speakers. It’s a single track—everyone’s getting the same shared experience.
Then after lunch, you choose one of four workshops. Whatever you choose, it’s going to be hands-on. You can leave your laptop at home.
A day of listening to talks could get exhausting. A workshop that lasts all day could be even more exhausting. But somehow by splitting the day between both activities, the energy level is just right!
That said, we don’t want the day to end with everyone spread across four different workshop rooms. That’s why there’s one final talk at the end of each day.
These closing talks are a bit different to the morning talks. Whereas the focus of the morning talks is on practical skills that you can apply straight away, the closing talks are an opportunity to sit back and have your mind expanded. They’ll be fun and thought-provoking.
Paula Zuccotti is closing out day one with a talk about two of her projects: Every Thing We Touch and Future Archeology:
This talk invites audiences to reconsider the meaning of the objects they encounter every day and reflect on what their possessions might reveal about who we are and what we value, both now and in the years to come.
Sarah Hyndman will wrap up day two with a fun interactive talk about your senses:
Join a live expedition into our inner world to explore why we see, feel and remember.
Finally, Rachel Coldicutt is going to finish UX London with a rallying cry:
Introducing the Society of Hopeful Technologists and discussing how, in modern technology development, your practice is probably more political than you realise.
I can’t wait! Get yourself a ticket for a day or for all three days.
And as a little thank you for tolerating my excitement, use the discount code JOINJEREMY to get 20% off your ticket.
The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.
Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.
And no, a robots.txt file doesn’t help.
If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
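For the record, this is the kind of robots.txt that should, in theory, turn these crawlers away. GPTBot and CCBot are the user-agent tokens published by OpenAI and Common Crawl; the rest of your file will vary:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```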
Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:
There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.
My own experience with The Session bears this out.
Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.
So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.
When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.
The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.
If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.
If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.
As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.
Make yourself a nice cup of tea and settle in with Julian Gough’s magnum opus:
How early, sustained, supermassive black hole jets carved out cosmic voids, shaped filaments, and generated magnetic fields
Google Fonts only lets you download .ttf files, meaning that if you want to self-host your fonts (and you should), you have to first convert them to .woff2 files.
Luckily this tool has been online for over a decade, doing what Google Fonts should be doing by default.
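If you’d rather script the conversion yourself, here’s a minimal sketch using the Python fonttools library (the font file names are placeholders):

```python
# pip install fonttools brotli
from fontTools.ttLib import TTFont

# Load the .ttf downloaded from Google Fonts (placeholder file name).
font = TTFont("MyFont.ttf")

# Setting the flavor tells fonttools to write WOFF2 on save;
# the brotli package handles the compression.
font.flavor = "woff2"
font.save("MyFont.woff2")
```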
AI has the same problem that I saw ten years ago at IBM. And remember that IBM has been at this AI game for a very long time. Much longer than OpenAI or any of the new kids on the block. All of the shit we’re seeing today? Anyone who worked on or near Watson saw or experienced the same problems long ago.
LLMs are good at transforming text into less text
Laurie is really onto something with this:
This is the biggest and most fundamental thing about LLMs, and a great rule of thumb for what’s going to be an effective LLM application. Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.
Depending on how much of the hype around AI you’ve taken on board, the idea that they “take text and turn it into less text” might seem like a gigantic back-pedal away from previous claims of what AI can do. But taking text and turning it into less text is still an enormous field of endeavour, and a huge market. It’s still very exciting, all the more exciting because it’s got clear boundaries and isn’t hype-driven over-reaching, or dependent on LLMs overnight becoming way better than they currently are.
Interesting—this is exactly the same framing I used to talk about design systems a few years ago.
Oh, this is a very handy service from Paul—given the URL of an RSS feed that only has summaries, it will attempt to get the full post content from the HTML.
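I don’t know how Paul’s service is implemented, but the general idea can be sketched in a few lines of Python with feedparser and BeautifulSoup (assuming, optimistically, that each post’s body lives in an article element; real pages vary):

```python
# pip install feedparser requests beautifulsoup4
import feedparser
import requests
from bs4 import BeautifulSoup

def expand_feed(feed_url):
    """Fetch each summary-only entry's page and pull out the full post text."""
    for entry in feedparser.parse(feed_url).entries:
        html = requests.get(entry.link, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Assumption: the post body lives in an <article> element;
        # fall back to the whole <body> if there isn't one.
        body = soup.find("article") or soup.body
        yield entry.title, body.get_text(" ", strip=True)
```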
This magnificent piece by Maxwell Neely-Cohen—with some tasteful art-direction—is right up my alley!
This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century?

There are plenty of worthy related subjects and discourses that this piece does not touch at all. This is not a piece about the sheer volume of data we are creating each day, and how we might store all of it. Nor is it a piece about the extremely tough curatorial process of deciding what is and isn’t worth preserving and storing.

It is about longevity, about the potential methods of preserving what we make for future generations, about how we make bits endure. If you had to store something for 100 years, how would you do it? That’s it.
If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous than one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.
Hypertext links are an information-density multiplier.
The way I’ve long thought about it is that traditional writing — like for print — feels two-dimensional. Writing for the web adds a third dimension. It’s not an equal dimension, though. It doesn’t turn writing from a flat plane into a full three-dimensional cube. It’s still primarily about the same two dimensions as old-fashioned writing. What hypertext links provide is an extra layer of depth. Just the fact that the links are there — even if you, the reader, don’t follow them — makes a sentence read slightly differently. It adds meaning in a way that is unique to the web as a medium for prose.
Speaking of serendipity, not long after I wrote about making a static archive of The Session for people to download and share, I came across a piece by Alex Chan about using static websites for tiny archives.
The use-case is slightly different—this is about personal archives, like paperwork, screenshots, and bookmarks. But we both came up with the same process:
I’m deliberately going low-scale, low-tech. There’s no web server, no build system, no dependencies, and no JavaScript frameworks.
And we share the same hope:
Because this system has no moving parts, and it’s just files on a disk, I hope it will last a long time.
You should read the whole thing, where Alex describes all the other approaches they took before settling on plain ol’ HTML files in a folder:
HTML is low maintenance, it’s flexible, and it’s not going anywhere. It’s the foundation of the entire web, and pretty much every modern computer has a web browser that can render HTML pages. These files will be usable for a very long time – probably decades, if not more.
I’m enjoying this approach, so I’m going to keep using it. What I particularly like is that the maintenance burden has been essentially zero – once I set up the initial site structure, I haven’t had to do anything to keep it working.
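That’s not Alex’s actual setup, but the files-in-a-folder approach really can be this simple. Here’s a hypothetical one-off Python script that writes a plain index.html and then gets out of the way; no server or build step needs to stay alive afterwards:

```python
# A one-off script, not a build system: run it once and you're left
# with nothing but plain HTML files in a folder.
import html
from pathlib import Path

def write_index(folder="archive"):
    folder = Path(folder)
    links = "\n".join(
        f'<li><a href="{html.escape(p.name)}">{html.escape(p.name)}</a></li>'
        for p in sorted(folder.iterdir())
        if p.name != "index.html"
    )
    page = f"<!DOCTYPE html>\n<title>Archive</title>\n<ul>\n{links}\n</ul>\n"
    (folder / "index.html").write_text(page, encoding="utf-8")

write_index()
```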
They also talk about digital preservation:
I’d love to see static websites get more use as a preservation tool.
I concur! And it’s particularly interesting for Alex to be making this observation in the context of working with the Flickr Foundation. That’s where they’re experimenting with the concept of a data lifeboat:
What should we do when a digital service sinks?
This is something that George spoke about at the final dConstruct in 2022. You can listen to the talk on the dConstruct archive.
- People only understand things relative to things they already understand
- People only understand things in context
- People rely on patterns and consistency
- People seek to minimize cognitive load
- People have varying levels of expertise and familiarity
- People are goal-oriented
- People often don’t know what they’re looking for
- Information is more useful when it’s actionable
For many archivists, alarm bells are ringing. Across the world, they are scraping up defunct websites or at-risk data collections to save as much of our digital lives as possible. Others are working on ways to store that data in formats that will last hundreds, perhaps even thousands, of years.