Journal tags: tools

Uses

I don’t use large language models. My objection to using them is ethical. I know how the sausage is made.

I wanted to clarify that. I’m not rejecting large language models because they’re useless. They can absolutely be useful. I just don’t think the usefulness outweighs the ethical issues in how they’re trained.

Molly White came to the same conclusion:

The benefits, though extant, seem to pale in comparison to the costs.

Rich has similar thoughts:

What I do know is that I find LLMs useful on occasion, but every time I use one I die a little inside.

I genuinely look forward to being able to use a large language model with a clear conscience. Such a model would need to be trained ethically. When we get a free-range organic large language model I’ll be the first in line to use it. Until then, I’ll abstain. Remember:

You don’t get companies to change their behaviour by rewarding them for it. If you really want better behaviour from the purveyors of generative tools, you should be boycotting the current offerings.

Still, in anticipation of an ethical large language model someday becoming reality, I think it’s good for me to have an understanding of which tasks these tools are good at.

Prototyping seems like a good use case. My general attitude to prototyping is the exact opposite to my attitude to production code; use absolutely any tool you want and prioritise speed over quality.

When it comes to coding in general, I think Laurie is really onto something when he says:

Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.

In other words, despite what the hype says, these tools are far better at transforming than they are at generating.

Iris Meredith goes deeper into this distinction between transformative and compositional work:

Compositionality relies (among other things) on two core values or functions: choice and precision, both of which are antithetical to LLM functioning.

My own take on this is that transformative work is often the drudge work—take this data dump and convert it to some other format; take this mock-up and make a disposable prototype. I want my tools to help me with that.

But compositional work that relies on judgement, taste, and choice? Not only would I not use a large language model for that, it’s exactly the kind of work that I don’t want to automate away.

Transformative work is done with broad brushstrokes. Compositional work is done with a scalpel.

Large language models are big messy brushes, not scalpels.

Tools

One persistent piece of slopaganda you’ll hear is this:

“It’s just a tool. What matters is how you use it.”

This isn’t a new tack. The same justification has been applied to many technologies.

Leaving aside Kranzberg’s first law, large language models are the very antithesis of a neutral technology. They’re imbued with bias and political decisions at every level.

There’s the obvious problem of where the training data comes from. It’s stolen. Everyone knows this, but some people would rather pretend they don’t know how the sausage is made.

But if you set aside how the tool is made, it’s still just a tool, right? A building is still a building even if it’s built on stolen land.

Except with large language models, the training data is just the first step. After that you need to traumatise an underpaid workforce to remove the most horrifying content. Then you build an opaque black box that end-users have no control over.

Take temperature, for example. That’s the setting that controls how much randomness a large language model allows itself when choosing the next token. Dial the temperature too low and the tool will parrot its training data too closely, making it a plagiarism machine. Dial the temperature too high and the tool generates what we kindly call “hallucinations”.
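
To make that a bit more concrete, here’s a rough sketch of how temperature fits into next-token sampling. The function name, the token list, and the scores are all invented for illustration—this isn’t any particular vendor’s API:

    // A rough sketch of temperature in next-token sampling.
    // Lower temperature sharpens the probability distribution (closer to
    // parroting the training data); higher temperature flattens it
    // (more surprising, more hallucination-prone choices).
    function sampleNextToken(tokens, logits, temperature) {
      const scaled = logits.map((logit) => logit / temperature);
      const max = Math.max(...scaled);
      const weights = scaled.map((logit) => Math.exp(logit - max));
      const total = weights.reduce((sum, weight) => sum + weight, 0);
      let pick = Math.random() * total;
      for (let i = 0; i < tokens.length; i++) {
        pick -= weights[i];
        if (pick <= 0) {
          return tokens[i];
        }
      }
      return tokens[tokens.length - 1];
    }

    // With a low temperature the top-scoring token wins almost every time;
    // with a high temperature the long tail gets picked surprisingly often.
    sampleNextToken(["the", "a", "axolotl"], [3.2, 2.9, 0.4], 0.7);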

Either way, you have no control over that dial. Someone else is making that decision for you.

A large language model is as neutral as an AK-47.

I understand why people want to feel in control of the tools they’re using. I know why people will use large language models for some tasks—brainstorming, rubber ducking—but strictly avoid them for any outputs intended for human consumption.

You could even convince yourself that a large language model is like a bicycle for the mind. In truth, a large language model is more like one of those hover chairs on the spaceship in WALL·E.

Large language models don’t amplify your creativity and agency. Large language models stunt your creativity and rob you of agency.

When someone applies a large language model it is an example of tool use. But the large language model isn’t the tool.

Codewashing

I have little understanding for people using large language models to generate slop; words and images that nobody asked for.

I have more understanding for people using large language models to generate code. Code isn’t the thing in the same way that words or images are; code is the thing that gets you to the thing.

And if a large language model hallucinates some code, you’ll find out soon enough:

With code you get a powerful form of fact checking for free. Run the code, see if it works.

But I want to push back on one justification I see repeatedly about using large language models to write code. Here’s Craig:

There are many moral and ethical issues with using LLMs, but building software feels like one of the few truly ethically “clean”(er) uses (trained on open source code, etc.)

That’s not how this works. Yes, the large language models are trained on lots of code (most of it open source), but they’re not only trained on that. That’s on top of everything else; all the stolen books, all the unpaid creative work of others.

Even Robin Sloan, who first says:

I think the case of code is especially clear, and, for me, basically settled.

…goes on to acknowledge:

But, again, it’s important to say: the code only works because of Everything. Take that data away, train a model using GitHub alone, and you’ll get a far less useful tool.

When large language models are trained on domain-specific data, it’s always in addition to the mahoosive amount of content they’ve already stolen. It’s that mahoosive amount of content—not the domain-specific data—that enables them to parse your instructions.

(Note that I’m being very deliberate in saying “parse”, not “understand.” Though make no mistake, I’m astonished at how good these tools are at parsing instructions. I say that as someone who tried to write natural language parsers for text-only adventure games back in the 1980s.)

So, sure, go ahead and use large language models to write code. But don’t fool yourself into thinking that it’s somehow ethical.

What I said here applies to code too:

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Design processing

Dan wrote an interesting post with a somewhat clickbaity title, This Competition Exposed How AI is Reshaping Design:

I watched two designers go head-to-head in a high-speed battle to create the best landing page in 45 minutes. One was a seasoned pro. The other was a non-designer using AI.

If you can ignore the title (and the fact that Dan still actively posts on Twitter; something I find very hard to ignore), then there’s a really thoughtful analysis in there.

It’s less about one platform or tool vs. another more than it is a commentary on how design happens, and whether or not that’s changing in a significant way.

In particular, there’s a very revealing graph that shows the pros and cons of both approaches.

There’s no doubt about it, using a generative large language model helped a non-designer to get past the blank page. But it was less useful in subsequent iterations that rely on decision-making:

I’ve said it before and I’ll say it again: design is deciding. The best designers are the best deciders.

Dan finishes by saying that what he’d really like to see is an experienced designer/decider using these tools to turbo-boost their process:

AI raises the floor for non-designers. But can it raise the ceiling for designers?

Meanwhile, Matt has been writing about Vibe-designing. Matt is an experienced designer, but he’s not experienced with Figma. He’s found that he can work around that using a large language model:

Where in the past 30 years I might have had to cajole a more technically adept colleague into making something through sketches, gesticulating and making sound effects – I open up a Claude window and start what-iffing.

The “vibe” part of the equation often defaults to the mean, which is not a surprise when you think about what you’re asking to help is a staggeringly-massive machine for producing generally-unsurprising satisfactory answers quickly. So, you look at the output as a basis for the next sketch, and the next sketch and quickly, together, you move to something more novel as a result.

Interesting! Just as Dan insisted, the important work is making the decision and moving on to the next stage. If the actual outputs at each stage are mediocre, that seems to be okay, as long as they’re just good enough to inform a go/no-go decision.

This certainly seems more centaur-like than the usual boring uses of large language models to simply do what people are already doing.

Rich gets at something similar when he talks about using large language models for prototyping, where it’s okay if the code is kind of shitty:

If all you need is crappy code to try out a concept or a solution, then an LLM might well enable you (the designer) to do that.

Mind you, even if you do end up finding useful and appropriate ways to use these tools, you’re still using a tool built on exploitation and unfairness:

It’s hard (and reckless) to ignore the heartfelt and cogent perspective laid out by Miriam on the role of AI companies in the current geopolitical crisis:

When eugenics-obsessed billionaires try to sell me a new toy, I don’t ask how many keystrokes it will save me at work. It’s impossible for me to discuss the utility of a thing when I fundamentally disagree with the purpose of it.

Changing

It always annoys me when a politician is accused of “flip-flopping” when they change their mind on something. Instead of admiring someone for being willing to re-examine previously-held beliefs, we lambast them. We admire conviction, even though that’s a trait that has been at the root of history’s worst atrocities.

When you look at the history of human progress, some of our greatest advances were made by people willing to question their beliefs. Prioritising data over opinion is what underpins the scientific method.

But I get it. It can be very uncomfortable to change your mind. There’s inevitably going to be some psychological resistance, a kind of inertia of opinion that favours the sunk cost of all the time you’ve spent believing something.

I was thinking back to times when I’ve changed my opinion on something after being confronted with new evidence.

In my younger days, I was staunchly anti-nuclear power. It didn’t help that in my younger days, nuclear power and nuclear weapons were conceptually linked in the public discourse. In the intervening years I’ve come to believe that nuclear power is far less destructive than fossil fuels. There are still a lot of issues—in terms of cost and time—which make nuclear less attractive than solar or wind, but I honestly can’t reconcile someone claiming to be an environmentalist while simultaneously opposing nuclear power. The data just doesn’t support that conclusion.

Similarly, I remember in the early 2000s being opposed to genetically-modified crops. But the more I looked into the facts, the less I found—other than vibes—to bolster that opposition. And yet I know many people who’ve maintained their opposition, often the same people who point to the scientific evidence when it comes to climate change. It’s a strange kind of cognitive dissonance that would allow for that kind of cherry-picking.

There are other situations where I’ve gone more in the other direction—initially positive, later negative. Google’s AMP project is one example. It sounded okay to me at first. But as I got into the details, its fundamental unfairness couldn’t be ignored.

I was fairly neutral on blockchains at first, at least from a technological perspective. There was even some initial promise of distributed data preservation. But over time my opinion went down, down, down.

Bitcoin, with its proof-of-work idiocy, is the poster-child of everything wrong with the reality of blockchains. The astoundingly wasteful energy consumption is just staggeringly pointless. Over time, any sufficiently wasteful project becomes indistinguishable from evil.

Speaking of energy usage…

My feelings about large language models have been dominated by two massive elephants in the room. One is the completely unethical way that the training data has been acquired (by ripping off the work of people who never gave their permission). The other is the profligate energy usage in not just training these models, but also running queries on the network.

My opinion on the provenance of the training data hasn’t changed. If anything, it’s hardened. I want us to fight back against this unethical harvesting by poisoning the well that the training data is drawing from.

But my opinion on the energy usage might just be swaying a little.

Michael Liebreich published an in-depth piece for Bloomberg last month called Generative AI – The Power and the Glory. He doesn’t sugar-coat the problems with current and future levels of power consumption for large language models, but he also doesn’t paint a completely bleak picture.

Effectively there’s a yet-to-be-decided battle between Koomey’s law and the Jevons paradox. Time will tell which way this will go.

The whole article is well worth a read. But what really gave me pause was a recent piece by Hannah Ritchie asking What’s the impact of artificial intelligence on energy demand?

When Hannah Ritchie speaks, I listen. And I’m well aware of the irony there. That’s classic argument from authority, when the whole point of Hannah Ritchie’s work is that it’s the data that matters.

In any case, she does an excellent job of putting my current worries into a historical context, as well as laying out some potential futures.

Don’t get me wrong, the energy demands of large language models are enormous and are only going to increase, but we may well see some compensatory efficiencies.

Personally, I’d just like to see these tools charge a fair price for their usage. Right now they’re being subsidised by venture capital. If people actually had to pay out of pocket for the energy used per query, we’d get a much better idea of how valuable these tools actually are to people.

Instead we’re seeing these tools being crammed into existing products regardless of whether anybody actually wants them (and in my anecdotal experience, most people resent this being forced on them).

Still, I thought it was worth making a note of how my opinion on the energy usage of large language models is open to change.

But I still won’t use one that’s been trained on other people’s work without their permission.

Unsaid

I went to the UX Brighton conference yesterday.

The quality of the presentations was really good this year, probably the best yet. Usually there are one or two stand-out speakers (like Tom Kerwin last year), but this year, the standard felt very high to me.

But…

The theme of the conference was UX and “AI”, and I’ve never been more disappointed by what wasn’t said at a conference.

Not a single speaker addressed where the training data for current large language models comes from (it comes from scraping other people’s copyrighted creative works).

Not a single speaker addressed the energy requirements for current large language models (the requirements are absolutely mahoosive—not just for the training, but for each and every query).

My charitable reading of the situation yesterday was that every speaker assumed that someone else would cover those issues.

The less charitable reading is that this was a deliberate decision.

Whenever the issue of ethics came up, it was only ever in relation to how we might use these tools: considering user needs, being transparent, all that good stuff. But never once did the question arise of whether it’s ethical to even use these tools.

In fact, the message was often the opposite: words like “responsibility” and “duty” came up, but only in the admonition that UX designers have a responsibility and duty to use these tools! And if that carrot didn’t work, there’s always the stick of scaring you into using these tools for fear of being left behind and having a machine replace you.

I was left feeling somewhat depressed about the deliberately narrow focus. Maggie’s talk was the only one that dealt with any externalities, looking at how the firehose of slop is blasting away at society. But again, the focus was only ever on how these tools are used or abused; nobody addressed the possibility of deliberately choosing not to use them.

If audience members weren’t yet using generative tools in their daily work, the assumption was that they were lagging behind and it was only a matter of time before they’d get on board the hype train. There was no room for the idea that someone might examine the roots of these tools and make a conscious choice not to fund their development.

There’s a quote by Finnish architect Eliel Saarinen that UX designers like repeating:

Always design a thing by considering it in its next larger context. A chair in a room, a room in a house, a house in an environment, an environment in a city plan.

But none of the speakers at UX Brighton chose to examine the larger context of the tools they were encouraging us to use.

One speaker told us “Be curious!”, but clearly that curiosity should not extend to the foundations of the tools themselves. Ignore what’s behind the curtain. Instead look at all the cool stuff we can do now. Don’t worry about the fact that everything you do with these tools is built on a bedrock of exploitation and environmental harm. We should instead blithely build a new generation of user interfaces on the burial ground of human culture.

Whenever I get into a discussion about these issues, it always seems to come back ’round to whether these tools are actually any good or not. People point to the genuinely useful tasks they can accomplish. But that’s not my issue. There are absolutely smart and efficient ways to use large language models—in some situations, it’s like suddenly having a superpower. But as Molly White puts it:

The benefits, though extant, seem to pale in comparison to the costs.

There are no ethical uses of current large language models.

And if you believe that the ethical issues will somehow be ironed out in future iterations, then that’s all the more reason to stop using the current crop of exploitative large language models.

Anyway, like I said, all the talks at UX Brighton were very good. But I wish just one of them had addressed the underlying questions that any good UX designer should ask: “Where did this data come from? What are the second-order effects of deploying this technology?”

Having a talk on those topics would’ve been nice, but I would’ve settled for having five minutes of one talk, or even one minute. But there was nothing.

There’s one possible explanation for this glaring absence that’s quite depressing to consider. It may be that these topics weren’t covered because there’s an assumption that everybody already knows about them, and frankly, doesn’t care.

To use an outdated movie reference, imagine a raving Charlton Heston shouting that “Soylent Green is people!”, only to be met with indifference. “Everyone knows Soylent Green is people. So what?”

Mismatch

This seems to be the attitude of many of my fellow nerds—designers and developers—when presented with tools based on large language models that produce dubious outputs based on the unethical harvesting of other people’s work and requiring staggering amounts of energy to run:

This is the future! I need to start using these tools now, even if they’re flawed, because otherwise I’ll be left behind. They’ll only get better. It’s inevitable.

Whereas this seems to be the attitude of those same designers and developers when faced with stable browser features that can be safely used today without frameworks or libraries:

I’m sceptical.

What price?

I’ve noticed a really strange justification from people when I ask them about their use of generative tools that use large language models (colloquially and inaccurately labelled as artificial intelligence).

I’ll point out that the training data requires the wholesale harvesting of creative works without compensation. I’ll also point out the ludicrously profligate energy use required not just for the training, but for the subsequent queries.

And here’s the thing: people will acknowledge those harms but they will justify their actions by saying “these things will get better!”

First of all, there’s no evidence to back that up.

If anything, as the well gets poisoned by their own outputs, large language models may well end up eating their own slop and getting their own version of mad cow disease. So this might be as good as they’re ever going to get.

And when it comes to energy usage, all the signals from NVIDIA, OpenAI, and others are that power usage is going to increase, not decrease.

But secondly, what the hell kind of logic is that?

It’s like saying “It’s okay for me to drive my gas-guzzling SUV now, because in the future I’ll be driving an electric vehicle.”

The logic is completely backwards! If large language models are going to improve their ethical shortcomings (which is debatable, but let’s be generous), then that’s all the more reason to avoid using the current crop of egregiously damaging tools.

You don’t get companies to change their behaviour by rewarding them for it. If you really want better behaviour from the purveyors of generative tools, you should be boycotting the current offerings.

I suspect that most people know full well that the “they’ll get better!” defence doesn’t hold water. But you can convince yourself of anything when everyone around is telling you that this is the future baby, and you’d better get on board or you’ll be left behind.

Baldur reminds us that this is how people talked about asbestos:

Every time you had an industry campaign against an asbestos ban, they used the same rhetoric. They focused on the potential benefits – cheaper spare parts for cars, cheaper water purification – and doing so implicitly assumed that deaths and destroyed lives, were a low price to pay.

This is the same strategy that’s being used by those who today talk about finding productive uses for generative models without even so much as gesturing towards mitigating or preventing the societal or environmental harms.

It reminds me of the classic Ursula Le Guin short story, The Ones Who Walk Away from Omelas, which depicts:

…the utopian city of Omelas, whose prosperity depends on the perpetual misery of a single child.

Once citizens are old enough to know the truth, most, though initially shocked and disgusted, ultimately acquiesce to this one injustice that secures the happiness of the rest of the city.

It turns out that most people will blithely accept injustice and suffering not for a utopia, but just for some bland hallucinated slop.

Don’t get me wrong: I’m not saying large language models are without their uses. I love seeing what Simon and Matt are doing when it comes to coding. And large language models can be great for transforming content from one format to another, like transcribing speech into text. But the balance sheet just doesn’t add up.

As Molly White put it: AI isn’t useless. But is it worth it?:

Even as someone who has used them and found them helpful, it’s remarkable to see the gap between what they can do and what their promoters promise they will someday be able to do. The benefits, though extant, seem to pale in comparison to the costs.

Manual ’till it hurts

I’ve been going buildless—or as Brad crudely puts it, raw-dogging websites—on a few projects recently. Not just obviously simple things like Clearleft’s Browser Support page, but sites like:

They also have 0 dependencies.

Like Max says:

Funnily enough, many build tools advertise their superior “Developer Experience” (DX). For my money, there’s no better DX than shipping code straight to the browser and not having to worry about some cryptic node_modules error in between.

Making websites without a build step is a gift to your future self. When you open that project six months or a year or two years later, there’ll be no faffing about with npm updates, installs, or vulnerabilities.

Need to edit the CSS? You edit the CSS. Need to change the markup? You change the markup.

It’s remarkably freeing. It’s also very, very performant.
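
If it helps to picture what “no build step” means in practice, it’s nothing more exotic than the browser running exactly the files you wrote. Here’s a minimal sketch using native ES modules—the file names are just placeholders:

    // main.js — loaded in the page with <script type="module" src="/js/main.js">
    // The browser resolves this import itself. Nothing to compile, nothing to bundle.
    import { enhanceNav } from './enhance-nav.js';

    // enhanceNav is a stand-in for whatever enhancement the page needs.
    enhanceNav(document.querySelector('nav'));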

If you’re thinking that your next project couldn’t possibly be made without a build step, let me tell you about a phrase I first heard in the indie web community: “Manual ‘till it hurts”. It’s basically a two-step process:

  1. Start doing what you need to do by hand.
  2. When that becomes unworkable, introduce some kind of automation.

It’s remarkable how often you never reach step two.

I’m not saying premature optimisation is the root of all evil. I’m just saying it’s premature.

Start simple. Get more complex if and when you need to.

You might never need to.

Trust

In their rush to cram in “AI” “features”, it seems to me that many companies don’t actually understand why people use their products.

Google is acting as though its greatest asset is its search engine. Same with Bing.

Mozilla Developer Network is acting as though its greatest asset is its documentation. Same with Stack Overflow.

But their greatest asset is actually trust.

If I use a search engine I need to be able to trust that the filtering is good. If I look up documentation I need to trust that the information is good. I don’t expect perfection, but I also don’t expect to have to constantly be thinking “was this generated by a large language model, and if so, how can I know it’s not hallucinating?”

“But”, the apologists will respond, “the results are mostly correct! The documentation is mostly true!”

Sure, but as Terence puts it:

The intern who files most things perfectly but has, more than once, tipped an entire cup of coffee into the filing cabinet is going to be remembered as “that klutzy intern we had to fire.”

Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.

I am honestly astonished that so many companies don’t seem to realise what they’re destroying.

InstAI

If you use Instagram, there may be a message buried in your notifications. It begins:

We’re getting ready to expand our AI at Meta experiences to your region.

Fuck that. Here’s the important bit:

To help bring these experiences to you, we’ll now rely on the legal basis called legitimate interests for using your information to develop and improve AI at Meta. This means that you have the right to object to how your information is used for these purposes. If your objection is honoured, it will be applied going forwards.

Follow that link and fill in the form. For the field labelled “Please tell us how this processing impacts you” I wrote:

It’s fucking rude.

That did the trick. I got an email saying:

We’ve reviewed your request and will honor your objection.

Mind you, there’s still this:

We may still process information about you to develop and improve AI at Meta, even if you object or don’t use our products and services.

Bookmarklets for testing your website

I’m at day two of Indie Web Camp Brighton.

Day one was excellent. It was really hard to choose which sessions to go to because they all sounded interesting. That’s a good problem to have.

I ended up participating in:

  • a session on POSSE,
  • a session on NFC tags,
  • a session on writing, and
  • a session on testing your website that was hosted by Ros

In that testing session I shared some of the bookmarklets I use regularly.

Bookmarklets? They’re bookmarks that sit in the toolbar of your desktop browser. Just like any other bookmark, they’re links. The difference is that these links begin with javascript: rather than http. That means you can put programmatic instructions inside the link. Click the bookmark and the JavaScript gets executed.

In my mind, there are two different approaches to making a bookmarklet. One kind of bookmarklet contains lots of clever JavaScript—that’s where the smart stuff happens. The other kind of bookmarklet is deliberately dumb. All they do is take the URL of the current page and pass it to another service—that’s where the smart stuff happens.

I like that second kind of bookmarklet.
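
To give you an idea of just how dumb that second kind can be, here’s roughly what a validation bookmarklet boils down to—your own version might differ, but the whole trick is passing the current page’s address along in a URL:

    javascript:window.open('https://validator.w3.org/nu/?doc=' + encodeURIComponent(location.href))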

Here are some bookmarklets I’ve made. You can drag any of them up to the toolbar of your browser. Or you could create a folder called, say, “bookmarklets”, and drag these links up there.

Validation: This bookmarklet will validate the HTML of whatever page you’re on.

Validate HTML

Carbon: This bookmarklet will run the domain through the website carbon calculator.

Calculate carbon

Accessibility: This bookmarklet will run the current page through the Website Accessibility Evaluation Tools.

WAVE

Performance: This bookmarklet will take the current page and run it through PageSpeed Insights, which includes a Lighthouse test.

PageSpeed

HTTPS: This bookmarklet will run your site through the SSL checker from SSL Labs.

SSL Report

Headers: This bookmarklet will test the security headers on your website.

Security Headers

Drag any of those links to your browser’s toolbar to “install” them. If you don’t like one, you can delete it the same way you can delete any other bookmark.

PageSpeed Insights bookmarklet

I’m a little obsessed with web performance. I like being able to check a page’s core web vitals quickly and easily.

Four years ago, I made a Lighthouse bookmarklet. Whatever web page you were on, when you clicked on the bookmarklet you’d get the Lighthouse results for that page. Handy!

It doesn’t work anymore. This is probably because Google are in the loop. Four years is pretty good innings for anything involving that company.

I kid (mostly). Lighthouse itself is still going strong, despite being a Google product. But the bookmarklet needs updating.

Rather than just get Lighthouse results, I figured that the full PageSpeed Insights results would be even better. If your website is in the Chrome UX report, you get to see those CrUX details too.

So here’s the updated bookmarklet:

PageSpeed Insights

Drag that up to your desktop browser’s bookmarks toolbar. Press it whenever you want to test the page you’re on.
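
In case you’re curious what’s inside it, the whole thing is roughly this one line—assuming PageSpeed Insights still accepts the page’s address as a url query parameter:

    javascript:window.open('https://pagespeed.web.dev/report?url=' + encodeURIComponent(location.href))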

Continuous partial ick

The output of generative tools based on large language models gives me the ick.

This isn’t a measured logical response. It’s more of an involuntary emotional reaction.

I could try to justify my reaction by saying I’m concerned about the exploitation involved in the training data, or the huge energy costs involved, or the disenfranchisement of people who create art. But those would be post-facto rationalisations.

I just find myself wrinkling my nose and mentally going “Ew!” whenever somebody posts the output of some prompt they gave to ChatGPT or Midjourney.

Again, I’m not saying this is rational. It’s more instinctual.

You could well say that this is my problem. You may be right. But I wonder what it is that’s so unheimlich about these outputs that triggers my response.

Just to clarify, I am talking about direct outputs, shared verbatim. If someone were to use one of these tools in the process of creating something I’d be none the wiser. I probably couldn’t even tell that a large language model was involved at some point. I’m fine with that. It’s when someone takes something directly from one of these tools and then shares it online, that’s what raises my bile.

I was at a conference a few months back where your badge featured a hallucinated picture of you. Now, this probably sounded like a fun idea. It probably is a fun idea. I can’t tell. All I know is that it made me feel a little queasy.

Perhaps it’s a question of taste. In which case, I’m being a snob. I’m literally turning my nose up at something I deem to be tacky.

But isn’t it tacky, though? It’s not something I can describe, but there’s just something about the vibe of these images—and words—that feels off. It’s sort of creepy, but it’s mostly just the mediocrity that sits so uneasily with me.

These tools do an amazing job of solving the quantity problem—how to produce an image or piece of text quickly. And by most measurements, you could say that they also solve the quality problem. These outputs are good enough to pass for “the real thing.” The outputs are, like, 90% to 95% there. And the gap is closing.

And yet. There’s something in that gap. Something that I feel in my gut. Something that makes me go “nope.”

A memex in every web browser

When Matthew Modine’s character first shows up in Christopher Nolan’s Oppenheimer, I figured the rest of the cinema audience wouldn’t have appreciated me shouting out “VANNEVAR BUSH IN THE HOUSE!” so I screamed it on the inside.

The Manhattan Project was not his only claim to fame or infamy. When it comes to the world we now live in, Bush’s idea of the memex has been almost equally influential. His article As We May Think became a touchstone for Douglas Engelbart and later Tim Berners-Lee.

But as Matt Thompson points out:

…the device he describes does not resemble the internet or anything I’ve ever found on it.

Then he says:

What Bush was describing sounds to me like what you might get if you turned a browser history — the most neglected piece of the software — into a robust and fully featured machine of its own. It would help you map the path you charted through a web of knowledge, refine those maps, order them, and share them

Yes! This!! I 100% agree with the description of browser history as “the most neglected piece of the software.” While I wouldn’t go as far as Chris when he says web browsers kind of suck, I’m kind of amazed that there hasn’t been more innovation and competition in this space.

If anything we’ve outsourced the management of our browsing history to services like Delicious and Pinboard, or to tools like Obsidian and Roam Research. Heck, the links section of my website is my attempt to manage and annotate my own associative trails.

Imagine if that were baked right into a web browser. Then imagine how beautiful such a rich source of data might look.

Like Matt says:

I don’t think anything like this exists. So Bush’s essay still transfixes me.

Decision time

I’ve always associated good design with thoughtfulness. Like, I should be able to point to any element in an interface and the designer should be able to tell me the reasons it’s there. Those reasons may be rooted in user needs or aesthetics or some other consideration, but the point is that there’s a justification for it. Justify every pixel!

But I’ve come to realise that this is a bit reductionist. Now when I point at an interface element, I still expect the designer to be able to justify its inclusion, but I’d also like to know the trade-offs that were made.

Suppose there’s a large hero image. I’m sure the designer would have no problem justifying its inclusion on the basis of impact and the emotional heft it delivers. But did they also understand the potential downsides? Were they aware of the performance implications of including a large image?

I hope the answer to both questions is yes. They understood the costs, but they decided that, on balance, the positives outweighed the negatives.

When it comes to the positives, universal principles of design often apply. Colour theory, typography, proximity, and so on. But the downsides tend to be specific to the medium that the design is delivered in.

Let’s say you’re designing for print. You want to include an extra typeface just for footnotes. No problem. There isn’t really a downside. In print, you can use all the typefaces you want. But if this were for the web, then the calculation would be different. Every extra typeface comes with a performance penalty. A decision that might be justified in one medium might not work in another medium.

It works both ways; on the web you can use all the colours you want, without incurring any penalties, but in print—depending on the process you’re using—you might have to weigh up that decision very differently.

From this perspective, every design decision is like a balance sheet. A good web designer understands the benefits and the costs behind each decision they make.

It’s a similar story when it comes to web development. Heck, we even have the term “tech debt” to describe decisions that we know aren’t for the best in the long term.

In fact, I’d say that consideration of the long-term effects is something that should play a bigger part in technical decisions.

When we’re weighing up the pros and cons of using a particular tool, we have a tendency to think in the here and now. How might this help me right now? How might this hinder me right now?

But often a decision that delivers short-term gain may well end up delivering long-term pain.

Alexander Petros describes this succinctly:

Reopen a node repository after 3 months and you’ll find that your project is mired in a flurry of security warnings, backwards-incompatible library “upgrades,” and a frontend framework whose cultural peak was the exact moment you started the project and is now widely considered tech debt.

When I wrote about making the Patterns Day website I described my process as doing it “the long hard stupid way”—a term that Frank coined in a talk he gave a few years back. But perhaps my hands-on approach is only long, hard and stupid in the short term. With each passing year, the codebase will retain a degree of readability and accessibility that I would’ve sacrificed had I depended on automated build processes.

Robin Berjon puts this into the historical perspective of Taylorism and Luddism:

Whenever something is automated, you lose some control over it. Sometimes that loss of control improves your life because exerting control is work, and sometimes it worsens your life because it reduces your autonomy.

Or as Marshall McLuhan put it:

Every extension is also an amputation.

…which is fine as long as the benefits of the extension outweigh the costs of the amputation. My worry is that, when it comes to evaluating technology for building on the web, we aren’t considering the longer-term costs.

Maintenance matters. With the passing of time, maintenance matters more and more.

Maybe we avoid thinking about the long-term costs because it would lead to decision paralysis. That’s understandable. But I take comfort from some words of wisdom on the web from the 1990s. Tim Berners-Lee’s style guide for hypertext:

Because hypertext is potentially unconstrained you are a little daunted. Do not be. You can write a document as simply as you like. In many ways, the simpler the better.

Automation

I just described prototype code as code to be thrown away. On that topic…

I’ve been observing how people are programming with large language models and I’ve seen a few trends.

The first thing that just about everyone agrees on is that the code produced by a generative tool is not fit for public consumption. At least not straight away. It definitely needs to be checked and tested. If you enjoy debugging and doing code reviews, this might be right up your street.

The other option is to not use these tools for production code at all. Instead use them for throwaway code. That could be prototyping. But it could also be the code for those annoying admin tasks that you don’t do very often.

Take content migration. Say you need to grab a data dump, do some operations on the data to transform it in some way, and then pipe the results into a new content management system.

That’s almost certainly something you’d want to automate with bespoke code. Once the content migration is done, the code can be thrown away.
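
As a sketch of what that kind of throwaway code might look like—the file names and field names here are entirely made up—a one-off migration script can be as small as this:

    // migrate.mjs — run once with Node, then throw away.
    import { readFile, writeFile } from 'node:fs/promises';

    // Read the exported data dump from the old system.
    const dump = JSON.parse(await readFile('old-cms-export.json', 'utf8'));

    // Reshape each entry into whatever the new system expects.
    const migrated = dump.posts.map((post) => ({
      title: post.heading,
      slug: post.url.split('/').filter(Boolean).pop(),
      body: post.content_html,
      published: new Date(post.created_at).toISOString(),
    }));

    // Write out a file the new content management system can import.
    await writeFile('new-cms-import.json', JSON.stringify(migrated, null, 2));
    console.log(`Migrated ${migrated.length} posts.`);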

Read Matt’s account of coding up his Braggoscope. The code needed to spider a thousand web pages, extract data from those pages, find similarities, and output the newly-structured data in a different format.

I’ve noticed that these are just the kind of tasks that large language models are pretty good at. In effect you’re training the tool on your own very specific data and getting it to do your drudge work for you.

To me, it feels right that the usefulness happens on your own machine. You don’t put the machine-generated code in front of other humans.

Nailspotting

I’m sure you’ve heard the law of the instrument: when all you have is a hammer, everything looks like a nail.

There’s another side to it. If you’re selling hammers, you’ll depict a world full of nails.

Recent hammers include cryptobollocks and virtual reality. It wasn’t enough for blockchains and the metaverse to be potentially useful for some situations; they staked their reputations on being utterly transformative, disrupting absolutely every facet of life.

This kind of hype is a terrible strategy in the long-term. But if you can convince enough people in the short term, you can make a killing on the stock market. In truth, the technology itself is superfluous. It’s the hype that matters. And if the hype is over-inflated enough, you can even get your critics to do your work for you, broadcasting their fears about these supposedly world-changing technologies.

You’d think we’d learn. If an industry cries wolf enough times, surely we’d become less trusting of extraordinary claims. But the tech industry continues to cry wolf—or rather, “hammer!”—at regular intervals.

The latest hammer is machine learning, usually—incorrectly—referred to as Artificial Intelligence. What makes this hype cycle particularly infuriating is that there are genuine use cases. There are some nails for this hammer. They’re just not as plentiful as the breathless hype—both positive and negative—would have you believe.

When I was hosting the DiBi conference last week, there was a little section on generative “AI” tools. Matt Garbutt covered the visual side, demoing tools like Midjourney. Scott Salisbury covered the text side, showing how you can generate code. Afterwards we had a panel discussion.

During the panel I asked some fairly straightforward questions that nobody could answer. Who owns the input (the data used by these generative tools)? Who owns the output?

On the whole, it stayed quite grounded and mercifully free of hyperbole. Both speakers were treating the current crop of technologies as tools. Everyone agreed we were on the hype cycle, probably the peak of inflated expectations, looking forward to reaching the plateau of productivity.

Scott explicitly warned people off using generative tools for production code. His advice was to stick to side projects for now.

Matt took a closer look at where these tools could fit into your day-to-day design work. Mostly it was pretty sensible, except when he suggested that there could be any merit to using these tools as a replacement for user testing. That’s a terrible idea. A classic hammer/nail mismatch.

I think I moderated the panel reasonably well, but I have one regret. I wish I had first read Baldur Bjarnason’s new book, The Intelligence Illusion. I started reading it on the train journey back from Edinburgh but it would have been perfect for the panel.

The Intelligence Illusion is very level-headed. It is neither pro- nor anti-AI. Instead it takes a pragmatic look at both the benefits and the risks of using these tools in your business.

It has excellent advice for spotting genuine nails. For example:

Generative AI has impressive capabilities for converting and modifying seemingly unstructured data, such as prose, images, and audio. Using these tools for this purpose has less copyright risk, fewer legal risks, and is less error prone than using it to generate original output.

Think about transcripts of videos or podcasts—an excellent use of this technology. As Baldur puts it:

The safest and, probably, the most productive way to use generative AI is to not use it as generative AI. Instead, use it to explain, convert, or modify.

He also says:

Prefer internal tools over externally-facing chatbots.

That chimes with what I’ve been seeing. The most interesting uses of this technology that I’ve seen involve a constrained dataset. Like the way Luke trained a language model on his own content to create a useful chat interface.

Anyway, The Intelligence Illusion is full of practical down-to-earth advice based on plenty of research backed up with copious citations. I’m only halfway through it and it’s already helped me separate the hype from the reality.

Browser history

I woke up today to a very annoying new bug in Firefox. The browser shits the bed in an unpredictable fashion when rounding up single pixel line widths in SVG. That’s quite a problem on The Session where all the sheet music is rendered in SVG. Those thin lines in sheet music are kind of important.

Browser bugs like these are very frustrating. There’s nothing you can do from your side other than filing a bug. The locus of control is very much with the developers of the browser.

Still, the occasional regression in a browser is a price I’m willing to pay for a plurality of rendering engines. Call me old-fashioned but I still value the ecological impact of browser diversity.

That said, I understand the argument for converging on a single rendering engine. I don’t agree with it but I understand it. It’s like this…

Back in the bad old days of the original browser wars, the browser companies just made shit up. That made life a misery for web developers. The Web Standards Project knocked some heads together. Netscape and Microsoft would agree to support standards.

So that’s where the bar was set: browsers agreed to work to the same standards, but competed by having different rendering engines.

There’s an argument to be made for raising that bar: browsers agree to work to the same standards, and have the same shared rendering engine, but compete by innovating in all other areas—the browser chrome, personalisation, privacy, and so on.

Like I said, I understand the argument. I just don’t agree with it.

One reason for zeroing in on a single rendering engine is that it’s just too damned hard to create or maintain an entirely different rendering engine now that web standards are incredibly powerful and complex. Only a very large company with very deep pockets can hope to be a rendering engine player. Google. Apple. Heck, even Microsoft threw in the towel and abandoned their rendering engine in favour of Blink and V8.

And yet. Andreas Kling recently wrote about the Ladybird browser. How we’re building a browser when it’s supposed to be impossible:

The ECMAScript, HTML, and CSS specifications today are (for the most part) stellar technical documents whose algorithms can be implemented with considerably less effort and guesswork than in the past.

I’ll be watching that project with interest. Not because I plan to use the browser. I’d just like to see some evidence against the complexity argument.

Meanwhile most other browser projects are building on the raised bar of a shared browser engine. Blisk, Brave, and Arc all use Chromium under the hood.

Arc is the most interesting one. Built by the wonderfully named Browser Company of New York, it’s attempting to inject some fresh thinking into everything outside of the rendering engine.

Experiments like Arc feel like they could have more in common with tools-for-thought software like Obsidian and Roam Research. Those tools build knowledge graphs of connected nodes. A kind of hypertext of ideas. But we’ve already got hypertext tools we use every day: web browsers. It’s just that they don’t do much with the accumulated knowledge of our web browsing. Our browsing history is a boring reverse chronological list instead of a cool-looking knowledge graph or timeline.

For inspiration we can go all the way back to Vannevar Bush’s genuinely seminal 1945 article, As We May Think. The device Bush imagined, the memex, was a direct inspiration on Douglas Engelbart, Ted Nelson, and Tim Berners-Lee.

The article describes a kind of hypertext machine that worked with microfilm. Thanks to Tim Berners-Lee’s World Wide Web, we now have a global digital hypertext system that we access every day through our browsers.

But the article also described the idea of “associative trails”:

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.

Our browsing histories are a kind of associative trail. They’re as unique as fingerprints. Even if everyone in the world started on the same URL, our browsing histories would quickly diverge.

Bush imagined that these associative trails could be shared:

The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities.

Heck, making a useful browsing history could be a real skill:

There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.

Taking something personal and making it public isn’t a new idea. It was what drove the wave of web 2.0 startups. Before Flickr, your photos were private. Before Delicious, your bookmarks were private. Before Last.fm, what music you were listening to was private.

I’m not saying that we should all make our browsing histories public. That would be a security nightmare. But I am saying there’s a lot of untapped potential in our browsing histories.

Let’s say we keep our browsing histories private, but make better use of them.

From what I’ve seen of large language model tools, the people getting the most use out of them are training them on a specific corpus. Like, “take this book and then answer my questions about the characters and plot” or “take this codebase and then answer my questions about the code.” If you treat these chatbots as calculators for words they can be useful for some tasks.

Large language model tools are getting smaller and more portable. It’s not hard to imagine one getting bundled into a web browser. It feeds on your browsing history. The bigger your browsing history, the more useful it can be.

Except, y’know, for the times when it just makes shit up.

Vannevar Bush didn’t predict a Memex that would hallucinate bits of microfilm that didn’t exist.