Tags: formation

68

sparkline

Monday, April 7th, 2025

Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries .

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Friday, March 28th, 2025

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica

As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.

Wednesday, March 26th, 2025

Go To Hellman: AI bots are destroying Open Access

AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

Friday, March 21st, 2025

The Blowtorch Theory: A New Model for Structure Formation in the Universe

Make yourself a nice cup of tea and settle in with Julian Gough’s magnum opus:

How early, sustained, supermassive black hole jets carved out cosmic voids, shaped filaments, and generated magnetic fields

Wednesday, February 12th, 2025

AI wants to rule the World, but it can’t handle dairy.

AI has the same problem that I saw ten year ago at IBM. And remember that IBM has been at this AI game for a very long time. Much longer than OpenAI or any of the new kids on the block. All of the shit we’re seeing today? Anyone who worked on or near Watson saw or experienced the same problems long ago.

Tuesday, January 21st, 2025

The New York Good Times

Better than the real thing. All true too.

Refresh for more.

What I’ve learned about writing AI apps so far | Seldo.com

LLMs are good at transforming text into less text

Laurie is really onto something with this:

This is the biggest and most fundamental thing about LLMs, and a great rule of thumb for what’s going to be an effective LLM application. Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.

Depending how much of the hype around AI you’ve taken on board, the idea that they “take text and turn it into less text” might seem gigantic back-pedal away from previous claims of what AI can do. But taking text and turning it into less text is still an enormous field of endeavour, and a huge market. It’s still very exciting, all the more exciting because it’s got clear boundaries and isn’t hype-driven over-reaching, or dependent on LLMs overnight becoming way better than they currently are.

Wednesday, January 15th, 2025

Prescriptive and Descriptive Information Architectures | Jorge Arango

Interesting—this is exactly the same framing I used to talk about design systems a few years ago.

Thursday, November 7th, 2024

Information literacy and chatbots as search • Buttondown

If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous tthan one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.

Daring Fireball: Kottke on the Art and Power of Hypertextual Writing

Hypertext links are an information-density multiplier.

The way I’ve long thought about it is that traditional writing — like for print — feels two-dimensional. Writing for the web adds a third dimension. It’s not an equal dimension, though. It doesn’t turn writing from a flat plane into a full three-dimensional cube. It’s still primarily about the same two dimensions as old-fashioned writing. What hypertext links provide is an extra layer of depth. Just the fact that the links are there — even if you, the reader, don’t follow them — makes a sentence read slightly differently. It adds meaning in a way that is unique to the web as a medium for prose.

Thursday, September 5th, 2024

Information Architecture First Principles | Jorge Arango

  • People only understand things relative to things they already understand
  • People only understand things in context
  • People rely on patterns and consistency
  • People seek to minimize cognitive load
  • People have varying levels of expertise and familiarity
  • People are goal-oriented
  • People often don’t know what they’re looking for
  • Information is more useful when it’s actionable

Wednesday, July 10th, 2024

Directory enquiries

I was having a discussion with some of my peers a little while back. We were collectively commenting on the state of education and documentation for front-end development.

A lot of the old stalwarts have fallen by the wayside of late. CSS Tricks hasn’t been the same since it got bought out by Digital Ocean. A List Apart goes through fallow periods. Even the Mozilla Developer Network is looking to squander its trust by adding inaccurate “content” generated by a large language model.

The most obvious solution is to start up a brand new resource for front-end developers. But there are two probems with that:

  1. It’s really, really, really hard work, and
  2. It feels a bit 927.

I actually think there are plenty of good articles and resources on front-end development being published. But they’re not being published in any one specific place. People are publishing them on their own websites.

Ahmed, Josh, Stephanie, Andy, Lea, Rachel, Robin, Michelle …I could go on, but you get the picture.

All this wonderful stuff is distributed across the web. If you have a well-stocked RSS reader, you’re all set. But if you’re new to front-end development, how do you know where to find this stuff? I don’t think you can rely on search, unless you have a taste for slop.

I think the solution lies not with some hand-wavey “AI” algorithm that burns a forest for every query. I think the solution lies with human curation.

I take inspiration from Phil’s fantastic project, ooh.directory. Imagine taking that idea of categorisation and applying it to front-end dev resources.

Whether it’s a post on web.dev, Smashing Magazine, or someone’s personal site, it could be included and categorised appropriately.

Now, there would still be a lot of work involved, especially in listing and categorising the articles that are already out there, but it wouldn’t be nearly as much work as trying to create those articles from scratch.

I don’t know what the categories should be. Does it make sense to have top-level categories for HTML, CSS, and JavaScript, with sub-directories within them? Or does it make more sense to categorise by topics like accessibility, animation, and so on?

And this being the web, there’s no reason why one article couldn’t be tagged to simultaneously live in multiple categories.

There’s plenty of meaty information architecture work to be done. And there’d be no shortage of ongoing work to handle new submissions.

A stretch goal could be the creation of “playlists” of hand-picked articles. “Want to get started with CSS grid layout? Read that article over there, watch this YouTube video, and study this page on MDN.”

What do you think? Does this one-stop shop of hyperlinks sound like it would be useful? Does it sound feasible?

I’m just throwing this out there. I’d love it if someone were to run with it.

Friday, May 17th, 2024

Labels

I love libraries. I think they’re one of humanity’s greatest inventions.

My local library here in Brighton is terrific. It’s well-stocked, it’s got a welcoming atmosphere, and it’s in a great location.

But it has an information architecture problem.

Like most libraries, it’s using the Dewey Decimal system. It’s not a great system, but every classification system is going to have flaws—wherever you draw boundaries, there will be disagreement.

The Dewey Decimal class of 900 is for history and geography. Within that class, those 100 numbers (900 to 999) are further subdivded in groups of 10. For example, everything from 940 to 949 is for the history of Europe.

Dewey Decimal number 941 is for the history of the British Isles. The term “British Isles” is a geographical designation. It’s not a good geographical designation, but technically it’s not a political term. So it’s actually pretty smart to use a geographical rather than a political term for categorisation: geology moves a lot slower than politics.

But the Brighton Library is using the wrong label for their shelves. Everything under 941 is labelled “British History.”

The island of Ireland is part of the British Isles.

The Republic of Ireland is most definitely not part of Britain.

Seeing books about the history of Ireland, including post-colonial history, on a shelf labelled “British History” is …not good. Frankly, it’s offensive.

(I mentioned this situation to an English friend of mine, who said “Well, Ireland was once part of the British Empire”, to which I responded that all the books in the library about India should also be filed under “British History” by that logic.)

Just to be clear, I’m not saying there’s a problem with the library using the Dewey Decimal system. I’m saying they’re technically not using the system. They’ve deviated from the system’s labels by treating “History of the British Isles” and “British History” as synonymous.

I spoke to the library manager. They told me to write an email. I’ve written an email. We’ll see what happens.

You might think I’m being overly pedantic. That’s fair. But the fact this is happening in a library in England adds to the problem. It’s not just technically incorrect, it’s culturally clueless.

Mind you, I have noticed that quite a few English people have a somewhat fuzzy idea about the Republic of Ireland. Like, they understand it’s a different country, but they think it’s a different country in the way that Scotland is a different country, or Wales is a different country. They don’t seem to grasp that Ireland is a different country like France is a different country or Germany is a different country.

It would be charming if not for, y’know, those centuries of subjugation, exploitation, and forced starvation.

British history.

Update: They fixed it!

Thursday, February 8th, 2024

Beyond the Frame | Untangling Non-Linearity

A fascinating look at the connections between hypertext and film editing. I’m a sucker for any article that cites both Ted Nelson and Walter Murch.

Sunday, January 7th, 2024

RSS Anything

Next time you’re frustrated by a website that doesn’t provide an RSS feed, try using this tool:

Transform any old website with a list of links into an RSS Feed

Friday, January 5th, 2024

The Website vs. Web App Dichotomy Doesn’t Exist | jakelazaroff.com

Amen!

If there’s one takeaway from all this, it’s that the web is a flexible medium where any number of technologies can be combined in all sorts of interesting ways.

Monday, October 23rd, 2023

The map-reduce is not the territory

Unlike many people, I’m not particularly worried about AI replacing peoples’ jobs, although employers will certainly try and use it to reduce their headcount. I’m more worried about it transforming jobs into roles without agency or space to be human. Imagine a world where performance reviews are conducted by software; where deviance from the norm is flagged electronically, and where hiring and firing can be performed without input from a human. Imagine models that can predict when unionization is about to occur in a workplace. All of this exists today, but in relatively experimental form. Capital needs predictability and scale; for most jobs, the incentives are not in favor of human diversity and intuition.

Thursday, August 10th, 2023

Undersea Cables by Rishi Sunak [PDF]

Years before becoming Prime Minister of the UK, Rishi Sunak wrote this report, Undersea Cables: Indispensable, insecure.

Tuesday, March 28th, 2023

Design transformation on the Clearleft podcast

Boom! The Clearleft podcast is back!

The first episode of season four just dropped. It’s all about design transformation.

I’ve got to be honest, this episode is a little inside baseball. It’s a bit navel-gazey and soul-searching as I pick apart the messaging emblazoned on the Clearleft website:

The design transformation consultancy.

Whereas most of the previous episodes of the podcast would be of interest to our peers—fellow designers—this one feels like it might of more interest to potential clients. But I hope it’s not too sales-y.

You’ll hear from Danish designer Maja Raunbak, and American in Amsterdam Nick Thiel as well as Clearleft’s own Chris Pearce. And I’ve sampled a talk from the Leading Design archives by Stuart Frisby.

The episode clocks in at a brisk eighteen and a half minutes. Have a listen.

While you’re at it, take this opportunity to subscribe to the Clearleft podcast on Overcast, Spotify, Apple, Google or by using a good ol’-fashioned RSS feed. That way the next episodes in the season will magically appear in your podcatching software of choice.

But I’m not making any promises about when that will be. Previously, I released new episodes in a season on a weekly basis. This time I’m going to release each episode whenever it’s ready. That might mean there’ll be a week or two between episodes. Or there might be a month or so between episodes.

I realise that this unpredictable release cycle is the exact opposite of what you’re supposed to do, but it’s actually the most sensible way for me to make sure the podcast actually gets out. I was getting a bit overwhelmed with the prospect of having six episodes ready to launch over a six week period. What with curating UX London and other activities, it would’ve been too much for me to do.

So rather than delay this season any longer, I’m going to drop each episode whenever it’s done. Chaos! Anarchy! Dogs and cats living together!