Home-Cooked Software and Barefoot Developers
A very thought-provoking presentation from Maggie on how software development might be democratised.
After I wrote positively about the speculation rules API I got an email from David Cizek with some legitimate concerns. He said:
I think that this kind of feature is not good, because someone else (web publisher) decides that I (my connection, browser, device) have to do work that very often is not needed. All that blurred by blackbox algorithm in the browser.
That’s fair. My hope is that the user will indeed get more say, whether that’s at the level of the browser or the operating system. I’m thinking of a prefers-reduced-data setting, much like prefers-color-scheme or prefers-reduced-motion.
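In the meantime, a site could at least try to respect that kind of preference itself. Here's a rough sketch, assuming the draft prefers-reduced-data media feature (not yet widely supported) and the Chromium-only navigator.connection.saveData flag as hints, and only injecting speculation rules when the user hasn't asked for less data. The URLs are placeholders, not from any real site:

// Sketch only: prefers-reduced-data is still a draft media feature and
// navigator.connection.saveData is Chromium-only, so treat both as hints.
const wantsLessData =
  window.matchMedia('(prefers-reduced-data: reduce)').matches ||
  navigator.connection?.saveData === true;

if (!wantsLessData && HTMLScriptElement.supports?.('speculationrules')) {
  // Placeholder URLs for illustration only.
  const rules = {
    prefetch: [
      { source: 'list', urls: ['/next-article/', '/about/'] }
    ]
  };
  const script = document.createElement('script');
  script.type = 'speculationrules';
  script.textContent = JSON.stringify(rules);
  document.head.append(script);
}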
But this issue isn’t something new with speculation rules. We’ve already got service workers, which allow the site author to unilaterally declare that a bunch of pages should be downloaded.
I’m doing that for Resilient Web Design—when you visit the home page, a service worker downloads the whole site. I can justify that decision to myself because the entire site is still smaller in size than one article from Wired or the New York Times. But still, is it right that I get to make that call?
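Under the hood, that unilateral download is only a few lines in the service worker's install handler. This is a minimal sketch of the pattern, not the actual Resilient Web Design worker; the cache name and URL list are placeholders:

// serviceworker.js — minimal sketch of caching a whole (small) site on install.
const cacheName = 'site-v1';
const pages = [
  '/',
  '/chapter1/',
  '/chapter2/',
  '/styles.css'
];

self.addEventListener('install', (event) => {
  // Download and cache every page as soon as the worker is installed.
  event.waitUntil(
    caches.open(cacheName)
      .then((cache) => cache.addAll(pages))
  );
});

self.addEventListener('fetch', (event) => {
  // Serve from the cache first, falling back to the network.
  event.respondWith(
    caches.match(event.request)
      .then((cached) => cached || fetch(event.request))
  );
});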
So I’m very much in favour of browsers acting as true user agents—doing what’s best for the user, even in situations where that conflicts with the wishes of a site owner.
Going back to speculation rules, David asked:
Do we really need this kind of (easily turned to evil) enhancement in the current state of (web) affairs?
That question could be asked of many web technologies.
There’s always going to be a tension with any powerful browser feature. The more power it provides, the more it can be abused. Animations, service workers, speculation rules—these are all things that can be used to improve websites or they can be abused to do things the user never asked for.
Or take the elephant in the room: JavaScript.
Right now, a site owner can link to a JavaScript file that’s tens of megabytes in size, and the browser has no alternative but to download it. I’d love it if users could specify a limit. I’d love it even more if browsers shipped with a default limit, especially if that limit is related to the device and network.
I don’t think speculation rules will be abused nearly as much as client-side JavaScript is already abused.
Checked in at Fox On the Downs. Starting St. Patrick’s Day right — with Jessica
Checked in at De Koningshut. Bitterballen and beer
Checked in at Neighbourhood Café. Turkish eggs for breakfast — with Jessica
I just attended this talk from Heydon at axe-con and it was great! Of course it was highly amusing, but he also makes a profound and fundamental point about how we should be going about working on the web.
Checked in at Brighton Dome. Watching Chris How at UX Brighton
I can get behind this:
I take it as my starting point that when we say that we want to build a better Web our guiding star is to improve user agency and that user agency is what the Web is for.
Robin dives into the philosophy and ethics of this position, but he also points to some very concrete implementations of it:
These shared foundations for Web technologies (which the W3C refers to as “horizontal review” but they have broader applicability in the Web community beyond standards) are all specific, concrete implementations of the Web’s goal of developing user agency — they are about capabilities. We don’t habitually think of them as ethical or political goals, but they are: they aren’t random things that someone did for fun — they serve a purpose. And they work because they implement ethics that get dirty with the tangible details.
A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.
I still think Chris put it best:
I just think it’s fuckin’ rude.
When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situation:
It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.
Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.
Neil handily provides the current list to add to your file. Pass it on:
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /
There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.
As Jim says:
I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.
That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:
So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.
Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot|FacebookBot) [NC]
RewriteRule ^ - [F]
You’ll see that Google-Extended isn’t in that list. It isn’t a crawler. Rather, it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.
Checked in at Taberna La Concha. Vermut! — with Jessica
Checked in at Puerta de Mérida. Dining al fresco — with Jessica
Checked in at Asador Carlos V. Lunch on the square — with Jessica
Checked in at Taberna El Rincón. Croquetas y vino — with Jessica
Checked in at La Minerva. Pulpo! — with Jessica
Checked in at almagesto. Lunch on the square—mogote de cerdo Ibérico — with Jessica
Checked in at Mercado de San Fernando. Vermut! — with Jessica
Checked in at Dover Castle. Tuesday night session — with Jessica
Checked in at The Ancient Mariner. First Thursday of the month session — with Jessica
Checked in at Mili. Seafood feast! — with Jessica
Checked in at Russ & Daughters Café. Herrings, pickles, and devilled eggs — with Jessica