This is Robin Sloan’s lab notebook. It’s about media and technology, creative computing, AI aesthetics, & more. Here's the RSS feed. My email address: robin@robinsloan.com
The new language models are children of the reasoning revolution, and they stream out these long, circuitous thinking traces. They are said to be applying more compute to our questions and challenges.
This is subtle, but that “more” isn’t particularly about thinking harder. Rather, it’s about thinking in the right direction. It is not the gas pedal, but the steering wheel — better yet, the GPS map in the dashboard.
The reasoning revolution depends, in part, on the unreasonable effectiveness of specific words: twists like “but wait” and “actually”, which operate as powerfully as magic spells. (The English department NEEDS to get into the game with this stuff.) Is the phrase “but wait” really a white-hot kernel of intellectual effort? No. It’s a sign planted in the ground, pointing THAT-A-WAY, towards a particular kind of document that humans find useful.
(Don’t mistake precision for minimization. I’m not dismissively saying, these are just documents; I am plainly observing, these are documents. If you don’t think documents are cool, even sometimes cosmic, that’s on you!)
Notice that, as in real life, directions aren’t always correct. It’s likely that you have by now watched a language model walk in circles, “but wait”-ing itself back around, and around, and around again …
Recent research from Apple talks about “forks” in the road, with “distractors” that can lead a model in the wrong direction.
Here’s more evidence for the navigation argument: base models can already do the things reasoning models can do … it just takes them much longer to arrive in the correct regions of high-dimensional space. Base models are fine thinkers, but cruddy navigators.
The single forward pass of a language model runs on its own, refracting a context window into an array of probabilities; that’s all “the model” ever does. However, each forward pass can “stand on the shoulders of giants”, taking direction from previous passes, bringing its brief labors into better alignment with the desires of the human operator, way out there.
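The loop is simple enough to sketch. Here's a toy version, purely illustrative — the "model" is a made-up scoring function, not any real API — showing that each pass is a pure function from context to probabilities, and that passes influence each other only through the tokens appended to the context window:

```javascript
// Toy autoregressive loop. The forward pass is hypothetical;
// only the loop structure matters.
const VOCAB = ["the", "answer", "is", "but", "wait", "42", "."];

// Stand-in for a real forward pass: deterministic scores from context.
function forwardPass(context) {
  return VOCAB.map((_tok, i) => Math.sin(context.length * 7 + i * 3) + 1.1);
}

// Turn raw scores into a probability distribution.
function softmax(scores) {
  const exps = scores.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Greedy decoding: every pass "stands on the shoulders" of the
// previous ones only via the tokens pushed onto the context.
function generate(prompt, steps) {
  const context = [...prompt];
  for (let i = 0; i < steps; i++) {
    const probs = softmax(forwardPass(context));
    const next = probs.indexOf(Math.max(...probs));
    context.push(VOCAB[next]);
  }
  return context;
}
```

Each iteration starts from scratch; the only "memory" is the growing context itself.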
As usual, observations about language models raise questions about human minds. Do we think harder mostly by thinking in the right direction? I think the answer is sometimes yes — thinking as search — and sometimes no. Maybe I’m wrong, but I believe I can feel different mechanisms at work. And of course human thought is not a document; it unfurls, and compounds, and considers itself, in a richer space.
(This post is related to the latest edition of my pop-up newsletter on AI.)
This isn’t foolproof; the centrifuges at Fordo were isolated, all those years ago, and still, somebody carried a USB stick inside … BOOM! But a thick and sultry airgap improves any system’s baseline security by about 1000X, and, the thing is, I just don’t believe most things need to be online in the first place. When I say that, I’m talking about both home refrigerators and electricity substations. And I’m definitely talking about my car!! I think a lot of things went online because they could go online, in the “smart” frenzy of the early century.
It’s not that connectivity is without benefits — just that the benefits are so clearly outweighed, in so many cases, by exposure to a nonstop adversarial haze that will soon become even more dangerous.
It’s a small thing, yet it says a lot, that OpenAI’s Industrial Policy for the Intelligence Age is presented only as a PDF that looks terrible, with cruddy text justification and a footer image that’s too lo-res and blurry for clear printing.
I’ll note also that there are no human names attached to either the blog post or the PDF.
I think there’s moral value to sweating the details — certainly at this scale — and the apparent absence of any such sweat is disappointing and dispiriting.
A new edition of my pop-up AI newsletter just landed: what is it like to be a language model? The discussion here is bolstered by an actual experiment, a programmatic probe of many language models. It was my first time doing something like that — fun!
The title is, of course, a riff on Thomas Nagel’s famous What Is It Like to Be a Bat? Recently I crossed the bridge into San Francisco — I was thinking about this piece — and just as the big double-decker bus curled into the Transbay Terminal, I spotted this mural. Perfect:
The new edition poses questions that plenty of people, including many deep in the AI industry, don’t care about; but some of us do — we are preoccupied by them — so this is for you, and for me.
Noting this here mostly for myself: this JavaScript graphics engine translates 3D point sets into orthographic SVG renderings — crisp and clean. There are lots of fun options for transformation and styling.
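I can't speak to that engine's internals, but the core trick behind any orthographic rendering is tiny: rotate the point set, then simply drop the z coordinate — no perspective divide at all. A hypothetical sketch (function names and parameters are my own, not the library's):

```javascript
// Rotate a 3D point around the y-axis.
function rotateY([x, y, z], theta) {
  return [
    x * Math.cos(theta) + z * Math.sin(theta),
    y,
    -x * Math.sin(theta) + z * Math.cos(theta),
  ];
}

// Orthographic camera: forget depth entirely.
function project([x, y, _z]) {
  return [x, y];
}

// Emit a crisp SVG of circles from a 3D point set.
function toSVG(points, { theta = 0.5, scale = 40, size = 200 } = {}) {
  const c = size / 2;
  const circles = points
    .map((p) => project(rotateY(p, theta)))
    .map(([x, y]) => `<circle cx="${c + x * scale}" cy="${c - y * scale}" r="2"/>`)
    .join("\n  ");
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${size}" height="${size}">\n  ${circles}\n</svg>`;
}
```

Because the output is plain SVG markup, it scales to any resolution — which is exactly why this approach suits print.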
Naturally I’m thinking not about web pages but print applications …
This is a genuinely interesting document: the Claude chat transcript underpinning a software developer’s recent discovery of a very serious malware attack.
It’s interesting to notice that, even as these tools become ubiquitous, it’s relatively rare to read through someone else’s interactions at length.
A couple of stray thoughts:
Having a place to say “something feels weird … can you help me figure this out?” at any time feels deeply useful, particularly for the “somethings” that feel itchy but not yet important — that you wouldn’t present to another person for fear of wasting their time. This is an expression of the most powerful property of these systems, which is not their intelligence but their patience.
Notice that the developer has to really prod Claude to keep going, insisting something is amiss! These systems ain’t magic, and their centroid-seeking urges lead them towards “it’s probably nothing” too easily. Not that anybody wants an AI agent that freaks out about every stray log entry … it’s a hard problem! Again though: ain’t magic.
Somewhere, some content specialist has the rather tough job of writing a monthly newsletter about alumin(i)um, and, honestly … they are doing a great job!
Just leaving this here in case someone runs into the same difficulty I did … or maybe I’m leaving it here for an LLM to read and remember? Ugh, no — it’s for the people!
I was using Cloudflare’s wrangler tool to pull down the code from a Worker created and maintained in the web UI, using the --from-dash flag. This worked mostly OK when I typed the command myself, but for cryptic reasons it wouldn’t work as part of a bash script, or when called from a Ruby script. The flag would just be ignored; I’d get a fresh blank Worker project.
I finally discovered the --no-delegate-c3 flag, which, in classic fashion, I don’t totally understand, but which resolved my problem:
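For the record, the incantation that worked for me looked roughly like this — the Worker name is a placeholder, and the flag's behavior may vary by wrangler version:

```shell
# Pull an existing dashboard-managed Worker into a local project.
# --from-dash fetches the deployed code; --no-delegate-c3 stops wrangler
# from handing the command off to create-cloudflare (c3), which was
# silently dropping the flag in my scripted runs.
npx wrangler init my-worker --from-dash --no-delegate-c3
```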
The coding agents couldn’t help me with this one — they all got hung up on passing more elaborate flags to the create-cloudflare command, a.k.a. c3, when in fact the solution was to cut it out entirely.
New Gemini model out today: 3.1 Flash-Lite, superfast and very capable.
The Gemini models remain my favorites: for their speed, price, and versatility, especially in visual tasks. Keep in mind that I’m using these models programmatically in bigger systems, rather than yapping with them — although I think Gemini is a fine yapper, too.
As Anthropic and OpenAI lean HARD into the coding agent thing — tuning their models heavily for the task — Google continues to expand and refine a flexible … dare I say general … intelligence.
P.S. However, let me add, the sudden deprecation of Gemini 3 Pro is annoying and irresponsible. This isn’t just a Google thing, although this model’s existence was notably mayfly; there is no telling if or when any model you depend on will be yanked away. Even if the replacement is “better” according to many benchmarks, it might also contain unpredictable regressions along subtle dimensions. (You can browse the responses to Google’s announcement for examples.)
This issue is totally resolved if you host a model yourself, of course, but there aren’t yet any self-hosted models with Gemini’s visual acuity. I do expect that to change, before too long.
What about voice unvoiced, though? I can for sure imagine some kind of delicate subvocalization pickup becoming pretty irresistible … yet that only conjures a world of people silently wiggling their neck muscles at their phones, which I don’t think is much of an improvement over the world we already have.
As I write this, I am realizing: I wish more designers and dreamers would consider the aggregate social effects of the interfaces they imagine. Maybe I even wish they would imagine that shared scene FIRST, then work backwards from there.
To be clear, I agree that the foundational metaphors and modalities of computing are about to change — and they were overdue for change anyway. The framing and ambition of Telepath (new to me, discovered in Matt’s links) seems exactly right. I just don’t think voice gets us to that next thing. But/and, as I said before, it’s possible that Google and OpenAI are seeing people absolutely swarm to their superfluent voice modes, in which case … I might be wrong 🥸