gilest.org: A chat with 19-year-old me
I love this conversation.
A good overview of how large language models work:
The words flow together because they’ve been seen together many times. But that doesn’t mean they’re right. It just means they’re coherent.
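To make that concrete, here’s a toy bigram sketch in Python (my illustration, nothing like a real model’s scale): it strings words together purely because they co-occurred in its tiny corpus, so the output comes out coherent without ever being checked for truth.

import random
from collections import defaultdict

# Toy bigram model: each next word is drawn from the words that
# actually followed the previous word in the training text.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

word = "the"
output = [word]
for _ in range(12):
    word = random.choice(follows[word])  # common pairings come up most often
    output.append(word)

print(" ".join(output))
# e.g. "the dog sat on the mat . the cat chased the ..."
# Coherent, because these words were seen together; not therefore true.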
Want to use all those great features that have been landing in browsers over the past year or two? View transitions! Scroll-driven animations! So much more!
Well, your coding co-pilot is not going to be of any help.
Large language models, especially those on the scale of many of the most accessible, popular hosted options, take humongous datasets and long periods to train. By the time everything has been scraped and a dataset has been built, the set is on some level already obsolete. Then, before a model can reach the hands of consumers, time must be taken to train and evaluate it, and then even more to finally deploy it.
Once it has finally been released, it usually remains stagnant in terms of having its knowledge updated. This creates an AI knowledge gap: a period between the present and the AI’s training cutoff, during which models cannot service users requesting assistance with newly emerged technologies, thus disincentivising their use.
So we get this instead:
I’ve anecdotally noticed that many AI tools have a ‘preference’ for React and Tailwind when asked to tackle a web-based task, or even to create any app involving an interface at all.
If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous than one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.
My reaction to this surprised me: I was repelled.
I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.
AI is the most anthropomorphized technology in history, starting with the name—intelligence—and plenty of other words thrown around the field: learning, neural, vision, attention, bias, hallucination. These references only make sense to us because they are hallmarks of being human.
But ascribing human qualities to AI is not serving us well. Anthropomorphizing statistical models leads to confusion about what AI does well, what it does poorly, what form it should take, and our agency over all of the above.
There is something kind of pathological going on here. One of the most exciting advances in computer science ever achieved, with so many promising uses, and we can’t think beyond the most obvious, least useful application? What, because we want to see ourselves in this technology?
Meanwhile, we are under-investing in more precise, high-value applications of LLMs that treat generative A.I. models not as people but as tools.
Anthropomorphizing AI not only misleads, but suggests we are on equal footing with, even subservient to, this technology, and there’s nothing we can do about it.
The output of generative tools based on large language models gives me the ick.
This isn’t a measured logical response. It’s more of an involuntary emotional reaction.
I could try to justify my reaction by saying I’m concerned about the exploitation involved in the training data, or the huge energy costs involved, or the disenfranchisement of people who create art. But those would be post-facto rationalisations.
I just find myself wrinkling my nose and mentally going “Ew!” whenever somebody posts the output of some prompt they gave to ChatGPT or Midjourney.
Again, I’m not saying this is rational. It’s more instinctual.
You could well say that this is my problem. You may be right. But I wonder what it is that’s so unheimlich about these outputs that triggers my response.
Just to clarify, I am talking about direct outputs, shared verbatim. If someone were to use one of these tools in the process of creating something I’d be none the wiser. I probably couldn’t even tell that a large language model was involved at some point. I’m fine with that. It’s when someone takes something directly from one of these tools and then shares it online, that’s what raises my bile.
I was at a conference a few months back where your badge featured a hallucinated picture of you. Now, this probably sounded like a fun idea. It probably is a fun idea. I can’t tell. All I know is that it made me feel a little queasy.
Perhaps it’s a question of taste. In which case, I’m being a snob. I’m literally turning my nose up at something I deem to be tacky.
But isn’t it tacky, though? It’s not something I can describe, but there’s just something about the vibe of these images—and words—that feels off. It’s sort of creepy, but it’s mostly just the mediocrity that sits so uneasily with me.
These tools do an amazing job of solving the quantity problem—how to produce an image or piece of text quickly. And by most measurements, you could say that they also solve the quality problem. These outputs are good enough to pass for “the real thing.” The outputs are, like, 90% to 95% there. And the gap is closing.
And yet. There’s something in that gap. Something that I feel in my gut. Something that makes me go “nope.”
Don’t get me wrong, there are some features under the mislabeled bracket of AI that have made a huge improvement to my process. Audio transcription has been an absolute game-changer for research analysis, giving me back hours of time to focus on the deep thinking work. This is a perfect example of a problem seeking a solution, not the other way around. The latest wave of features feels a lot like ‘because we can’ rather than ‘because we should’.
GPT-4 is impressive, but a layperson can’t wield it the way a programmer can. I still feel secure in my profession. In fact, I feel somewhat more secure than before. As software gets easier to make, it’ll proliferate; programmers will be tasked with its design, its configuration, and its maintenance. And though I’ve always found the fiddly parts of programming the most calming, and the most essential, I’m not especially good at them. I’ve failed many classic coding interview tests of the kind you find at Big Tech companies. The thing I’m relatively good at is knowing what’s worth building, what users like, how to communicate both technically and humanely. A friend of mine has called this A.I. moment “the revenge of the so-so programmer.” As coding per se begins to matter less, maybe softer skills will shine.
I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.
I got a nice email from someone regarding my recent posts about performance on The Session. They said:
I hope this message finds you well. First and foremost, I want to express how impressed I am with the overall performance of https://thesession.org/. It’s a fantastic resource for music enthusiasts like me.
How nice! I responded, thanking them for the kind words.
They sent a follow-up clarification:
Awesome, anyway there was an issue in my message.
The line ‘It’s a fantastic resource for music enthusiasts like me.’ added by chatGPT and I didn’t notice.
I imagine this is what it feels like when you’re on a phone call with someone and towards the end of the call you hear a distinct flushing sound.
I wrote back and told them about Simon’s rule:
I will not publish anything that takes someone else longer to read than it took me to write.
That just feels so rude!
I think that’s a good rule.
The slides and transcript from a great talk by Maggie Appleton, including this perfect description of the vibes we get from large language models:
It feels like they’re either geniuses playing dumb or dumb machines playing genius, but we don’t know which.
Another great talk from Simon that explains large language models in a hype-free way.
Now that the horse has bolted—and ransacked the web—you can shut the barn door:
To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:
User-agent: GPTBot
Disallow: /
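If you want to check that the rule does what you think, Python’s standard library can parse it; a quick sketch (example.com is just a stand-in for your own domain):

from urllib.robotparser import RobotFileParser

# Parse the two-line rule above and check what it actually blocks.
rules = ["User-agent: GPTBot", "Disallow: /"]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/some-page"))
# False: GPTBot is disallowed everywhere
print(rp.can_fetch("SomeOtherBot", "https://example.com/some-page"))
# True: the rule names only GPTBot

Whether the crawler actually honours the rule is another matter, which is rather the point of the zero-faith remark above.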
This is a really clear, practical, level-headed explanatory talk from Simon. You can read the transcript or watch the video.
Taken together, these flaws make LLMs look less like an information technology and more like a modern mechanisation of the psychic hotline.
Delegating your decision-making, ranking, assessment, strategising, analysis, or any other form of reasoning to a chatbot becomes the functional equivalent to phoning a psychic for advice.
Imagine Google or a major tech company trying to fix their search engine by adding a psychic hotline to their front page? That’s what they’re doing with Bard.
Today’s AI promoters are trying to have it both ways: They insist that AI is crossing a profound boundary into untrodden territory with unfathomable risks. But they also define AI so broadly as to include almost any large-scale, statistically-driven computer program.
Under this definition, everything from the Google search engine to the iPhone’s face-recognition unlocking tool to the Facebook newsfeed algorithm is already “AI-driven” — and has been for years.
There’s a general consensus that large language models are going to get better and better. But what if this is as good as it gets… before the snake eats its own tail?
The tails of the original content distribution disappear. Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions. We call this effect model collapse.
Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale.
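You can watch a cartoon version of that collapse in a few lines of Python (my own toy, not the paper’s experiment): repeatedly fit a Gaussian to samples drawn from the previous generation’s fit, and the variance withers.

import random
import statistics

# Generation 0: the "real" distribution.
mu, sigma = 0.0, 1.0
n = 50  # each generation is trained on a finite sample of the last

for gen in range(1, 301):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    # Refit on model output alone; estimation error compounds.
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)
    if gen % 100 == 0:
        print(f"generation {gen}: mu={mu:+.3f}, sigma={sigma:.3f}")

# sigma drifts towards zero: the tails vanish and the fitted Gaussian
# narrows towards a delta function, the "model collapse" described above.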
Chat is rarely a suitable interface for most tasks. Here, Maggie Appleton explores and prototypes some alternatives.
Today’s highly-hyped generative AI systems (most famously OpenAI’s) generate bullshit by design. To be clear, bullshit can sometimes be useful, and even accidentally correct, but that doesn’t keep it from being bullshit. Worse, these systems are not meant to generate consistent bullshit — you can get different bullshit answers from the same prompts. You can put garbage in and get… bullshit out, but the same quality bullshit that you get from non-garbage inputs! And enthusiasts are currently mistaking the fact that the bullshit is consistently wrapped in the same envelope as meaning that the bullshit inside is consistent, laundering the unreasonableness into appearing reasonable.
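The ‘different bullshit from the same prompt’ part is baked into how decoding works: outputs are sampled from a probability distribution, not looked up. A toy sketch, with entirely made-up numbers:

import math
import random

# Pretend scores a model might assign to candidate answers for one prompt.
# (Hypothetical values, purely for illustration.)
logits = {"Paris": 2.0, "Lyon": 1.2, "Berlin": 1.0, "purple": 0.2}

def sample(logits, temperature=1.0):
    # Softmax with temperature, then draw one answer at random.
    weights = [math.exp(s / temperature) for s in logits.values()]
    return random.choices(list(logits), weights)[0]

# The same "prompt", sampled eight times: different answers, identical envelope.
print([sample(logits) for _ in range(8)])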