Abstract: A safety evaluation has to trust something to know what it is testing: the model's name, its version string, and the reasons the model gives for what it does. On one deployed alias, none of the three held. I sent the same 100 harmful requests to gemini-3.1-pro-preview through two routes, and eleven independent graders scored one route harmful on 57% of the requests and the other on 12%, under the same name and in the same week.
cart;horse: If you are in the Bay Area on July 11th, even if you're at a company being protested, you should come to The AI Protest. It's fully legal and nonviolent (we'll have a full overtime SFPD escort the entire time), and it's not the worst way you can use your Saturday afternoon that weekend.
One potential risk of developing general-purpose robots is that they could greatly reduce the friction involved in establishing a totalitarian regime.
There has been recent discourse on Hacker News about Chinese API relay stations that use every Western VC-subsidized channel of cheap tokens (think Claude/ChatGPT subscriptions, AWS/Azure credits, Kiro, Google Antigravity, etc.) and resell them as APIs to the domestic Chinese market. This is true: as a Chinese citizen, I have been seeing an uptick in this trend since mid-2024, and especially since 2025.
We, or at least ‘more than 100 American institutions,’ got Mythos back this week.
TL;DR: There are many conceivable versions of a “CERN for AI.” But the version that seems politically realistic (a new catch-up lab) probably would not do much for safety, while the versions that would materially improve safety (e.g., pause + merge of all companies) are probably unrealistic. So I see the CERN idea as a distraction, and not a particularly neglected one.
The Lab Problem in AI Safety
There's something that doesn't sit right with me about where alignment research happens.
So much research, so many researchers, ideas, experiments, but the environment makes no sense. Almost all of it takes place in the same setting: one AI, one user, a chat interface, maybe some tool calls. That's the lab.
(I wrote this post partly to help orient those interested in participating in the EA Forum’s Cluelessness Critiques Competition.
Epistemic status: trying to articulate a big idea which I feel is important but underexplored, partly because it is hard to frame clearly - may not be framing it clearly yet!
Agency, both natural and artificial, is a very important concept. Understanding agency allows us to model our own behaviour and that of others, and it is thus one of the most predictively useful concepts we have at our disposal.
Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token.
For part one of the aspirant sequence - which may or may not be arranged into some totally different order when I'm done with it, because the connection here won't be obvious yet - see Would you work harder in the least convenient possible world?
Partly in response to: Politics is the Mind-Killer and Politics is Hard Mode
Part One: A Tale Of Two Houses
Two groups of rationalists live in houses across the street from one another, as is tradition in San Francisco.
https://www.magfrump.net/blog/the-problem-with-chat
Talking to a friend today, she complained about someone wanting her help with a project when that person didn’t even know what the point of the project was. Prima facie that does sound kind of objectionable. But is it? People definitely do a lot of things without much explicit account of the point of each of them.
On July 4, 2026, the United States turns 250. This anniversary made me think about how many people it took to build this country, and how many of them are no longer here to see what it has become.
In other words: how many people have ever lived in the United States?
For most of the country's history, demographic record-keeping was unfortunately far from complete, especially when it came to births.
SASH is accepting applications for the inaugural Singapore AI Safety Fellowship, a three-month residential research fellowship running September 21 - December 4, 2026.
What it is: An in-person fellowship in Singapore matching fellows with experienced AI safety researchers. Fellows produce joint research on technical safety or governance, supported by mentors working across Eastern and Western institutions.