Torrid News

lesswrong 39h ago 23°

The Name is Not The Model

Abstract: A safety evaluation has to trust something to know what it is testing: the model's name, its version string, and the reasons the model gives for what it does. On one deployed alias, none of the three held. I sent the same 100 harmful requests to gemini-3.1-pro-preview through two routes, and eleven independent graders scored one route harmful on 57% of the requests and the other on 12%, under the same name and in the same week.

lesswrong 28h ago 21°

You Should Come to The AI Protest

cart;horse: If you are in the Bay Area on July 11th, even if you're at a company being protested, you should come to The AI Protest. It's fully legal and nonviolent (we'll have a full overtime SFPD escort the entire time), and it's not the worst way you can use your Saturday afternoon that weekend.

lesswrong 37h ago 20°

Is it ethical to work on general-purpose robots given the risk of cyberhacking?

One potential risk of developing general-purpose robots is that they could greatly reduce the friction required to establish a totalitarian regime.

lesswrong 34h ago 17°

The consequences of locking intelligence away: an introduction to Claude relays in China

There has been recent discourse on Hacker News about Chinese API relay stations that use every Western VC-subsidized channel of cheap tokens (think Claude/ChatGPT subscriptions, AWS/Azure credits, Kiro, Google Antigravity, etc.) and resell them as APIs to the domestic Chinese market. This is true: as a Chinese citizen, I have seen this trend pick up since mid-2024, and especially since 2025.

lesswrong 35h ago 17°

The Once And Future Fable #5

We, or at least ‘more than 100 American institutions,’ got Mythos back this week.

lesswrong 26h ago 15°

A CERN for AI is a distraction; push for an IAEA instead

TL;DR: There are many conceivable versions of a “CERN for AI.” But the version that seems politically realistic (a new catch-up lab) probably would not do much for safety, while the versions that would materially improve safety (e.g., pause + merge of all companies) are probably unrealistic. So I see the CERN idea as a distraction, and not a particularly neglected one.

lesswrong 2h ago 15°

AI Safety Is Testing the Wrong Environment

The Lab Problem in AI Safety: There's something that doesn't sit right with me about where alignment research happens. So much research, so many researchers, ideas, experiments, but the environment makes no sense. Almost all of it takes place in the same setting: one AI, one user, a chat interface, maybe some tool calls. That's the lab.

lesswrong 32h ago 15°

'AI allegory steganography' in Claude short stories in the Unslop contest?


lesswrong 36h ago 15°

Cluelessness: Summary of the argument, why it matters, and counterarguments

(I wrote this post partly to help orient those interested in participating in the EA Forum’s Cluelessness Critiques Competition.

lesswrong 47h ago 15°

Agency is not a natural kind (and why that might matter for alignment)

Epistemic status: trying to articulate a big idea which I feel is important but underexplored, partly because it is hard to frame clearly - may not be framing it clearly yet! Agency, both natural and artificial, is a very important concept. Understanding agency allows us to model our own behaviour and that of others, and it is thus one of the most predictively useful concepts we have at our disposal.

lesswrong 10h ago 14°

Claude Sonnet 5 Is Not Frontier But Has Its Uses

Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token.

lesswrong 37h ago 14°

That Which Cannot Be Poked With A Stick Is The Mind-Killer

For part one of the aspirant sequence - which may or may not be arranged into some totally different order when I'm done with it, because the connection here won't be obvious yet - see Would you work harder in the least convenient possible world? Partly in response to: Politics is the Mind-Killer and Politics is Hard Mode. Part One: A Tale Of Two Houses. Two groups of rationalists live in houses across the street from one another, as is tradition in San Francisco.

lesswrong 29h ago 14°

The Problem with Chat

https://www.magfrump.net/blog/the-problem-with-chat

lesswrong 26h ago 14°

When should you know the point?

Talking to a friend today, she complained about someone wanting her help with a project when that person didn’t even know what the point of the project was. Prima facie that does sound kind of objectionable. But is it? People definitely do a lot of things without much explicit account of the point of each of them.

lesswrong 10h ago 13°

How Many People Have Ever Lived in the United States?

On July 4, 2026, the United States turns 250. This anniversary made me think about how many people it took to build this country, and how many of them are no longer here to see what it has become. In other words: how many people have ever lived in the United States? For most of the country's history, demographic record-keeping was unfortunately far from complete, especially when it came to births.

lesswrong 6h ago 13°

The Singapore AI Safety Fellowship - Applications Open (Deadline: July 10 2026)

SASH is accepting applications for the inaugural Singapore AI Safety Fellowship, a three-month residential research fellowship running September 21 - December 4, 2026. What it is: An in-person fellowship in Singapore matching fellows with experienced AI safety researchers. Fellows produce joint research on technical safety or governance, supported by mentors working across Eastern and Western institutions.
