Torrid News2

lesswrong

lesswrong 15h ago 13°

When capabilities work is the *safe* bet

If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research.
lesswrong 48h ago 13°

What Capable Agents Must Know: Why AI Consciousness May Be an Inevitable Byproduct of Capability

[No LLMs were used (or harmed!) in the writing of this blogpost!] Technical results can all be found in my UAI 2026 paper (https://arxiv.org/abs/2603.02491). This work, and this post about this work, was born out of a frustration.
lesswrong 18h ago 12°

How to read tableaux, a formal system for modal logic with Kripke models

Recently I’ve been thinking a lot about a certain model of a rational agent: a proof-based agent which is triggered to act when it finds certain proofs in Peano arithmetic (PA). Back when MIRI had an agent foundations team, they found they could derive what these agents would do using provability logic, a modal logic in which the necessity box represents the provability predicate in PA.
lesswrong 13h ago 12°

Conversations With Cade Metz on the Rationalists

(Previously, previously.)
lesswrong 15h ago 11°

The Value of Veridical Information

Introduction
I’d like to share information that is worth sharing. So I ask the question: what is valuable information? With the ubiquity of search engines and Wikipedia, much factual information can be found very quickly. However, in this era it is easy, even encouraged, to create “fake” information, including text, audio, video, and any other type of media that can be consumed on computational devices.
lesswrong 28h ago 10°

The Once and Present Fable (Fable 5 restoration linkpost)

After weeks of government intervention, our beloved Fable 5 has been restored, with safeguards co-developed by the government and 50% weekly usage caps until July 7.
lesswrong 22h ago 10°

Model access for third-parties — it's a big deal!

Over time, there might be an increasingly large gap between insider model access and outsider model access. By insiders, I mean employees at the frontier lab.[1] By "outsiders", I mean external safety researchers, third-party auditors, and other actors trying to make the future go well.
lesswrong 32h ago

Apply to the Inaugural PIBBSS Winter Research Fellowship!

TLDR: We're hosting our first-ever winter fellowship — a ~3-month, fully-funded research program in Cape Town, South Africa (November 2026–February 2027). Work on AI safety research drawing on expertise from fields like neuroscience, evolutionary biology, dynamical systems, philosophy, and more.
lesswrong 42h ago

Connect to your past selves

(Introduction: Your four-dimensional body)
Some people use meditation to learn valuable things about their deepest selves; some use therapy; some use psychedelic drugs. I use time travel.
Why connect?
My reasons, in no particular order:
Reflection: Look at your past selves to see how you’ve changed, and how you haven’t. You notice how certain decisions back then determined who you are.
lesswrong 18h ago

In Partial, Pugnacious Defense of Functional Decision Theory

I wrote a (partial) defense of FDT responding to Bentham's Bulldog. What it lacks in philosophical rigor or clear thinking, it makes up for in cheap jokes and potty-mouthed bellicosity.
lesswrong 15h ago

How inevitable are most accessible hard-tech startups?

(For context, I’m an undergraduate considering entering the hard-tech startup space.)
lesswrong 31h ago

Why aren't there more AlphaFolds?

This essay began as a talk at the 2026 Gold Lab Symposium. You can watch the talk itself here. The content is the same, but I've expanded on it here. The universe doesn't give up its secrets easily. Every time we learn some truth, we depend on prior, equally hard-won knowledge, and usually some luck.
lesswrong 13h ago

Do-it-yourself meta-analysis

Dynomight has looked at the health effects of vitamin D supplementation. The large-scale meta-analyses that have been performed conclude there is no significant effect, even though individual studies fairly consistently point toward an effect.
lesswrong 12h ago

AI welfare research needs basic science

Over the course of MATS 9.0 we formed some views about AI welfare research that we thought were worth writing up. This post is meant to spark discussion rather than to present definitive conclusions. Thanks to Patrick Butlin for useful comments on a previous draft, and for many conversations over the course of MATS which influenced our views.
lesswrong 17h ago

AI Mistake Seeding

I wonder if AI is being trained to make easy-to-correct mistakes so it can fix them later. That is, it ends up trained to correct its previous message's mistake, then make another mistake, so it can correct it again in the next message.
lesswrong 8h ago

Career Choice: Becoming a Researcher in a Non-EA-Priority Field vs Founding Tech Startup?

I'm an engineering and math graduate whose goal is to maximize impact. I am currently deciding between two career paths but have been struggling to determine which would be more impactful:
Become a professor/researcher in robotics, working on mainstream technical problems such as zero-shot learning.