If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research.
[No LLMs were used (or harmed!) in the writing of this blogpost!]
Technical results can all be found in my UAI 2026 paper: https://arxiv.org/abs/2603.02491
This work, and this post about this work, was born out of a frustration.
Recently I’ve been thinking a lot about a certain model of a rational agent: a proof-based agent which is triggered to act when it finds certain proofs in Peano arithmetic (PA). Back when MIRI had an agent foundations team, they found they could derive what these agents would do using provability logic, a modal logic in which the necessity box represents the provability predicate in PA.
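For readers who have not met provability logic before, here is a minimal sketch of the standard system usually meant by that name, GL (Gödel-Löb). This is textbook background in generic notation, not a result taken from the post; $\Box\varphi$ is read as "PA proves $\varphi$".

```latex
\begin{align*}
\text{(K)}      &\quad \Box(\varphi \to \psi) \to (\Box\varphi \to \Box\psi) \\
\text{(L\"ob)}  &\quad \Box(\Box\varphi \to \varphi) \to \Box\varphi \\
\text{(Nec)}    &\quad \text{if } \vdash \varphi \text{ then } \vdash \Box\varphi
\end{align*}
```

The Löb axiom is the modal counterpart of Löb's theorem for PA: if PA proves that provability of $\varphi$ implies $\varphi$, then PA already proves $\varphi$.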
(Previously, previously.)
Introduction
I’d like to share information that is worth sharing. So I ask the question: what is valuable information?
With the ubiquity of search engines and Wikipedia, much factual information can be found very quickly. However, in this era it is easy, even encouraged, to create “fake” information, including text, audio, video, and any other type of media that can be consumed by the use of computational devices.
After weeks of government intervention, our beloved Fable 5 has been restored, with safeguards co-developed by the government and 50% weekly usage caps until July 7.
Over time, there might be an increasingly large gap between insider model access and outsider model access. By "insiders", I mean employees at the frontier lab.[1] By "outsiders", I mean external safety researchers, third-party auditors, and other actors trying to make the future go well.
TLDR: We're hosting our first-ever winter fellowship — a ~3-month, fully-funded research program in Cape Town, South Africa (November 2026–February 2027). Work on AI safety research drawing on expertise from fields like neuroscience, evolutionary biology, dynamical systems, philosophy, and more.
(Introduction: Your four-dimensional body)
Some people use meditation to learn valuable things about their deepest selves; some use therapy; some use psychedelic drugs. I use time travel.
Why connect?
My reasons, in no particular order:
Reflection: Look at your past selves to see how you’ve changed, and how you haven’t. You notice how certain decisions back then determined who you are.
I wrote a (partial) defense of FDT responding to Bentham's Bulldog. What it lacks in philosophical rigor or clear thinking it makes up for with cheap jokes and potty-mouthed bellicosity.
(For context, I’m an undergraduate considering entering the hard-tech startup space.
This essay began as a talk at the 2026 Gold Lab Symposium. You can watch the talk itself here. The content is the same, but I've expanded on it more here.
The universe doesn't give up its secrets easily. Every time we learn some truth, we depend on prior, equally hard-won knowledge, and usually some luck.
Dynomight has looked at the health effects of vitamin D supplementation. The large-scale meta-analyses that have been performed conclude there is no significant effect, even though individual studies relatively consistently point in the direction of an effect.
Over the course of MATS 9.0 we formed some views about AI welfare research that we thought were worth writing up. This post is meant to spark discussion rather than to present definitive conclusions. Thanks to Patrick Butlin for useful comments on a previous draft, and for many conversations over the course of MATS which influenced our views.
I wonder if AI is being trained to make easy-to-correct mistakes so it can fix them later. That is, it ends up trained to correct its previous message's mistake, then make another mistake, so it can correct it again in the next message.
Engineering + math graduate whose goal is to maximize impact. I am currently deciding between two career paths, but have been struggling a lot to determine which would be more impactful:
Become a professor/researcher in robotics, working on mainstream technical problems such as zero-shot learning.