
comments from Bob

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on April 10, 2026 by xi'an

Bob replied to my short post with further items of information that I find worth sharing:

Thanks for the kind post, Christian. It’s amusing to be the subject of one of these posts given how many of them I’ve read about other people. I really appreciate your summaries. And thanks to everyone in the audience for all the great feedback during and after the talk. Here’s a link to my slides.

One of your students or postdocs mentioned an approach that does continuous adaptation on some kind of polynomial schedule that is provably correct, but I didn’t manage to write down the author/reference or the name of the person who recommended it. If you happen to know what that is, I’d be grateful for the reference.

I would also like to follow up on the Robert & Andrieu paper you mention, but I could not find the exact reference on your Google Scholar page. The closest match I can find is:

Controlled MCMC for optimal sampling. 2001. C Andrieu, CP Robert. INSEE.

Section 1.3 is titled “Criteria for local adaptation.” The section cites two things. The first is Haario et al.’s (1999) sliding window approach, for which HMC moves too fast to be useful locally. The second is the multiple-try approach of Liu et al. (2000) and the delayed rejection approach of Tierney and Mira (1999). We applied delayed rejection to HMC step size adaptation in a couple of papers before developing GIST (Modi, Barnett, and Carpenter in Bayesian Analysis; Turok, Modi, and Carpenter in AISTATS); these mirror our second GIST paper and third GIST paper in doing the step size adaptation for a whole trajectory and at each leapfrog step. The GIST approach is easier to understand, easier to describe mathematically, easier to implement, and more efficient.
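The delayed-rejection idea is easiest to see outside of HMC: if a bold first proposal is rejected, retry with a timid one, using a corrected second-stage acceptance probability so that the target stays invariant. Here is a minimal random-walk sketch of the Tierney and Mira construction (my own illustration, not the code from any of the papers above), assuming symmetric Gaussian proposals at both stages so that the proposal normalizers cancel:

```python
import numpy as np

def dr_step(x, log_pi, sigma1=2.0, sigma2=0.5, rng=np.random):
    """One delayed-rejection Metropolis step (after Tierney & Mira 1999):
    if a bold stage-1 proposal is rejected, retry with a smaller one,
    with a stage-2 acceptance probability that keeps pi invariant."""
    # Unnormalized log density of a N(mu, s^2) proposal; the normalizer
    # cancels because both stage-1 densities below share sigma1.
    log_q = lambda y, mu, s: -0.5 * ((y - mu) / s) ** 2

    # Stage 1: bold symmetric proposal.
    y1 = x + sigma1 * rng.standard_normal()
    a1 = min(1.0, np.exp(log_pi(y1) - log_pi(x)))
    if rng.random() < a1:
        return y1

    # Stage 2: timid proposal, also centered at the current state, so its
    # forward and reverse densities cancel in the acceptance ratio.
    y2 = x + sigma2 * rng.standard_normal()
    a1_rev = min(1.0, np.exp(log_pi(y1) - log_pi(y2)))  # alpha1(y2 -> y1)
    if a1_rev >= 1.0:
        return x  # stage-2 numerator is zero: reject outright
    log_a2 = (log_pi(y2) - log_pi(x)
              + log_q(y1, y2, sigma1) - log_q(y1, x, sigma1)
              + np.log1p(-a1_rev) - np.log1p(-a1))
    return y2 if np.log(rng.random()) < log_a2 else x
```

The extra `(1 - alpha1)` terms are what make the two-stage scheme reversible; the GIST point is that a conditional step-size proposal avoids this bookkeeping entirely.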

The nice part about GIST compared to Riemannian HMC is that we do not need to do any volume adjustments (which must be autodiffed through), which are cubic, and we do not need an implicit integrator, which is incredibly fussy to tune. The tradeoff is that we require reversibility of the adaptation, which I think is going to be tricky with varying curvature. Of course, we can’t afford to compute Hessian matrices in high dimensions, but we could manage Hessian-vector products if we could figure out how to use just those, and we could also manage low-rank plus diagonal approximations or sketches as described in the Nutpie paper.
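On the Hessian-vector products: they never require materializing the d × d Hessian. With autodiff they cost about one extra gradient pass, and even without autodiff a central difference of two gradients yields the same primitive. A hedged sketch (the function name and finite-difference approach are mine, not from the Nutpie paper):

```python
import numpy as np

def hvp(grad, x, v, eps=1e-5):
    """Hessian-vector product H(x) @ v via a central difference of
    gradients: two gradient evaluations, O(d) memory, no d x d Hessian."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)

# Example: the quadratic log density -0.5 x^T A x has gradient -A x and
# constant Hessian -A, so the finite difference is exact up to rounding.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda x: -A @ x
x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
print(hvp(grad, x, v))  # approximately -A @ v
```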

We’ve arXived the Nutpie paper since the talk:

Preconditioning HMC by minimizing Fisher divergence. arXiv. 2026. Seyboldt, Carlsen, and Carpenter.

The WALNUTS paper has been accepted by JMLR, but currently only the arXiv version is available:

The within-orbit adaptive leapfrog no-U-turn sampler. Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, Sifan Liu. arXiv 2025; to appear, JMLR 2026.

Working with Nawaf and Tore has made all the difference in the world on this—it’s not something I could have done by myself. Sifan’s the one who came up with the nice characterization of NUTS and Nawaf’s done a number of additional things like providing mixing time bounds for NUTS (with Milo Marsden, who’s sadly no longer with us—he’s gone into finance).

Furthermore, you can adjust the U-turn criterion from 180 degrees to whatever you want to control how much of a full orbit you get. Those tend to be even more wasteful of iterations, though—this is what the plot of the expected integration time of NUTS is supposed to show, but it was confusing in the talk.
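One way to picture a dialed-down criterion: stop integrating once the momentum has rotated some threshold angle away from its initial direction. This is a hypothetical illustration of the idea, not the NUTS condition itself; `steps_until_angle` and the 90-degree cutoff below are my own choices, demonstrated on a standard normal target where the exact dynamics rotate phase space at one radian per unit time:

```python
import numpy as np

def leapfrog(q, p, grad_U, h):
    """One leapfrog step for Hamiltonian H = U(q) + 0.5 |p|^2."""
    p = p - 0.5 * h * grad_U(q)
    q = q + h * p
    p = p - 0.5 * h * grad_U(q)
    return q, p

def steps_until_angle(q, p, grad_U, h, max_angle_deg):
    """Integrate until the momentum has rotated max_angle_deg away from
    its initial direction -- a dialed-down stand-in for the 180-degree
    U-turn condition (illustrative, not the NUTS criterion)."""
    p0 = p / np.linalg.norm(p)
    n = 0
    while True:
        q, p = leapfrog(q, p, grad_U, h)
        n += 1
        cos = p0 @ p / np.linalg.norm(p)
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle >= max_angle_deg:
            return n

# Standard normal: U(q) = 0.5 |q|^2, so a 90-degree cutoff should take
# about (pi/2) / h leapfrog steps.
grad_U = lambda q: q
n = steps_until_angle(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                      grad_U, h=0.01, max_angle_deg=90.0)
print(n)
```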

The approach you took with Wu Chengye to randomize the number of leapfrog steps made a deep impression on me. It is also wasteful in leapfrog steps, because any number of steps much greater than or less than about 1/4 of an orbit either costs extra computation or leads to more diffusive sampling. You can see that it is roughly as gradient efficient as NUTS in a 1000-dimensional standard normal. Interestingly, it’s worse than NUTS for parameter estimates and better for squared parameter estimates, which is overall a win. Nawaf has also published on randomized HMC. I think we could turn down the NUTS U-turn criterion below 180 degrees to get something similar with NUTS, but I haven’t tried it.
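A sketch of the randomized-path-length idea, with the number of leapfrog steps drawn uniformly each iteration (the uniform randomization over {1, …, L_max} and the plain Metropolis correction are my assumptions for illustration, not the exact scheme from the paper):

```python
import numpy as np

def hmc_random_steps(q, log_pi, grad_log_pi, h, L_max, n_iter, rng):
    """HMC with the number of leapfrog steps drawn uniformly from
    {1, ..., L_max} at each iteration (illustrative sketch)."""
    samples = []
    for _ in range(n_iter):
        p = rng.standard_normal(q.shape)
        q_new, p_new = q.copy(), p.copy()
        L = rng.integers(1, L_max + 1)  # fresh random path length
        for _ in range(L):  # leapfrog integration
            p_new = p_new + 0.5 * h * grad_log_pi(q_new)
            q_new = q_new + h * p_new
            p_new = p_new + 0.5 * h * grad_log_pi(q_new)
        # Metropolis correction on the joint (position, momentum) density.
        log_accept = (log_pi(q_new) - log_pi(q)
                      + 0.5 * (p @ p - p_new @ p_new))
        if np.log(rng.random()) < log_accept:
            q = q_new
        samples.append(q.copy())
    return np.array(samples)
```

Because the path length is drawn independently of the state, no reversibility bookkeeping is needed beyond the usual accept step, which is also what makes it so much friendlier to vectorize than NUTS.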

One important property of your randomized approach is that it is much, much easier to code efficiently for GPUs than NUTS, because the conditionals in NUTS are hard to execute in SIMD fashion. There’s a very nice introduction to this problem by Sountsov, Carroll, and Hoffman in their paper “Running Markov Chain Monte Carlo on Modern Hardware and Software,” which is out on arXiv and also going into the next edition of the Handbook of MCMC. The thing to read about how to code NUTS on GPU is Dance, Glaser, Orbanz, and Adams’s paper, “Efficiently Vectorized MCMC on Modern Accelerators,” which is on arXiv and in ICML 2025.

You can also randomize step size to vary the integration time and avoid harmonics, e.g.,

Randomized Hamiltonian Monte Carlo. 2017. Bou-Rabee and Sanz-Serna. Annals of Applied Probability.
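A minimal way to implement this kind of step-size randomization in a discrete-time sampler (purely illustrative; Randomized HMC proper randomizes the integration time in continuous time, and the uniform jitter and 20% spread below are my own choices):

```python
import numpy as np

def jittered_leapfrog_params(h0, L0, rng, spread=0.2):
    """Draw a fresh step size uniformly in [h0*(1-spread), h0*(1+spread)]
    and adjust the step count so that the total integration time stays
    near L0 * h0 while never locking onto a harmonic of the target."""
    h = h0 * rng.uniform(1.0 - spread, 1.0 + spread)
    L = max(1, int(round(L0 * h0 / h)))
    return h, L
```

Each HMC iteration would call this once and run `L` leapfrog steps of size `h`, so the realized integration times spread over an interval instead of sitting on a single resonant value.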

incoming mostly Monte Carlo [14 April, PariSanté campus]

Posted in pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on April 9, 2026 by xi'an

The next Mostly Monte Carlo seminar will take place this very Friday, 10/04/26, at PariSanté Campus, with Shiva Darshan and Pierre Monmarché speaking on the following topics:
15h: Shiva Darshan, “Maximal-reflection couplings on manifolds: some specific examples”
Explicit Markovian couplings can be used to build Markov chain Monte Carlo methods such as unbiased MCMC or coupling-based control variates. For sampling from probability measures supported on Euclidean space, one typically uses a synchronous coupling, a maximal-reflection coupling (also known as a discrete-time sticky coupling), or some variant of the two. For probability measures supported on Riemannian manifolds, the situation is less clear-cut. While the Kendall–Cranston coupling of Brownian motions on manifolds has been successfully applied in theoretical works, it is ill-suited for building explicit algorithms. In this talk, we will discuss some of the obstacles to extending Euclidean maximal-reflection couplings to manifolds and present some special cases for which these obstacles can be easily overcome. With applications to Stereographic MCMC in mind, we detail particular couplings of random walks on the sphere.
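For reference, the Euclidean construction the talk starts from can be written in a few lines: the maximal-reflection coupling of two Gaussian random-walk proposals accepts the same point with the total-variation-optimal probability, and otherwise reflects the noise across the hyperplane orthogonal to the line joining the two states. A sketch of this standard construction (variable names are mine), assuming isotropic Gaussian proposals with a common scale:

```python
import numpy as np

def coupled_gaussian_proposals(x, y, sigma, rng):
    """Maximal-reflection coupling of the proposals N(x, sigma^2 I)
    and N(y, sigma^2 I) in R^d: returns (x_new, y_new), identical with
    the maximal meeting probability, reflected otherwise."""
    xi = rng.standard_normal(x.shape)
    x_new = x + sigma * xi
    z = (x - y) / sigma
    # Maximal part: meet with probability min(1, phi(xi + z) / phi(xi)),
    # where phi is the standard normal density.
    log_ratio = -0.5 * (xi + z) @ (xi + z) + 0.5 * xi @ xi
    if np.log(rng.random()) < log_ratio:
        return x_new, x_new  # both chains propose the same point
    # Reflection part: reflect xi in the hyperplane orthogonal to z.
    e = z / np.linalg.norm(z)
    y_new = y + sigma * (xi - 2.0 * (e @ xi) * e)
    return x_new, y_new
```

The two obstacles on a manifold are visible already here: the reflection needs the direction `e` between the two states (ambiguous between points on a curved space), and the meeting probability needs the ratio of the two proposal densities in closed form.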
16h: Pierre Monmarché, “A post-sampling reweighting method for multi-modal target measures”
Even when the modes are identified and sampled locally with MCMC methods, a difficulty in sampling multi-modal measures is to correctly estimate the relative probabilities of each of these modes, which requires observing many transitions between them (which are rare events). We will present an approach based on variational inference that exploits the local samples, aiming only at estimating the relative weights between them. When the modes are well separated, this amounts to entropy estimation.

We are gravely concerned that the conduct and threats outlined here are causing serious harm to civilians in the Middle East, and that they also contribute to escalating the conflict, damaging the environment and the global economy, and that they risk degrading the rule of law and fundamental norms that protect every nation’s civilians

Posted in Books, Kids, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on April 8, 2026 by xi'an

पर्वत [cover]

Posted in Books, Kids, Mountains, pictures, Travel with tags , , , , , , , , , , , , on April 7, 2026 by xi'an

Congrats, IOC: discriminatory sex testing, but inclusion of Russian athletes!

Posted in Kids, Running with tags , , , , , , , , , , , , , , , on April 6, 2026 by xi'an