Tommi Jaakkola

Tommi S. Jaakkola, Ph.D.
Thomas Siebel Professor of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society

MIT Computer Science and Artificial Intelligence Laboratory
Stata Center, Bldg 32-G470
Cambridge, MA 02139

tommi at csail dot mit dot edu

Research synopsis (projects)

Our research advances how machines can learn, predict, or control, and do so at scale in an efficient, principled, and interpretable manner. Our research in machine learning extends from foundational theory to modern applications, focusing especially on statistical inference and estimation tasks that lie at the heart of complex learning problems. We design new methods, theory, and algorithms to automate the use and generation of semi-structured data such as natural language text, images, molecules, or strategies. We develop and apply our algorithms to solve multi-faceted recommender, retrieval, or inferential tasks (e.g., biomedical), to design and optimize molecules or reactions for drug design, and to model strategic, game-theoretic interactions.

People (more people)

Julia Balla(c), Abhi Gupta, Cathy Cai(c), MinGyu Choi(c), Cameron Diao(c), Felix Faltings(c), Peter Holderrieth, Bowen Jing(c), Ron Shprints, Hannes Stärk(c), Shangyuan Tong, Chenyu Wang, Maurice Weiler*, Cai Zhou(c)

(* = postdoc, c = co-advised, v = visiting)

Recent release: BoltzGen

We introduce BoltzGen, an all-atom generative model for designing proteins and peptides across all modalities to bind a wide range of biomolecular targets. BoltzGen builds strong structural reasoning capabilities about target-binder interactions into its generative design process and is controlled by a flexible design specification language. We experimentally validate these capabilities in a total of eight diverse wet-lab design campaigns. Model weights and code for data, inference, and training are released under the MIT license.

H. Stärk, F. Faltings, M. Choi, Y. Xie, E. Hur, T. O'Donnell, A. Bushuiev, T. Ucar, S. Passaro, W. Mao, M. Reveiz, R. Bushuiev, T. Pluskal, J. Sivic, K. Kreis, A. Vahdat, S. Ray, J. Goldstein, A. Savinov, J. Hambalek, A. Gupta, D. Taquiri-Diaz, Y. Zhang, A. K. Hatstat, A. Arada, N. H. Kim, E. Tackie-Yarboi, D. Boselli, L. Schnaider, C. C. Liu, G.-W. Li, D. Hnisz, D. M. Sabatini, W. F. DeGrado, J. Wohlwend, G. Corso, R. Barzilay, and T. Jaakkola.
BoltzGen: Toward Universal Binder Design. Preprint.
[bioRxiv], [GitHub]

Recent publications (2022-present), see also papers, Google Scholar, preprints on arXiv, preprints on bioRxiv

S. Tong, N. Ma, S. Xie, and T. Jaakkola.
Flow map distillation without data.
In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
[link]

P. Holderrieth, U. Singer, T. Jaakkola, R. T. Q. Chen, Y. Lipman, and B. Karrer.
Glass flows: Efficient inference for reward alignment of flow and diffusion models.
In The 14th International Conference on Learning Representations (ICLR), 2026.
[link]

C. Wang, P. Rashidinejad, D. Su, S. Jiang, S. Wang, S. Zhao, C. Zhou, S. Zejiang Shen, F. Chen, T. Jaakkola, Y. Tian, and B. Liu.
SPG: Sandwiched policy gradient for masked diffusion language models.
In The 14th International Conference on Learning Representations (ICLR), 2026.
[link]

W. Ahern, J. Yim, D. Tischer, S. Salike, S. M. Woodbury, D. Kim, I. Kalvet, Y. Kipnis, B. Coventry, H. R. Altae-Tran, M. S. Bauer, R. Barzilay, T. Jaakkola, R. Krishna, and D. Baker.
Atom-level enzyme active site scaffolding using RFdiffusion2.
Nature Methods, 2025.
[link]

C. Wang, C. Zhou, S. Gupta, Z. Lin, S. Jegelka, S. Bates, and T. Jaakkola.
Learning diffusion models with flexible representation guidance.
In Neural Information Processing Systems (NeurIPS), 2025.
[link]

C. Zhou, C. Wang, D. Zhang, S. Tong, Y. Wang, S. Bates, and T. Jaakkola.
Next semantic scale prediction via hierarchical diffusion language models.
In Neural Information Processing Systems (NeurIPS), 2025.
[link]

R. Okabe, M. Cheng, A. Chotrattanapituk, M. Mandal, K. Mak, D. Córdova Carrizales, N. T. Hung, X. Fu, B. Han, Y. Wang, W. Xie, R. J. Cava, T. S. Jaakkola, Y. Cheng, and M. Li.
Structural constraint integration in a generative model for the discovery of quantum materials.
Nature Materials, 2025.
[link]

M. Wu, C. Zhou, S. Bates, and T. Jaakkola.
Thought calibration: Efficient and confident test-time scaling.
In Empirical Methods in Natural Language Processing (EMNLP), 2025.
[link]

P. Holderrieth, M. Albergo, and T. Jaakkola.
LEAPS: A discrete neural sampler via locally equivariant networks.
In International Conference on Machine Learning (ICML), 2025.
[link]

M. Wu, U. Padia, S. H. Murphy, R. Barzilay, and T. Jaakkola.
Identifying biological perturbation targets through causal differential networks.
In International Conference on Machine Learning (ICML), 2025.
[link]

J. Mohapatra, N. Dehmamy, C. Both, S. Das, and T. Jaakkola.
Symmetry-driven discovery of dynamical variables in molecular simulations.
In International Conference on Machine Learning (ICML), 2025.
[link]

N. Ma, S. Tong, H. Jia, H. Hu, Y-C Su, M. Zhang, X. Yang, Y. Li, T. Jaakkola, X. Jia, and S. Xie.
Inference-time scaling for diffusion models beyond scaling denoising steps.
In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[link]

P. Holderrieth, M. Havasi, J. Yim, N. Shaul, I. Gat, T. Jaakkola, B. Karrer, R. T. Q. Chen, and Y. Lipman.
Generator matching: Generative modeling with arbitrary Markov processes.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

G. Corso, V. Ram Somnath, N. Getz, R. Barzilay, T. Jaakkola, and A. Krause.
Composing unbalanced flows for flexible docking and relaxation.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

H. Stärk, B. Jing, T. Geffner, J. Yim, T. Jaakkola, A. Vahdat, and K. Kreis.
ProtComposer: Compositional protein structure generation with 3D ellipsoids.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

C. Wang, S. Gupta, X. Zhang, S. Tonekaboni, S. Jegelka, T. Jaakkola, and C. Uhler.
An information criterion for controlled disentanglement of multimodal data.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

M. Karimi, S. Banerjee, T. Jaakkola, B. Dubrov, S. Shang, and R. Benson.
Data distillation for extrapolative protein design through exact preference optimization.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

C. Wang, M. Uehara, Y. He, A. Wang, T. Biancalani, A. Lal, T. Jaakkola, S. Levine, Hanchen, and A. Regev.
Fine-tuning discrete diffusion models via reward optimization with applications to DNA and protein design.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

Y. Liu, S. Chang, T. Jaakkola, and Y. Zhang.
Fictitious synthetic data can improve LLM factuality via prerequisite learning.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

S. Liu, J. Nam, A. Campbell, H. Stärk, Y. Xu, T. Jaakkola, and R. Gomez-Bombarelli.
Think while you generate: Discrete diffusion with planned denoising.
In The 13th International Conference on Learning Representations (ICLR), 2025.
[link]

P. Holderrieth, Y. Xu, and T. Jaakkola.
Hamiltonian score matching and generative flows.
In Neural Information Processing Systems (NeurIPS), 2024.
[link]

S. Gupta, C. Wang, Y. Wang, T. Jaakkola, and S. Jegelka.
Symmetries in-context: Universal self-supervised learning through contextual world models.
In Neural Information Processing Systems (NeurIPS), 2024.
[link]

X. Fu, A. S. Rosen, K. Bystrom, R. Wang, A. Musaelian, B. Kozinsky, T. Smidt, and T. Jaakkola.
A recipe for charge density prediction.
In Neural Information Processing Systems (NeurIPS), 2024.
[link]

N. Dehmamy, C. Both, J. Mohapatra, S. Das, and T. Jaakkola.
Neural network reparametrization for accelerated optimization in molecular simulations.
In Neural Information Processing Systems (NeurIPS), 2024.
[link]

B. Jing, H. Stärk, T. Jaakkola, and B. Berger.
Generative modeling of molecular dynamics trajectories.
In Neural Information Processing Systems (NeurIPS), 2024.
[link]

B. Jing, B. Berger, and T. Jaakkola.
AlphaFold meets flow matching for generating protein ensembles.
In International Conference on Machine Learning (ICML), 2024.
[link]

A. Campbell, J. Yim, R. Barzilay, T. Rainforth, and T. Jaakkola.
Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design.
In International Conference on Machine Learning (ICML), 2024.
[link]

Y. Xu, G. Corso, T. Jaakkola, A. Vahdat, and K. Kreis.
DisCo-Diff: Enhancing continuous diffusion models with discrete latents.
In International Conference on Machine Learning (ICML), 2024.
[link]

H. Stärk, B. Jing, R. Barzilay, and T. Jaakkola.
Harmonic self-conditioned flow matching for joint multi-ligand docking and binding site design.
In International Conference on Machine Learning (ICML), 2024.
[link]

H. Stärk, B. Jing, C. Wang, G. Corso, B. Berger, R. Barzilay, and T. Jaakkola.
Dirichlet flow matching with applications to DNA sequence design.
In International Conference on Machine Learning (ICML), 2024.
[link]

J. Yim, H. Stärk, G. Corso, B. Jing, R. Barzilay, and T. Jaakkola.
Diffusion models in protein structure and docking.
WIREs Computational Molecular Science, 14(2):e1711, 2024.
[link]

R. Okabe, A. Chotrattanapituk, A. Boonkird, N. Andrejevic, X. Fu, T. S. Jaakkola, Q. Song, T. Nguyen, N. Drucker, S. Mu, Y. Wang, B. Liao, Y. Cheng, and M. Li.
Virtual node graph neural network for full phonon prediction.
Nature Computational Science, 4(7), 2024.
[link]

Y. Liu, Y. Zhang, T. Jaakkola, and S. Chang.
Correcting diffusion generation through resampling.
In Computer Vision and Pattern Recognition (CVPR), 2024.
[link]

G. Corso, H. Stärk, S. Jegelka, T. Jaakkola, and R. Barzilay.
Graph neural networks.
Nature Reviews Methods Primers, 4(17), 2024.
[link]

X. Fu, T. Xie, A. S. Rosen, T. Jaakkola, and J. A. Smith.
MOFDiff: Coarse-grained diffusion for metal-organic framework design.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

G. Corso, Y. Xu, V. De Bortoli, R. Barzilay, and T. Jaakkola.
Particle guidance: non-i.i.d. diverse sampling with diffusion models.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

G. Corso, A. Deng, N. Polizzi, R. Barzilay, and T. Jaakkola.
Deep confident steps to new pockets: Strategies for docking generalization.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

C. Wang, S. Gupta, C. Uhler, and T. Jaakkola.
Removing biases from molecular representations via information maximization.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

B. Jing, T. Jaakkola, and B. Berger.
Learning scalar fields for molecular docking with fast Fourier transforms.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

A. Kirjner, J. Yim, R. Samusevich, S. Bracha, T. Jaakkola, R. Barzilay, and I. R. Fiete.
Improving protein optimization with smoothed fitness landscapes.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

V. Quach, A. Fisch, T. Schuster, A. Yala, J. H. Sohn, T. Jaakkola, and R. Barzilay.
Conformal language modeling.
In The 12th International Conference on Learning Representations (ICLR), 2024.
[link]

B. A. Koscher, R. B. Canty, M. A. McDonald, K. P. Greenman, C. J. McGill, C. L. Bilodeau, W. Jin, H. Wu, F. H. Vermeire, B. Jin, T. Hart, T. Kulesza, S-C. Li, T. S. Jaakkola, R. Barzilay, R. Gomez-Bombarelli, W. H. Green, and K. F. Jensen.
Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back.
Science, 382, 2023.
[link]

T. Garipov, S. De Peuter, G. Yang, V. Garg, S. Kaski, and T. Jaakkola.
Compositional sculpting of iterative generative processes.
In Neural Information Processing Systems (NeurIPS), 2023.
[link]

Y. Xu, M. Deng, X. Cheng, Y. Tian, Z. Liu, and T. Jaakkola.
Restart sampling for improving generative processes.
In Neural Information Processing Systems (NeurIPS), 2023.
[link]

A. Ajay, S. Han, Y. Du, S. Li, A. Gupta, T. Jaakkola, J. Tenenbaum, L. Pack Kaelbling, A. Srivastava, and P. Agrawal.
Hierarchical planning with foundation models.
In Neural Information Processing Systems (NeurIPS), 2023.
[link]

X. Fu, T. Xie, N. J. Rebello, B. Olsen, and T. Jaakkola.
Simulate time-integrated coarse-grained molecular dynamics with multi-scale graph networks.
Transactions on Machine Learning Research (TMLR), 2023.
[link]

J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, N. Hanikel, S. J. Pellock, A. Courbet, W. Sheffler, J. Wang, P. Venkatesh, I. Sappington, S. Vazquez Torres, A. Lauko, V. De Bortoli, E. Mathieu, S. Ovchinnikov, R. Barzilay, T. S. Jaakkola, F. DiMaio, M. Baek, and D. Baker.
De novo design of protein structure and function with RFdiffusion.
Nature, 620:1089–1100, 2023.
[link]

G. Liu, D. Catacutan, K. Rathod, K. Swanson, W. Jin, J. Mohammed, A. Chiappino-Pepe, S. Syed, M. Fragis, K. Rachwalski, J. Magolan, M. Surette, B. Coombes, T. Jaakkola, R. Barzilay, J. J. Collins, and J. M. Stokes.
Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii.
Nature Chemical Biology, 2023.
[link] [pdf]

Y. Xu, Z. Liu, Y. Tian, S. Tong, M. Tegmark, and T. Jaakkola.
PFGM++: Unlocking the potential of physics-inspired generative models.
In International Conference on Machine Learning (ICML), 2023.
[link]

J. Yim, B. Trippe, V. De Bortoli, E. Mathieu, A. Doucet, R. Barzilay, and T. Jaakkola.
SE(3) diffusion model with application to protein backbone generation.
In International Conference on Machine Learning (ICML), 2023.
[link]

G. Zhang, J. Ji, Y. Zhang, M. Yu, T. Jaakkola, and S. Chang.
Towards coherent image inpainting using denoising diffusion implicit models.
In International Conference on Machine Learning (ICML), 2023.
[link]

X. Fu, Z. Wu, W. Wang, T. Xie, S. Keten, R. Gomez-Bombarelli, and T. Jaakkola.
Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations.
Transactions on Machine Learning Research (TMLR), 2023.
[link]

M. Amine Ketata, C. Laue, R. Mammadov, H. Stärk, M. Wu, G. Corso, C. Marquet, R. Barzilay, and T. Jaakkola.
DiffDock-PP: Rigid protein-protein docking with diffusion models.
In Machine Learning for Drug Discovery (ICLR workshop), 2023.
[link]

B. Jing, E. Erives, P. Pao-Huang, G. Corso, B. Berger, and T. Jaakkola.
EigenFold: Generative protein structure prediction with diffusion models.
In Machine Learning for Drug Discovery Workshop (ICLR workshop), 2023.
[link]

B. Trippe, J. Yim, D. Tischer, D. Baker, T. Broderick, R. Barzilay, and T. Jaakkola.
Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem.
In The 11th International Conference on Learning Representations (ICLR), 2023.
[link]

G. Corso, H. Stärk, B. Jing, R. Barzilay, and T. Jaakkola.
DiffDock: Diffusion steps, twists, and turns for molecular docking.
In The 11th International Conference on Learning Representations (ICLR), 2023.
[link]

Y. Xu, S. Tong, and T. Jaakkola.
Stable target field for reduced variance score estimation.
In The 11th International Conference on Learning Representations (ICLR), 2023.
[link]

A. Ajay, Y. Du, A. Gupta, J. Tenenbaum, T. Jaakkola, and P. Agrawal.
Is conditional generative modeling all you need for decision making?
In The 11th International Conference on Learning Representations (ICLR), 2023.
[link]

B. Laufer-Goldshtein, A. Fisch, R. Barzilay, and T. Jaakkola.
Efficiently controlling multiple risks with Pareto testing.
In The 11th International Conference on Learning Representations (ICLR), 2023.
[link]

H. Zhao, C. Dan, B. Aragam, T. Jaakkola, G. Gordon, and P. Ravikumar.
Fundamental limits and tradeoffs in invariant representation learning.
Journal of Machine Learning Research, 23(340):1–49, 2022.
[link]

A. Fisch, T. Jaakkola, and R. Barzilay.
Calibrated selective classification.
Transactions on Machine Learning Research, 2022.
[link]

B. Jing, G. Corso, J. Chang, R. Barzilay, and T. Jaakkola.
Torsional diffusion for molecular conformer generation.
In Neural Information Processing Systems (NeurIPS), 2022.
[link]

Y. Xu, Z. Liu, M. Tegmark, and T. Jaakkola.
Poisson flow generative models.
In Neural Information Processing Systems (NeurIPS), 2022.
[link]

F. Wong, A. Krishnan, E. Zheng, H. Stärk, A. Manson, A. Earl, T. Jaakkola, and J. Collins.
Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery in molecular systems biology.
Molecular Systems Biology, 18(9), 2022.
[link]

B. Jing, G. Corso, R. Berlinghieri, and T. Jaakkola.
Subspace diffusion generative models.
In European Conference on Computer Vision (ECCV), 2022.
[link]

H. Stärk, O. Ganea, L. Pattanaik, R. Barzilay, and T. Jaakkola.
EquiBind: Geometric deep learning for drug binding structure prediction.
In International Conference on Machine Learning (ICML), 2022.
[link]

W. Jin, R. Barzilay, and T. Jaakkola.
Antibody-antigen interface design via hierarchical structure refinement.
In International Conference on Machine Learning (ICML), 2022.
[link]

A. Fisch, T. Schuster, T. Jaakkola, and R. Barzilay.
Conformal prediction sets with limited false positives.
In International Conference on Machine Learning (ICML), 2022.
[link]

C. Bilodeau, W. Jin, T. Jaakkola, R. Barzilay, and K. F. Jensen.
Generative models for molecular discovery: Recent advances and challenges.
WIREs Computational Molecular Science, 2022.
[link]

T. Xie, X. Fu, O. Ganea, R. Barzilay, and T. Jaakkola.
Crystal diffusion variational autoencoder for periodic material generation.
In The Tenth International Conference on Learning Representations (ICLR), 2022.
[pdf]

W. Jin, J. Wohlwend, R. Barzilay, and T. Jaakkola.
Iterative refinement graph neural network for antibody sequence-structure co-design.
In The Tenth International Conference on Learning Representations (ICLR), 2022.
[pdf]

Y. Xu, H. He, T. Shen, and T. Jaakkola.
Controlling directions orthogonal to a classifier.
In The Tenth International Conference on Learning Representations (ICLR), 2022.
[pdf]

S. Tong, T. Garipov, Y. Zhang, S. Chang, and T. Jaakkola.
Adversarial support alignment.
In The Tenth International Conference on Learning Representations (ICLR), 2022.
[pdf]

O. Ganea, X. Huang, C. Bunne, Y. Bian, R. Barzilay, T. Jaakkola, and A. Krause.
Independent SE(3)-equivariant models for end-to-end rigid protein docking.
In The Tenth International Conference on Learning Representations (ICLR), 2022.
[pdf]