Bohan Lyu

I am an undergraduate at Tsinghua University. I'm interested in ML and NLP topics. My work has been published at ICML, ACL, COLM, and other venues.

I am currently working on autonomous self-evolving intelligence at PLI, Princeton, advised by Prof. Chi Jin. Before that, I worked in the Tsinghua NLP Lab, advised by Prof. Zhiyuan Liu, and in the Rose-STL-Lab at UCSD, advised by Prof. Rose Yu.

I serve as a reviewer for ICLR, ARR, and workshops on AI4Math and LLM Agents.

I am seeking industrial internship opportunities.

Contact Info  /  CV  /  Google Scholar  /  Twitter  /  Where is Bohan?


profile photo

News

  • 2025-07 Released Goedel-Prover-V2!
  • 2025-05 Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration is accepted to ACL 2025 Main!
  • 2025-05 Adapting while Learning is accepted to ICML 2025!
  • 2024-06 My team won 1st place out of 600+ teams at a Baidu Inc. data mining competition.
  • 2024-05 Awarded the Spark Scientific and Technological Innovation Fellowship (top 1% at Tsinghua).
  • 2024-04 Led my team to Second Place and the Newcomer Prize at Tsinghua University's Challenge Cup.

Research (* indicates equal contribution, highlight indicates representative papers)

I'm interested in AI for Math, ML for NLP, and their applications.

project image

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction


Yong Lin*, Shange Tang*, Bohan Lyu*, Ziran Yang*, Jui-Hui Chung*, Haoyu Zhao*, Lai Jiang*, Yihan Geng*, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
ICLR 2026, AI4MATH @ ICML 2025 (Oral)
arXiv / Code / Slides / Website / BibTeX /

Goedel-Prover-V2 sets a new state of the art in automated proof generation with dramatically smaller models, using scaffolded training, self-correction, and model averaging to outperform systems 20-100x larger on key mathematical benchmarks.

project image

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors


Bohan Lyu*, Siqiao Huang*, Zichen Liang*
EMNLP 2025 (Main, meta score top 0.3%)
arXiv / Code / Website / BibTeX /

SURGE evaluates LLMs' ability to predict code execution across eight key areas. While the models show promise, their limitations prevent general-purpose surrogate execution. This work is an extension of a course project and does not reflect the focus of my research.

project image

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving


Yong Lin*, Shange Tang*, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
COLM 2025
arXiv / Code / Website / YouTube / BibTeX /

The 7B Goedel-Prover sets a new state of the art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench leaderboard, and solving nearly twice as many problems on Lean Workbook.

project image

Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation


Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
ICML 2025, AAAI 2024 FSS (Oral)
arXiv / Code / Slides / Website / YouTube / BibTeX /

This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.

project image

Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub


Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Cheng Qian, Zihe Wang, Yujia Qin, Yining Ye, Yaxi Lu, Chen Qian, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
ACL 2025 (Main)
arXiv / Code / Slides / Website / YouTube / BibTeX /

We construct the OpenAct benchmark for complex open-domain task solving, and develop OpenAgent, a novel LLM agent system that leverages GitHub repositories to extend its capabilities and address diverse user queries.

Talks

2025-10 The Curious Landscape of LLM Reasoning: An Overview of Techniques, Research Directions, and Challenges at Wuqiong College, Tsinghua University.

2025-07 LLM Reasoning: From Informal to Formal at Department of Computer Science and Technology, Tsinghua University.

Resources

Some tools and resources I've developed for the research community:

  • jload - A Python package for convenient JSON/JSONL data loading and saving.
  • vlllm - A Python package for convenient text generation with multiprocessing support.





Design and source code from Leonid Keselman's website