Bohan Lyu

I am an undergraduate at Tsinghua University. I'm interested in ML and NLP topics. My work has been published at ICML, ACL, COLM, and other venues.

I am currently working on autonomous self-evolving intelligence at PLI, Princeton, advised by Prof. Chi Jin. Before that, I worked in the Tsinghua NLP Lab, advised by Prof. Zhiyuan Liu, and in the Rose-STL-Lab at UCSD, advised by Prof. Rose Yu.

I serve as a reviewer for ICLR, ARR, and workshops on AI4Math and LLM Agents.

I am seeking industrial internship opportunities.

Contact Info  /  CV  /  Google Scholar  /  Twitter  /  Where is Bohan?


profile photo

News

  • 2025-07 Released Goedel-Prover-V2!
  • 2025-05 Enhancing LLM's Capabilities in Open Domains via Autonomous Tool Integration is accepted to ACL 2025 Main!
  • 2025-05 Adapting while Learning is accepted to ICML 2025!
  • 2024-06 My team won 1st place out of 600+ teams at a Baidu Inc. data mining competition.
  • 2024-05 Awarded the Spark Scientific and Technological Innovation Fellowship (top 1% at Tsinghua).
  • 2024-04 Led my team to Second Place and the Newcomer Prize at Tsinghua University's Challenge Cup.

Research (* indicates equal contribution, highlight indicates representative papers)

I'm interested in AI for Math, ML for NLP, and their applications.

project image

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction


Yong Lin*, Shange Tang*, Bohan Lyu*, Ziran Yang*, Jui-Hui Chung*, Haoyu Zhao*, Lai Jiang*, Yihan Geng*, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
ICLR 2026, AI4MATH @ ICML 2025 (Oral)
arXiv / Code / Slides / Website / BibTeX /

Goedel-Prover-V2 sets a new state of the art in automated proof generation with dramatically smaller models, using scaffolded training, self-correction, and model averaging to outperform systems 20-100x larger on key mathematical benchmarks.

project image

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors


Bohan Lyu*, Siqiao Huang*, Zichen Liang*
EMNLP 2025 (Main, meta score top 0.3%)
arXiv / Code / Website / BibTeX /

SURGE evaluates LLMs' ability to predict code execution across eight key areas. While the models show promise, their limitations prevent general-purpose surrogate execution. This work is an extension of a course project and does not reflect the focus of my research.

project image

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving


Yong Lin*, Shange Tang*, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
COLM 2025
arXiv / Code / Website / YouTube / BibTeX /

The 7B Goedel-Prover sets a new state of the art in open-source automated theorem proving, beating previous records with a 7% improvement on miniF2F, topping the PutnamBench leaderboard, and solving nearly twice as many problems on Lean Workbook.

project image

Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation


Bohan Lyu*, Yadi Cao*, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
ICML 2025, AAAI 2024 FSS (Oral)
arXiv / Code / Slides / Website / YouTube / BibTeX /

This work proposes a fine-tuning method where LLMs internalize tool-generated solutions (World Knowledge Distillation) and learn to switch between direct answers and tool use for complex problems (Tool Usage Adaptation). It outperforms GPT-4 and Claude-3.5 across six scientific benchmarks.

project image

Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub


Bohan Lyu*, Xin Cong*, Heyang Yu, Pan Yang, Cheng Qian, Zihe Wang, Yujia Qin, Yining Ye, Yaxi Lu, Chen Qian, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
ACL 2025 (Main)
arXiv / Code / Slides / Website / YouTube / BibTeX /

We construct the OpenAct benchmark for complex open-domain task solving, and develop OpenAgent, a novel LLM agent system that leverages GitHub repositories to extend its capabilities and address diverse user queries.

Talks

2025-10 The Curious Landscape of LLM Reasoning: An Overview of Techniques, Research Directions, and Challenges at Wuqiong College, Tsinghua University.

2025-07 LLM Reasoning: From Informal to Formal at Department of Computer Science and Technology, Tsinghua University.

Resources

Some tools and resources I've developed for the research community:

  • jload - A Python package for convenient JSON/JSONL data loading and saving.
  • vlllm - A Python package for convenient text generation with multiprocessing support.





Design and source code from Leonid Keselman's website