VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search (Dafny 2025)

Who

David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin

Track

Dafny 2025

Time Zone

The program is currently displayed in (GMT-07:00) Mountain Time (US & Canada).

Use conference time zone: (GMT-07:00) Mountain Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Sun 19 Jan 2025 16:18 - 16:36 at Hopscotch - Verified Code Synthesis Chair(s): Stefan Zetzsche

Abstract

Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound.
In this paper, we present VerMCTS, an approach to begin to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS). This approach leverages the verifier to gain intermediate feedback inside the search algorithm by checking partial programs at each step to estimate an upper bound on the value function. To measure the performance of VerMCTS, we develop a new suite of multi-step verified programming problems in Dafny and Coq. In terms of pass@$T$, a new metric which computes the pass rate given a budget of $T$ tokens sampled from the LLM, VerMCTS leads to more than a 30% absolute increase in average pass@5000 across the suite over repeated sampling from the base language model.

David Brandfonbrener

Harvard

Simon Henniger

Technical University of Munich

Sibi Raja

Harvard

Tarun Prasad

Harvard

Chloe Loughridge

Harvard University

Federico Cassano

Northeastern University

Sabrina Ruixin Hu

Harvard

Jianang Yang

Million.js

William E. Byrd

University of Alabama at Birmingham, USA

United States

Robert Zinkov

University of Oxford

Nada Amin

Harvard University

United States

Time Zone

The program is currently displayed in (GMT-07:00) Mountain Time (US & Canada).

Use conference time zone: (GMT-07:00) Mountain Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Sun 19 Jan
Displayed time zone: Mountain Time (US & Canada) change

16:00 - 18:00	Verified Code SynthesisDafny at Hopscotch Chair(s): Stefan Zetzsche Amazon Web Services

16:00 18m Talk		Laurel: Unblocking Automated Verification with Large Language Models Dafny Eric Mugnier University of California San Diego, Emmanuel Anaya Gonzalez UCSD, Nadia Polikarpova University of California at San Diego, Ranjit Jhala University of California at San Diego, Zhou Yuanyuan UCSD
16:18 18m Talk		VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search Dafny David Brandfonbrener Harvard, Simon Henniger Technical University of Munich, Sibi Raja Harvard, Tarun Prasad Harvard, Chloe Loughridge Harvard University, Federico Cassano Northeastern University, Sabrina Ruixin Hu Harvard, Jianang Yang Million.js, William E. Byrd University of Alabama at Birmingham, USA, Robert Zinkov University of Oxford, Nada Amin Harvard University
16:36 18m Talk		dafny-annotator: AI-Assisted Verification of Dafny Programs Dafny Gabriel Poesia Stanford University, Chloe Loughridge Harvard University, Nada Amin Harvard University
16:54 18m Talk		Dafny as Verification-Aware Intermediate Language for Code Generation Dafny Yue Chen Li Massachusetts Institute of Technology, Stefan Zetzsche Amazon Web Services, Siva Somayyajula Amazon Web Services Pre-print
17:12 18m Talk		DafnyBench: A Benchmark for Formal Software Verification Dafny Chloe Loughridge Harvard University, Qinyi Sun Massachusetts Institute of Technology, Seth Ahrenbach Beneficial AI Foundation, Federico Cassano Northeastern University, Chuyue Sun Stanford University, Ying Sheng Stanford University, Anish Mudide Massachusetts Institute of Technology, Md Rakib Hossain Misu University of California Irvine, Nada Amin Harvard University, Max Tegmark Massachusetts Institute of Technology
17:30 18m Talk		Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming Dafny Saikat Chakraborty Microsoft Research, Gabriel Ebner Microsoft Research, Siddharth Bhat University of Cambridge, Sarah Fakhoury Microsoft Research, Sakina Fatima University of Ottawa, Shuvendu K. Lahiri Microsoft Research, Nikhil Swamy Microsoft Research
17:48 12m Day closing		Day closing Dafny Stefan Zetzsche Amazon Web Services