Monte Carlo Tree Search Applications to Neural Theorem Proving


Bibliographic Details
Main Author: LaBelle, Ethan
Other Authors: Solar-Lezama, Armando
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156761
Description: A common problem of LLM inference is hallucination, where models generate false information. Another is the tradeoff between model size and computational cost: larger models use more VRAM and require longer training and inference times. This work explores solutions to these problems, namely search and verification, following Yang et al.'s recent contribution, LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In that work, Yang et al. introduce LeanDojo, an environment for programmatic interaction with the Lean theorem-proving language, alongside ReProver, a ByT5-Small transformer-based automated theorem prover (ATP) fine-tuned on the open-source Lean mathlib. The smaller model requires fewer resources and enables faster inference, which, when combined with search, improves the effective performance of the model. We use the language model to generate a space of partial proof trees in Lean. Since the core language model can be interchanged with a larger or more performant one, this work focuses on search algorithms for finding novel proofs under the same computational budget. Three classes of algorithms are explored: best-first search, random walk, and Monte Carlo Tree Search. Search algorithms are evaluated on the random-split test dataset of the LeanDojo Benchmark. Finally, we present common failure modes of the various methods, search results of algorithm variants, and novel proofs discovered relative to the baseline. Across our trials, we show that the search space defined by ReProver's tactic generator contains proofs for approximately 55.0% of theorems in the LeanDojo Benchmark random test split; in Yang et al.'s evaluations, ReProver achieves a 51.2% Pass@1 solve rate on this benchmark.
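The search setup the abstract describes — expanding a space of partial proofs with tactics proposed by a language model — can be sketched as a best-first search ordered by cumulative tactic log-probability. In this minimal sketch, `generate_tactics` and `apply_tactic` are hypothetical stand-ins for ReProver's tactic generator and LeanDojo's proof-checking environment (not their real APIs), stubbed here with a toy proof tree:

```python
import heapq
import math

# Hypothetical stand-ins: a real system would query the tactic-generator
# model and check each tactic against Lean via the proving environment.
def generate_tactics(state):
    """Return (tactic, log_prob) candidates for a proof state (toy table)."""
    table = {
        "goal": [("intro h", math.log(0.6)), ("simp", math.log(0.3))],
        "goal/intro h": [("exact h", math.log(0.9))],
        "goal/simp": [],  # dead end: no candidate tactics
    }
    return table.get(state, [])

def apply_tactic(state, tactic):
    """Return (next_state, proved) after applying a tactic (toy rule)."""
    return f"{state}/{tactic}", tactic == "exact h"

def best_first_search(root, max_expansions=100):
    """Expand proof states in order of cumulative tactic log-probability."""
    # Heap entries: (negated cumulative log-prob, state, tactic trace).
    frontier = [(0.0, root, [])]
    for _ in range(max_expansions):
        if not frontier:
            return None  # search space exhausted without a proof
        neg_logp, state, trace = heapq.heappop(frontier)
        for tactic, logp in generate_tactics(state):
            child, proved = apply_tactic(state, tactic)
            if proved:
                return trace + [tactic]  # complete tactic proof found
            heapq.heappush(frontier, (neg_logp - logp, child, trace + [tactic]))
    return None  # computational budget exhausted

print(best_first_search("goal"))  # -> ['intro h', 'exact h']
```

Random walk and Monte Carlo Tree Search variants differ only in how the next state to expand is chosen: a random walk samples a child instead of popping the highest-probability frontier node, while MCTS balances the generator's probabilities against visit counts when descending the tree.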
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Thesis Date: 2024-05
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)