LeanTree: Structured data extraction and programmatic interaction with Lean 4.

Installation

Clone the repository including submodules:

git clone --recurse-submodules https://github.com/Kripner/leantree.git

Ensure Lean 4.19 is installed via elan. You can check the installed version with:

lake --version

At the moment, LeanTree only supports Lean 4.19.
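The version check can also be scripted. The following is a minimal sketch, assuming the output of `lake --version` contains a substring of the form `Lean version X.Y.Z`; the helper names `parse_lean_version` and `installed_lean_version` are hypothetical and not part of LeanTree.

```python
import re
import subprocess


def parse_lean_version(text):
    """Return (major, minor) parsed from 'Lean version X.Y...', or None."""
    match = re.search(r"Lean version (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_lean_version():
    """Run `lake --version` and parse its output; None if lake is unavailable."""
    try:
        result = subprocess.run(
            ["lake", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_lean_version(result.stdout)
```

Comparing `installed_lean_version()` against `(4, 19)` before creating a project gives an early, readable failure instead of a build error later on.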

Make sure pip is installed. Then, run:

make install

Alternatively, use Poetry explicitly:

pip install poetry
poetry install

For running tests or experiments, refer to the Development section.

Basic Usage

Note: LeanTree operates on a Lean project, so you must create one (or load an existing one) before using it. See the Lean project guide for more information.

Start by creating or loading a Lean project:

from leantree import LeanProject

project = LeanProject.create("path/to/project")

# or load an existing one:
project = LeanProject("path/to/project")

# or decide automatically:
project = LeanProject("path/to/project", create=True)

The created project includes Mathlib by default. If no path is passed to LeanProject.create, the project is created in or loaded from the leantree_project subdirectory of the current directory.

Starting a Proof

Using the environment, you can initialize a proof search by supplying a theorem with one or more sorry keywords.

with project.environment() as env:
    env.send_command("import Mathlib\nopen BigOperators Real Nat Topology Rat")
    branch = env.proof_from_sorry("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
    zero, succ = branch.apply_tactic("cases n")
    print("Factorized proof states after `cases n`:")
    print(zero.state)
    print(succ.state)
    assert not zero.is_solved
    ...
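Because each tactic application yields independent child branches, a proof search can recurse into each child separately. The following is a minimal depth-first sketch, not LeanTree's own search. It uses only the `apply_tactic` and `is_solved` names shown above and assumes that a failing tactic raises an exception and that `apply_tactic` returns an iterable of child branches (possibly empty). `ToyBranch` and its rule table are hypothetical stand-ins so the sketch runs without Lean.

```python
def solve(branch, tactics, max_depth):
    """Return a proof tree (tactic, [child proofs]) for `branch`, or None."""
    if max_depth == 0:
        return None
    for tactic in tactics:
        try:
            children = list(branch.apply_tactic(tactic))
        except Exception:  # assumption: a failing tactic raises
            continue
        subproofs = []
        for child in children:
            if child.is_solved:
                continue  # nothing left to prove on this branch
            sub = solve(child, tactics, max_depth - 1)
            if sub is None:
                break  # this tactic leads to an unprovable child
            subproofs.append(sub)
        else:
            return (tactic, subproofs)
    return None


class ToyBranch:
    """Hypothetical stand-in for a proof branch; the rules are made up."""

    RULES = {
        ("root", "cases n"): ["zero", "succ"],
        ("zero", "simp"): [],
        ("succ", "omega"): [],
    }

    def __init__(self, goal):
        self.state = goal
        self.is_solved = False

    def apply_tactic(self, tactic):
        key = (self.state, tactic)
        if key not in self.RULES:
            raise ValueError(f"tactic {tactic!r} failed on {self.state!r}")
        return [ToyBranch(goal) for goal in self.RULES[key]]


proof = solve(ToyBranch("root"), ["simp", "omega", "cases n"], max_depth=3)
print(proof)
```

With a real environment, the branch returned by `env.proof_from_sorry` would take the place of `ToyBranch("root")`, and the candidate tactics would typically come from a model or a heuristic.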

Async API

import asyncio

async def main():
    async with project.environment() as env:
        await env.send_command_async("import Mathlib\nopen BigOperators Real Nat Topology Rat")
        branch = await env.proof_from_sorry_async("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
        zero, succ = await branch.apply_tactic_async("cases n")
        ...

asyncio.run(main())
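One use of the async variants is to bound how long a single tactic may run. The following is a minimal sketch using `asyncio.wait_for` around `apply_tactic_async`. It assumes that a failing tactic raises an exception; whether the underlying REPL stays usable after a cancelled call is not covered here and should be checked. `ToyAsyncBranch` is a hypothetical stand-in so the sketch runs without Lean.

```python
import asyncio


async def first_working_tactic(branch, tactics, timeout=5.0):
    """Try tactics in order; return (tactic, children) for the first success."""
    for tactic in tactics:
        try:
            children = await asyncio.wait_for(
                branch.apply_tactic_async(tactic), timeout=timeout
            )
        except asyncio.TimeoutError:
            continue  # the tactic exceeded the time budget
        except Exception:
            continue  # assumption: a failing tactic raises
        return tactic, children
    return None


class ToyAsyncBranch:
    """Hypothetical stand-in: only 'cases n' succeeds, 'slow' never finishes in time."""

    async def apply_tactic_async(self, tactic):
        if tactic == "slow":
            await asyncio.sleep(1.0)
        if tactic != "cases n":
            raise ValueError("tactic failed")
        return ["zero", "succ"]


result = asyncio.run(
    first_working_tactic(ToyAsyncBranch(), ["slow", "simp", "cases n"], timeout=0.05)
)
print(result)
```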

Data Extraction

Using the project, you can parse a Lean file and build all proof trees. Then, you can use the environment to start a proof for each tactic block in the file.

file = project.load_file("Example.lean")

# Pretty-print all proof trees.
for thm in file.theorems:
    print(thm.load_source() + "\n")
    for by_block in thm.by_blocks:
        print(by_block.tree.pretty_print())
    print("-" * 100)

# Start a proof for each tactic block; this requires an open environment.
with project.environment() as env:
    for thm, branch in env.file_proofs(file):
        if isinstance(branch, Exception):
            print(f"Could not start theorem '{thm}' due to exception: {branch}")
        ...
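Since `file_proofs` reports failures as exception values rather than raising them, it can be convenient to separate the two cases up front. The following is a minimal sketch; `partition_proofs` is a hypothetical helper, and the sample pairs are placeholders for what `env.file_proofs(file)` yields. It assumes the yielded branches remain valid after iteration, which should be verified against the library.

```python
def partition_proofs(pairs):
    """Split (theorem, branch_or_exception) pairs into started proofs and failures."""
    started, failed = [], []
    for thm, branch in pairs:
        if isinstance(branch, Exception):
            failed.append((thm, branch))
        else:
            started.append((thm, branch))
    return started, failed


# Placeholder data; with LeanTree, pass `env.file_proofs(file)` instead.
pairs = [("thm_a", object()), ("thm_b", RuntimeError("could not start"))]
started, failed = partition_proofs(pairs)
print(f"{len(started)} started, {len(failed)} failed")
```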

Dataset Generation

You can easily generate a dataset containing all Lean files in a directory. For production-ready dataset generation, see the Datasets section.

import glob
import json

with open("dataset.json", "w") as f:
    for path in glob.glob('**/*.lean', recursive=True):
        file = project.load_file(path)
        f.write(json.dumps(file.serialize()) + "\n")
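The snippet above writes one JSON object per line, so the resulting file has to be read back line by line rather than with a single `json.load` call. The following is a minimal sketch; `read_dataset` is a hypothetical helper, and the records used in the demonstration are placeholders, since the actual schema is whatever `file.serialize()` returns.

```python
import json
import os
import tempfile


def read_dataset(path):
    """Yield one decoded JSON record per non-empty line of `path`."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demonstration with placeholder records written in the same line-based layout.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.json")
    with open(path, "w") as f:
        for record in [{"path": "A.lean"}, {"path": "B.lean"}]:
            f.write(json.dumps(record) + "\n")
    records = list(read_dataset(path))

print(len(records))
```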

Save/Restore Environment State

The environment state can be saved to disk and restored later.

with project.environment() as env:
    env.send_command("import Mathlib\nopen BigOperators Real Nat Topology Rat")
    env.pickle("env.pkl")

with project.environment() as env:
    env.unpickle("env.pkl")
    branch = env.proof_from_sorry("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
    zero, succ = branch.apply_tactic("cases n")
    ...
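A common use of this feature is to pay the cost of `import Mathlib` only once. The following is a minimal sketch of such a cache: it restores the saved state if the file exists and otherwise runs the header and saves it. It relies only on the `send_command`, `pickle`, and `unpickle` methods shown above; `warm_environment` and `FakeEnv` are hypothetical names, the latter a stand-in so the sketch runs without Lean.

```python
import os
import tempfile


def warm_environment(env, header, cache_path):
    """Restore `env` from `cache_path` if present; otherwise run `header` and save.

    Returns True when the state was restored from disk.
    """
    if os.path.exists(cache_path):
        env.unpickle(cache_path)
        return True
    env.send_command(header)
    env.pickle(cache_path)
    return False


class FakeEnv:
    """Hypothetical stand-in that records calls instead of talking to Lean."""

    def __init__(self):
        self.calls = []

    def send_command(self, command):
        self.calls.append("send_command")

    def pickle(self, path):
        self.calls.append("pickle")
        with open(path, "w") as f:
            f.write("state")

    def unpickle(self, path):
        self.calls.append("unpickle")


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "env.pkl")
    first, second = FakeEnv(), FakeEnv()
    restored_first = warm_environment(first, "import Mathlib", cache)
    restored_second = warm_environment(second, "import Mathlib", cache)

print(first.calls, second.calls)
```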

Datasets

Assuming you have already created a Lean project in leantree_project, you can recreate the whole Mathlib dataset by running:

python dataset/tree_dataset.py generate --project_path leantree_project --source_files mathlib/Mathlib

Development

Install all development and experiment dependencies:

make install-dev

Running Tests

To run tests, first create a Lean 4.19 project called leantree_project in the LeanTree directory. You can use LeanTree itself for that: in the leantree directory, run the following Python code:

from leantree import LeanProject

project = LeanProject.create()

After the project is created, run:

make test

Debugging Tips

When working with a Lean environment, you can use env.take_control() to debug the underlying Lean REPL. This method connects your stdin/stdout to the REPL's stdin/stdout.

Related Tools

LeanDojo
Pantograph

For a detailed comparison, refer to the LeanTree paper.

Reference

@inproceedings{
    kripner2025leantree,
    title={LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4},
    author={Mat{\v{e}}j Kripner and Michal Sustr and Milan Straka},
    booktitle={2nd AI for Math Workshop @ ICML 2025},
    year={2025},
    url={https://openreview.net/forum?id=pROVJwTb6w}
}
