HPAI-BSC/TuRTLe

TuRTLe is a framework for systematically assessing LLMs across key RTL generation tasks. It integrates multiple existing benchmarks and automates the evaluation process, enabling a comprehensive assessment of LLM performance in syntax correctness (STX), functional correctness (FNC), synthesis (SYN), power-performance-area (PPA) optimization, and exact line completion.

Benchmarks                                         | EDA Tools and Metrics
---------------------------------------------------|------------------------------------------
VerilogEval v2.0 - Spec-to-RTL & Module Completion | Icarus Verilog and Verilator - STX & FNC
RTLLM v1.1/v2.0 - Spec-to-RTL                      | Yosys - SYN
VGen - Module Completion                           | OpenROAD - PPA
RTL-Repo - Single Line Completion                  | OpenLane - PPA

For more details about our work, refer to our arXiv paper.

News

  • [2025-11-09] We release TuRTLe v2 with API inference support and local Docker-based evaluation for easy reproducibility
  • [2025-07-03] TuRTLe now supports Verilator as a simulator to check syntax and functionality
  • [2025-06-12] We add support for multi-node inference with Ray, along with configurations for larger models
  • [2025-05-19] The project's source code is now publicly released. We'd love to hear your feedback, so give it a try!
  • [2025-03-31] Our paper "TuRTLe: A Unified Evaluation of LLMs for RTL Generation" is now available on arXiv!
  • [2025-03-20] The leaderboard is now live! Check it out on our Hugging Face Space

Leaderboard

Check the TuRTLe Leaderboard to see which open-source models perform best on each task.

Quick Start

Make sure you have installed TuRTLe and its dependencies. See Installation Guide for detailed setup instructions.

TuRTLe supports API-based inference, which works out of the box with any OpenAI-compatible provider (OpenRouter, OpenAI, Azure, etc.), combined with a Docker-based evaluation that runs the EDA tools locally.

Inference

export TURTLE_BASE_URL=https://openrouter.ai/api/v1
export TURTLE_API_KEY=sk-or-...
uv run turtle/src/turtle.py --use_api \
    --model google/gemini-2.5-flash \
    --task rtllm \
    --max_tokens 18432 \
    --temperature 0.2 \
    --top_p 0.95 \
    --n_samples 5 \
    --reasoning_effort medium \
    --save_generations \
    --save_generations_path './results/gemini-2.5-flash/rtllm.json' \
    --generation_only

Available tasks: rtllm, verilog_eval_rtl, verilog_eval_cc, verigen, rtlrepo
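
Any other OpenAI-compatible endpoint works the same way; only the two environment variables change. For illustration, a minimal sketch targeting the OpenAI API directly (the endpoint and model name here are examples, not taken from the TuRTLe docs):

export TURTLE_BASE_URL=https://api.openai.com/v1
export TURTLE_API_KEY=sk-...
uv run turtle/src/turtle.py --use_api \
    --model gpt-4o-mini \
    --task verilog_eval_rtl \
    --n_samples 5 \
    --save_generations \
    --save_generations_path './results/gpt-4o-mini/verilog_eval_rtl.json' \
    --generation_only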

Evaluate with Docker

Evaluate the generated RTL designs using our bundled EDA tools (OpenLane, Verilator, Icarus Verilog):

docker run --rm -v $(pwd):/work -w /work ggcr0/turtle-eval:2.3.4 \
    python3 turtle/src/turtle.py --use_api \
    --task rtllm \
    --model gemini-2.5-flash \
    --n_samples 5 \
    --load_generations_path ./results/gemini-2.5-flash/rtllm.json

This will automatically pull the Docker image with all the EDA tooling and evaluate your designs for syntax, functionality, synthesis, and PPA metrics.

Advanced Usage

Local/Cluster Inference

If you have access to a GPU cluster and want to run local inference with vLLM or perform multi-node inference, see LOCAL_INFERENCE.md for detailed instructions on using SLURM and Singularity.
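
For a single-node run, a rough sketch of what a SLURM submission could look like is below; the container image, resource directives, and model name are placeholders, and LOCAL_INFERENCE.md remains the authoritative recipe:

#!/bin/bash
#SBATCH --job-name=turtle-infer
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --time=04:00:00

# Placeholder Singularity image; build or pull the real one as described in
# LOCAL_INFERENCE.md. We assume that omitting --use_api selects local vLLM
# inference.
singularity exec --nv turtle-vllm.sif \
    python3 turtle/src/turtle.py \
    --model Qwen/Qwen2.5-Coder-32B-Instruct \
    --task rtllm \
    --n_samples 5 \
    --save_generations \
    --save_generations_path './results/qwen2.5-coder-32b/rtllm.json' \
    --generation_only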

Add Your Benchmark

The process for implementing a benchmark closely follows the bigcode-evaluation-harness guide. Follow these steps (a short sketch follows the list):

  1. Copy turtle/tasks/template/new_task.py into turtle/tasks/ and rename it after your benchmark, i.e. <benchmark_name>.py.
  2. Complete all the TODO comments in the template file.
  3. Update the _load_new_modules() and _create_extended_registry() methods within turtle/src/utils/task_updater.py.
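
For example, to bootstrap a hypothetical task named my_benchmark (the name is illustrative):

cp turtle/tasks/template/new_task.py turtle/tasks/my_benchmark.py
# Fill in the TODOs left in turtle/tasks/my_benchmark.py, then register the
# new module in _load_new_modules() and _create_extended_registry() inside
# turtle/src/utils/task_updater.py.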

Citation

@inproceedings{garciagasulla2025turtleunifiedevaluationllms,
  title     = {TuRTLe: A Unified Evaluation of LLMs for RTL Generation},
  author    = {Dario Garcia-Gasulla and Gokcen Kestor and Emanuele Parisi and Miquel Albert{\'i}-Binimelis and Cristian Gutierrez and Razine Moundir Ghorab and Orlando Montenegro and Bernat Homs and Miquel Moreto},
  booktitle = {Proceedings of the 2025 ACM/IEEE International Symposium on Machine Learning for CAD},
  series    = {MLCAD '25},
  year      = {2025},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  location  = {Santa Cruz, CA, USA},
  url       = {https://arxiv.org/abs/2504.01986},
}

Contact

If you have any inquiries or wish to collaborate: hpai@bsc.es

Acknowledgments

This work was born as a fork of bigcode-evaluation-harness and vllm-code-harness, and has grown into its own framework for RTL code-generation evaluation. We remain grateful to these projects.

We acknowledge the open-source EDA tools: Icarus Verilog, Verilator, Yosys, OpenROAD and LibreLane.

We also thank the authors of the benchmarks integrated in TuRTLe: VerilogEval, RTLLM, VGen, and RTL-Repo.
