Curated list of papers, datasets, benchmarks, and open-source repos on applying Large Language Models (LLMs) to RTL/HDL/Verilog code generation and chip design automation.
🚀 LLM for RTL generation is an emerging frontier (2023–2025) at the intersection of AI, hardware design, and EDA.
This repo aims to be the community hub + benchmark index + roadmap for this new wave.
Last updated: 2025-10-03
- Papers
- Open-Source Repos
- Related Tooling (Verification/Feedback)
- Industry & Startups
- Publication Trend
- Roadmap & Trends
- Contributing
- Acknowledgement
- CodeV (2024) – Instruction-tuned Verilog LLM via multi-level summarization. [arXiv] [GitHub]
- CodeV-R1 (2025) – RL with Verilog-specific reward (testbench-based). [arXiv] [Project]
- VerilogCoder (AAAI’25) – Multi-agent Verilog generation system. [arXiv] [GitHub]
- VeriSeek (2024) – RL with structure-aware rewards for Verilog generation. [arXiv]
- VeriOpt (2025) – PPA-aware multi-role LLM framework. [arXiv]
- ComplexVCoder (2025) – Two-stage GIR + RAG for complex RTL projects. [arXiv]
- VGen / VeriGen (2023) – Early fine-tuned Verilog LLMs. [GitHub] [HF model]
- Spec2RTL-Agent (NVIDIA, 2025) – Multi-agent system from specification → RTL. [Link]
- VeriMind (2025) – Agentic LLM for automated Verilog generation. [arXiv]
- ReasoningV (2025) – Hybrid reasoning for efficient Verilog code gen. [arXiv]
- VeriCoder (2025) – Functional correctness validation for RTL gen. [ResearchGate]
- RTL++ (2025) – Graph-enhanced LLM for RTL code generation. [arXiv]
- VerilogEval (NVIDIA, 2023) – First systematic benchmark for Verilog gen. [Paper]
- RTLLM / RTLLM 2.0 (HKUST, 2023–25) – NL→RTL benchmark (29 → 50 designs). [GitHub]
- RTL-Repo (AUCOHL, 2024) – Large-scale multi-file RTL repo benchmark. [arXiv] [GitHub]
- RealBench (2025) – Comprehensive real-world IP benchmark. [arXiv]
- TuRTLe (2025) – Unified leaderboard for RTL generation tasks. [arXiv] [HF leaderboard]
- hdl2v (Berkeley, 2025) – 46k+ VHDL/Chisel/PyMTL → Verilog translation pairs. [Tech Report]
- RTLCoder-Data (HKUST, 2024) – Synthetic RTL problems + solutions. [GitHub]
- OpenLLM-RTL suite (ICCAD’24) – AssertEval + RTLLM-2.0 + RTLCoder data overview. [PDF]
- IPRC-DIP/CodeV – Domain-tuned Verilog LLMs.
- NVlabs/VerilogCoder – Multi-agent Verilog gen system.
- hkust-zhiyao/RTL-Coder – Data + fine-tuned models.
- AUCOHL/RTL-Repo – Multi-file benchmark dataset.
- shailja-thakur/VGen – Early fine-tuned Verilog repo.
- hkust-zhiyao/RTLLM – Natural language → RTL dataset.
- HaVen (DATE’25) – Hallucination-mitigated Verilog gen. [arXiv] [GitHub]
- AutoBench (MLCAD’24) – LLM-based Verilog testbench generation. [arXiv]
- CorrectBench (2024) – Automatic validation & correction of LLM-generated testbenches. [arXiv]
- AutoChip / EDA-Feedback (2024) – Using synthesis & sim feedback to iteratively refine Verilog gen. [arXiv]
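
The feedback-driven refinement described in the last entry above can be sketched roughly as follows. This is a minimal illustration of the general loop, not AutoChip's actual implementation: `refine_with_feedback`, `toy_generate`, and `toy_check` are hypothetical names, and the two toy functions stand in for a real LLM call and a real simulator or synthesis run.

```python
from typing import Callable, Optional, Tuple


def refine_with_feedback(
    spec: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate RTL, check it, and feed the tool log back until it passes.

    Returns (rtl, rounds_used), or (None, max_rounds) if no candidate passed.
    """
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        candidate = generate(spec, feedback)
        ok, log = check(candidate)
        if ok:
            return candidate, round_idx
        feedback = log  # the failure log becomes context for the next attempt
    return None, max_rounds


def toy_generate(spec: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM: emits a buggy AND gate first,
    # then a corrected one once it receives any feedback.
    op = "|" if feedback is None else "&"
    return f"module and2(input a, input b, output y); assign y = a {op} b; endmodule"


def toy_check(rtl: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for a simulator: a plain string check.
    if "a & b" in rtl:
        return True, ""
    return False, "mismatch: expected y = a & b"


rtl, rounds = refine_with_feedback("2-input AND gate", toy_generate, toy_check)
print(rounds)  # 2: the first attempt fails, the second passes
```

In a real setup, `check` would typically wrap a simulator or synthesis tool and return its error log, and `max_rounds` bounds the cost of the loop.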
- NVIDIA – VerilogEval, VerilogCoder, Spec2RTL-Agent.
- Synopsys & Cadence – Exploring AI copilots for RTL coding, verification, PPA analysis.
- Startups:
  - ChipFlow, RapidSilicon – RTL generation + FPGA/EDA automation.
  - Stealth-mode AI+EDA startups (2024–2025).
- VC activity – AI + chip design is an emerging investment thesis, with early funding since 2023.
The number of papers and repos on LLM-based RTL/Verilog generation has grown every year since 2022, with the largest jump in 2025.
Note: Counts are curated estimates from public arXiv/conference/GitHub sources.
2022: 2 • 2023: 5 • 2024: 9 • 2025: 16+
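
To make the trend concrete, the year-over-year change can be computed directly from the counts above. This is a small sketch: `yearly_growth` is a hypothetical helper, and 2025 is taken as 16, which is only a lower bound since the source lists it as "16+".

```python
from typing import Dict, List, Tuple


def yearly_growth(counts: Dict[int, int]) -> List[Tuple[int, int, int, float]]:
    """Return (prev_year, year, absolute_increase, ratio) for consecutive years."""
    years = sorted(counts)
    return [
        (prev, cur, counts[cur] - counts[prev], counts[cur] / counts[prev])
        for prev, cur in zip(years, years[1:])
    ]


# Curated estimates from the list above; 2025 is a lower bound ("16+").
counts = {2022: 2, 2023: 5, 2024: 9, 2025: 16}

for prev, cur, delta, ratio in yearly_growth(counts):
    print(f"{prev}->{cur}: +{delta} ({ratio:.2f}x)")
# 2022->2023: +3 (2.50x)
# 2023->2024: +4 (1.80x)
# 2024->2025: +7 (1.78x)
```

On these estimates the absolute increase grows each year (3, 4, then at least 7), while the ratio stays close to 1.8x after the first year.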
- 2022–2023: Proof of concept (VerilogEval, RTLLM).
- 2024: Domain-specific LLMs (CodeV), large-scale benchmarks (RTL-Repo).
- 2025: Explosion of new methods (Spec2RTL-Agent, VeriMind, ReasoningV, VeriCoder, RTL++), hallucination mitigation (HaVen), cross-HDL augmentation (hdl2v), unified leaderboards (TuRTLe).
- Next 2–3 years:
  - PPA-aware LLMs (functionality + power/performance/area)
  - Multi-file, hierarchical RTL consistency
  - Closed-loop integration with synthesis/simulation
  - Open-source leaderboards & competitions
Pull requests welcome!
If you know of a paper / dataset / repo not listed here, please open an issue or PR.
Let’s make this the go-to hub for LLM + RTL generation research.
This repository was initially drafted with the assistance of GPT-5 (OpenAI),
and is continuously curated and updated by the community.
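This repository collects papers, datasets, benchmarks, and open-source projects on Large Language Models (LLMs) for RTL/HDL/Verilog code generation and chip design automation.
🚀 LLM for RTL generation is a research frontier that has risen rapidly over 2023–2025, at the intersection of artificial intelligence, hardware design, and EDA.
This repository aims to be the field's resource hub + benchmark index + development roadmap.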
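- CodeV / CodeV-R1 – Verilog instruction tuning + reinforcement learning
- VerilogCoder – Multi-agent Verilog generation system
- VeriSeek / VeriOpt / ComplexVCoder – Reward modeling, PPA constraints, and complex design generation
- VGen / VeriGen – Early fine-tuned Verilog models
- Spec2RTL-Agent – Multi-agent generation of RTL from specifications
- VeriMind – Agent-based automated Verilog generation
- ReasoningV – Hybrid reasoning for efficient Verilog generation
- VeriCoder – Generation enhanced by functional correctness validation
- RTL++ – Graph-enhanced RTL generation
- VerilogEval – NVIDIA's Verilog generation benchmark
- RTLLM / RTLLM 2.0 – Natural language → RTL benchmark
- RTL-Repo – Large-scale multi-file RTL project evaluation
- RealBench – Comprehensive real-world IP evaluation
- TuRTLe – Unified RTL leaderboard
- hdl2v – Cross-HDL → Verilog translation dataset (46k+ pairs)
- RTLCoder-Data – Synthetic RTL tasks
- OpenLLM-RTL suite – AssertEval + RTLLM-2.0 and more
- HaVen – Hallucination-mitigated RTL generation
- AutoBench / CorrectBench – Automatic testbench generation and validation
- AutoChip / EDA-Feedback – Refining RTL with synthesis/simulation feedback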
The counts below show the growth of LLM-based RTL/Verilog generation work over 2022–2025 (estimates from public arXiv, conference, and GitHub sources):
2022: 2 • 2023: 5 • 2024: 9 • 2025: 16+
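- 2022–2023: Feasibility validation (VerilogEval, RTLLM)
- 2024: Domain-specific LLMs and large-scale benchmarks (CodeV, RTL-Repo)
- 2025: Explosion of methods (Spec2RTL-Agent, VeriMind, ReasoningV, VeriCoder, RTL++, and others), hallucination mitigation (HaVen), cross-HDL augmentation (hdl2v), unified leaderboards (TuRTLe)
- Next 2–3 years:
  - PPA-aware LLMs
  - Multi-file / hierarchical consistency
  - Closed loop with synthesis/simulation tools
  - Open-source competitions and leaderboards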
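The initial version of this repository was drafted with the assistance of GPT-5 and has since been maintained and updated by the community.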
