
🏆 NeurIPS 2025 Main Conference Paper


Junqi Gao 1, Zhichang Guo 1, Dazhi Zhang 1, Dong Li 1, Runze Liu 3, Pengfei Li 1,5, Kai Tian 4, Biqing Qi 2,†

1 School of Mathematics, Harbin Institute of Technology

2 Shanghai Artificial Intelligence Laboratory

3 Tsinghua Shenzhen International Graduate School, Tsinghua University

4 Department of Electronic Engineering, Tsinghua University

5 Shanghai Innovation Institute

† Corresponding Author

📄 Introduction

Bohdi is a novel framework for heterogeneous Large Language Model (LLM) fusion that integrates the strengths of multiple source LLMs into a target LLM through adaptive knowledge exploration and automatic data generation. Unlike existing methods that rely on real data from limited domains and use fixed data allocation proportions, Bohdi dynamically adjusts sampling based on the target LLM's performance and generates data automatically through a hierarchical knowledge tree structure. This ensures comprehensive domain coverage and balanced capability enhancement without the need for real data.
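The hierarchical knowledge tree can be pictured with a minimal sketch (illustrative only; the class and method names below are hypothetical and not Bohdi's actual API): Sprout attaches newly explored sub-domains under an existing node, and Harvest walks down from the root to pick a leaf domain for which data will be generated.

```python
import random

class KnowledgeNode:
    """A node in an illustrative hierarchical knowledge tree."""
    def __init__(self, name):
        self.name = name
        self.children = []

    def sprout(self, sub_domains):
        """Sprout: attach newly explored sub-domains under this node."""
        for d in sub_domains:
            self.children.append(KnowledgeNode(d))

    def harvest(self, rng=random):
        """Harvest: descend to a leaf domain to generate data for."""
        node = self
        while node.children:
            node = rng.choice(node.children)
        return node.name

root = KnowledgeNode("knowledge")
root.sprout(["math", "coding"])
root.children[0].sprout(["algebra", "geometry"])
print(root.harvest())  # prints one leaf domain, e.g. "algebra"
```

In the actual framework the descent is guided by the target LLM's performance rather than a uniform random choice; the sketch only shows the tree mechanics.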

✨ Features

🚀 Synthetic-Data-Only Fusion: Bohdi operates without relying on real data, making it highly efficient and versatile.

🌳 Dynamic Domain Exploration: Through the hierarchical knowledge tree and Sprout/Harvest operations, Bohdi explores new domains and generates data automatically.

🔄 Adaptive Data Allocation: The DynaBranches mechanism with IR ensures dynamic adjustment of data sampling proportions based on the target LLM’s capabilities.
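As a rough illustration of adaptive data allocation (a minimal sketch under assumed inputs, not the actual DynaBranches/IR mechanism): domains where the target LLM lags the source LLMs most receive a larger share of the sampling budget, via a softmax over per-domain capability gaps.

```python
import math

def allocation(target_scores, source_scores, temperature=0.1):
    """Softmax over per-domain capability gaps: larger gap -> more samples.

    Scores are accuracies in [0, 1]; keys are domain names.
    """
    gaps = {d: source_scores[d] - target_scores[d] for d in target_scores}
    exps = {d: math.exp(g / temperature) for d, g in gaps.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}

props = allocation(
    target_scores={"math": 0.55, "coding": 0.70, "reasoning": 0.60},
    source_scores={"math": 0.80, "coding": 0.75, "reasoning": 0.72},
)
# "math" has the largest gap, so it gets the biggest sampling proportion
```

Lowering the temperature concentrates the budget on the weakest domain; raising it moves the allocation toward uniform.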

⚙️ Installation

Main Environment for Distillation

conda env create -f environment_Bohdi.yaml

Environment for Evaluation

conda env create -f opencompass_env.yaml

Preparation for Evaluation Suite

# The version we used: opencompass 0.3.4
git clone https://github.com/open-compass/opencompass opencompass
cd [your project path]/opencompass
pip install -e .

⏳ Distillation Training

To train the target LLM using Bohdi, follow these steps:

  1. Prepare Source LLMs: Ensure you have access to the source LLMs you want to fuse. If you want to follow our setup, please download the following models:
    # Source Models
    Qwen/Qwen2.5-14B-Instruct
    mistralai/Mistral-Small-24B-Instruct-2501
    microsoft/phi-4
    # Target Models
    meta-llama/Llama-3.2-3B-Instruct
    meta-llama/Llama-3.1-8B-Instruct
    Qwen/Qwen2.5-7B-Instruct
    google/gemma-2-9b-it
  2. Run Bohdi for Distillation: First configure the relevant paths in run_bohdi.sh according to your actual paths, then run:
    source activate bohdi
    cd [your project path]/Bohdi
    bash run_bohdi.sh

📏 Evaluation

We use OpenCompass for evaluation and perform inference with vLLM. To evaluate your model, configure the relevant paths in eval_opencompass.sh according to your actual paths, then run:

source activate opencompass
cd [your project path]/opencompass
bash eval_opencompass.sh

Direct Download and Usage

If you would like to use the distilled models for evaluation directly, they are available on Hugging Face:

ChetKao/Bohdi-Llama-3.2-3B-Instruct
ChetKao/Bohdi-Llama-3.1-8B-Instruct
ChetKao/Bohdi-Qwen2.5-7B-Instruct
ChetKao/Bohdi-gemma-2-9b-it
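The distilled checkpoints can be loaded like any Hugging Face causal LM. A minimal sketch with transformers (assuming a recent version with chat-template support; the helper function name is ours, and the model weights download on first use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(model_name, user_message, max_new_tokens=128):
    """Load a distilled Bohdi checkpoint and answer one chat message."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_reply("ChetKao/Bohdi-Llama-3.2-3B-Instruct", "Hello!"))
```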

📚 Citation

@article{gao2025bohdi,
  title={Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration},
  author={Junqi Gao and Zhichang Guo and Dazhi Zhang and Dong Li and Runze Liu and Pengfei Li and Kai Tian and Biqing Qi},
  journal={arXiv preprint arXiv:2506.15721},
  year={2025},
  url={https://doi.org/10.48550/arXiv.2506.15721}
}

About

Bohdi is a novel framework for heterogeneous Large Language Model (LLM) fusion, enabling efficient knowledge transfer from multiple source LLMs to a compact target model without relying on real-world data.
