TMC-Llama: Exploring Transition Metal Complexes with Large Language Models

📖 Introduction

TMC-Llama is fine-tuned from Meta's open-source pre-trained Llama3 large language model (Llama-3.2-1B-Instruct). It generates transition metal complexes (TMCs) as TMC-SMILES, a SMILES notation tailored to RDKit-compatible metal-organic connections (developed by Rasmussen and co-workers). Given a set of chemical properties in the supervised fine-tuning (SFT) prompts, TMC-Llama can generate TMCs in a specific region of chemical space, which makes it a useful tool for TMC discovery.

In addition, the paper studies the unparsable strings (in Notebook 2) and identifies several failure modes among the generated TMCs. For each failure mode, we identify characteristic molecular properties and features that can help build future tools for high-quality TMC generation, including SFT protocols and post-generation algorithms. These properties can also serve as a foundation for developing models that generate chemically functional TMCs.
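To make the idea of a failure mode concrete, here is a minimal sketch of how one of them, unclosed rings, could be detected directly on a SMILES string. It is not the code from Notebook 2 or libTMC/; the function name `unclosed_ring_labels` is hypothetical, and the logic only covers standard ring-closure syntax (single digits and `%nn` labels outside bracket atoms).

```python
def unclosed_ring_labels(smiles):
    """Return the set of ring-closure labels that are opened but never closed.

    Hypothetical helper for illustration; not part of libTMC/.
    """
    open_labels = set()
    i = 0
    n = len(smiles)
    while i < n:
        ch = smiles[i]
        if ch == "[":
            # Digits inside a bracket atom are isotopes, H counts, or charges,
            # not ring closures, so skip the whole bracket.
            end = smiles.find("]", i)
            if end == -1:
                break  # unterminated bracket atom: a different failure mode
            i = end + 1
            continue
        two_digits = smiles[i + 1:i + 3]
        if ch == "%" and len(two_digits) == 2 and two_digits.isdigit():
            label = int(two_digits)  # two-digit ring closure such as %12
            i += 3
        elif ch.isdigit():
            label = int(ch)  # single-digit ring closure
            i += 1
        else:
            i += 1
            continue
        # A label opens a ring on first sight and closes it on the second.
        open_labels ^= {label}
    return open_labels


if __name__ == "__main__":
    print(unclosed_ring_labels("C1CCCCC1"))  # closed six-membered ring
    print(unclosed_ring_labels("C1CCCCC"))   # ring 1 is never closed
```

A non-empty result flags a string that a parser such as RDKit would be expected to reject; repairing it is a separate step that this sketch does not attempt.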

🔍 How to use

📕 Llama3 environment

Performing inference with TMC-Llama requires only PyTorch, Transformers, and RDKit, which are available from the sources below:

PyTorch: torch. TMC-Llama uses CUDA (version 11.8) to run PyTorch.
Transformers: Hugging Face transformers. Note that you may want to specify your preferred cache directories.
RDKit: RDKit
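
As a quick sanity check of the environment described above, the following sketch reports whether the three packages can be found and whether PyTorch sees a CUDA device. The helper names `check_environment` and `cuda_status` are hypothetical and not part of this repository; the distribution names `torch`, `transformers`, and `rdkit` are assumed to match a pip-style installation.

```python
from importlib import metadata, util

# Packages named in this README; distribution names are an assumption.
REQUIRED = ("torch", "transformers", "rdkit")


def check_environment(packages=REQUIRED):
    """Map each package name to its version string, or None if it is missing."""
    report = {}
    for name in packages:
        if util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name.
            report[name] = "installed (version unknown)"
    return report


def cuda_status():
    """Describe whether PyTorch can see a CUDA device."""
    if util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        return f"CUDA available (PyTorch built against CUDA {torch.version.cuda})"
    return "CUDA not available; PyTorch would run on CPU"


if __name__ == "__main__":
    for package, version in check_environment().items():
        print(f"{package}: {version or 'MISSING'}")
    print(cuda_status())
```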

All customized .py files needed for inference are in the libllama/ directory and were developed in the SmileyLlama project. The virtual-environment prerequisites for inference are therefore identical to those of SmileyLlama.

📗 Running Jupyter-notebook demonstrations

All notebook demonstrations can be run with the existing files in the libTMC/ and libllama/ directories, provided the prerequisites above (such as RDKit) are satisfied. The .py files in libTMC/ contain customized Python functions to identify transition metal centers, isolate ligands, fix redundant dative bonds, correct atoms with improper valences, and fix unclosed rings.
Demonstration datasets and the generated results (both .csv files) are in the data/ directory.
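
To illustrate the kind of operation the libTMC/ functions perform, here is a minimal sketch that parses a SMILES string with RDKit, lists its transition metal centers, and counts dative bonds. It is not the libTMC/ implementation: the names `is_transition_metal` and `describe_complex` are hypothetical, and the d-block atomic-number ranges used to define a transition metal are an assumption that may differ from the definition used in the repository.

```python
# Assumed definition of "transition metal": d-block elements of periods 4-6.
D_BLOCK = frozenset(range(21, 31)) | frozenset(range(39, 49)) | frozenset(range(72, 81))


def is_transition_metal(atomic_num):
    """Return True if the atomic number falls in the assumed d-block ranges."""
    return atomic_num in D_BLOCK


def describe_complex(smiles):
    """Return metal centers and the dative-bond count, or None if unparsable.

    Hypothetical helper for illustration; requires RDKit at call time.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # RDKit could not parse or sanitize the string
    metals = [
        (atom.GetIdx(), atom.GetSymbol())
        for atom in mol.GetAtoms()
        if is_transition_metal(atom.GetAtomicNum())
    ]
    n_dative = sum(
        1 for bond in mol.GetBonds() if bond.GetBondType() == Chem.BondType.DATIVE
    )
    return {"metal_centers": metals, "dative_bonds": n_dative}
```

For example, `describe_complex("N->[Pt](<-N)(Cl)Cl")` could be used to inspect a cisplatin-like string written with RDKit's `->` dative-bond syntax; that string is illustrative only and is not taken from the data/ directory.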

📘 Fine-tuning TMC-Llama

TMC-Llama is built on top of the SmileyLlama repository, so fine-tuning requires axolotl, which can be installed by following the Installation guide from that repository. The fine-tuning dataset of TMC-Llama and the corresponding SFT prompts can be found on FigShare.

📙 Inference

To perform inference using TMC-Llama, download the trained models from FigShare and follow the instructions in Notebook 4 (inference guideline).
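
For orientation, a generic sampling loop with Hugging Face Transformers might look like the sketch below. It is not the procedure from Notebook 4: the function names `generate_tmc_smiles` and `dedupe` are hypothetical, the model directory and prompt are supplied by the caller (the actual SFT prompt format is defined by the authors and is not reproduced here), and the sampling parameters are arbitrary defaults.

```python
def dedupe(strings):
    """Strip whitespace, drop empty strings, and keep the first copy of each."""
    seen = set()
    unique = []
    for s in strings:
        s = s.strip()
        if s and s not in seen:
            seen.add(s)
            unique.append(s)
    return unique


def generate_tmc_smiles(model_dir, prompt, n_samples=8, max_new_tokens=256,
                        temperature=1.0, cache_dir=None):
    """Sample completions from a locally downloaded causal language model.

    Hypothetical helper; model_dir is wherever the FigShare model was unpacked.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_dir, cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, cache_dir=cache_dir)
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            max_new_tokens=max_new_tokens,
            num_return_sequences=n_samples,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[1]
    completions = tokenizer.batch_decode(
        output_ids[:, prompt_len:], skip_special_tokens=True
    )
    return dedupe(completions)
```

The `cache_dir` argument is one way to control where Transformers stores downloaded files. The returned strings are raw model output and still need to be checked for parsability, for instance with RDKit.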

📄 License

See the LICENSE file for details.

🙏 Acknowledgments

We thank all the authors for developing TMC-Llama and building this project! Similar uses of Llama3 models for biochemical applications can be found in SmileyLlama and SynLlama.

📝 Citation

If you use this code in your research, please cite:

@misc{tmc_llama_2025,
    title = {Exploring Transition Metal Complexes with Large Language Models},  
    url = {https://chemrxiv.org/engage/chemrxiv/article-details/69136d39a10c9f5ca1c14847},
    doi = {10.26434/chemrxiv-2025-hm3zb},
    publisher = {ChemRxiv},
    author = {Liu, Yunsheng and Cavanagh, Joseph and Sun, Kunyang and Toney, Jacob and Yuan, Chung-Yueh and Smith, Andrew and St Michel II, Roland and Graggs, Paul and Toste, F Dean and Kulik, Heather and Head-Gordon, Teresa},
    month = nov,
    year = {2025}}

