Two Steps from Hell: Compositionality on Chemical LMs

📃 Paper from Findings of EMNLP 2025

Abstract

This paper investigates compositionality in chemical language models (ChemLLMs). We introduce STEPS, a benchmark with compositional questions that reflect intricate chemical structures and reactions, to evaluate models’ understanding of chemical language. Our approach focuses on identifying and analyzing compositional patterns within chemical data, allowing us to evaluate how well existing LLMs can handle complex queries. Experiments with state-of-the-art ChemLLMs show significant performance drops in compositional tasks, highlighting the need for models that move beyond pattern recognition. By creating and sharing this benchmark, we aim to enhance the development of more capable chemical LLMs and provide a resource for future research on compositionality in chemical understanding.

STEPS

STEPS evaluates several chemical large language models (ChemLLMs) on two-step chemical tasks.

Experimental datasets

Experimental datasets are provided in the folder "data".
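
To see what the folder contains after cloning the repository, a minimal sketch like the one below may help. It assumes only that a local "data" directory exists; it makes no assumption about file names or formats, which this README does not specify. The helper name list_dataset_files is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_dataset_files(data_dir="data"):
    """Return the relative paths of all regular files under data_dir, sorted.

    Hypothetical helper for illustration; not part of the STEPS repository.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory not found: {root}")
    # Walk the folder recursively and keep regular files only.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    for name in list_dataset_files("data"):
        print(name)
```

Run from the repository root, this prints one relative path per file, which is a quick way to check the layout before writing any loading code.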

References

If you use our repository, please cite the following related paper:

@inproceedings{ganeeva-etal-2025-two,
    title = "Two Steps from Hell: Compositionality on Chemical {LM}s",
    author = "Ganeeva, Veronika  and
      Khrabrov, Kuzma  and
      Kadurin, Artur  and
      Tutubalina, Elena",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.55/",
    pages = "1042--1049",
    ISBN = "979-8-89176-335-7",
    abstract = "This paper investigates compositionality in chemical language models (ChemLLMs). We introduce STEPS, a benchmark with compositional questions that reflect intricate chemical structures and reactions, to evaluate models' understanding of chemical language. Our approach focuses on identifying and analyzing compositional patterns within chemical data, allowing us to evaluate how well existing LLMs can handle complex queries. Experiments with state-of-the-art ChemLLMs show significant performance drops in compositional tasks, highlighting the need for models that move beyond pattern recognition. By creating and sharing this benchmark, we aim to enhance the development of more capable chemical LLMs and provide a resource for future research on compositionality in chemical understanding."
}
