
Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval

Introduction

This repository contains the code and resources for the ACL 2025 Findings paper "Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval". If you use this work, please cite:

@inproceedings{chun-etal-2025-enhancing,
    title = "Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval",
    author = "Chun, Yongchan  and
      Kim, Minhyuk  and
      Kim, Dongjun  and
      Park, Chanjun  and
      Lim, Heuiseok",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.516/",
    pages = "9916--9926",
    ISBN = "979-8-89176-256-5"
}

Requirements

Install the required Python packages listed in requirements.txt using:

pip install -r requirements.txt

Dataset Preparation

We use three datasets for our experiments: ACTER, ACL-RD, and BCGM. Follow these steps to prepare the datasets:

  1. Navigate to the src/dataset directory and execute all cells in the preprocess.ipynb notebook located in each dataset folder.
  2. Run the following command to create the dataset indices used for retrieval (a minimal illustration of the retrieval idea follows this list):
    python src/dataset/create_index.py
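
For intuition, here is a minimal sketch of syntactic retrieval, assuming POS-tag n-gram overlap as the similarity measure. This is an illustration only: the repository's create_index.py and its fastkassim option may use different formulations, and every name in the sketch is hypothetical.

# Minimal sketch of syntactic retrieval (illustration only; assumes
# POS-tag n-gram overlap, not necessarily the repo's actual method).
import nltk
from nltk import pos_tag, word_tokenize

# Resource names vary across NLTK versions ("punkt_tab" and
# "averaged_perceptron_tagger_eng" on recent releases).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def pos_ngrams(sentence, n=3):
    """POS-tag n-grams as a coarse signature of a sentence's syntax."""
    tags = [tag for _, tag in pos_tag(word_tokenize(sentence))]
    return {tuple(tags[i:i + n]) for i in range(max(len(tags) - n + 1, 0))}

def jaccard(a, b):
    """Overlap between two n-gram sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def retrieve_shots(query, corpus, k=5):
    """Return the k corpus sentences most syntactically similar to query."""
    signature = pos_ngrams(query)
    return sorted(corpus, key=lambda s: jaccard(signature, pos_ngrams(s)),
                  reverse=True)[:k]

Sentences retrieved this way share grammatical shape with the query rather than topical vocabulary, which is the property syntactic retrieval exploits when selecting few-shot demonstrations.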

Running Experiments

Reproducing Main Results

To reproduce the main results from the paper:

  1. Navigate to the src/run_scripts directory:
    cd src/run_scripts
  2. Execute the script:
    bash test.sh

Alternatively, you can run the experiments manually:

cd src
python main.py

Key configuration arguments include (an example invocation follows the list):

  • config_path: Path to the configuration file (e.g., configs/test.json).
  • model: Model name (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct, google/gemma-2-9b-it, mistralai/Mistral-Nemo-Instruct-2407).
  • dataset: Dataset name (ACTER, ACL-RD, BCGM).
  • num_shots: Number of few-shot in-context examples to use.
  • retrieval_method: Retrieval method (default, default_w_ins, bm25, random, fastkassim).
  • seed: Random seed (default: 42).
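
For example, a single run might be launched as follows. The flag syntax and the shot count are assumptions based on the argument names above; consult main.py for the exact interface.

python main.py --config_path configs/test.json --model meta-llama/Meta-Llama-3.1-8B-Instruct --dataset ACTER --num_shots 5 --retrieval_method default --seed 42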

Comparison to Pretrained Language Models

To reproduce results from the "Comparison to Pretrained Language Models" section:

  • For RoBERTa, execute all cells in the RoBERTa.ipynb notebook.
  • For BART, run:
    cd src/run_scripts
    bash train.sh
    bash test.sh

Results Analysis

  1. Experiment results are saved in the src/outputs directory.
  2. To reproduce Table 2 and Figure 2 from the paper, execute all cells in the src/analysis.ipynb notebook.
