Critical Questions Generation through LLMs and Usefulness-based Selection

This repository contains materials associated with the paper:

Alan Ramponi, Gaudenzia Genoni, and Sara Tonelli. 2025. ARG2ST at CQs-Gen 2025: Critical Questions Generation through LLMs and Usefulness-based Selection. In Proceedings of the 12th Workshop on Argument Mining (ArgMining 2025), Vienna, Austria. Association for Computational Linguistics. [cite] [paper]

Getting started

Clone this repository to a path of your choice:

git clone https://github.com/dhfbk/cqs-gen.git

Create an environment with your preferred package manager. We used Python 3.9 and the dependencies listed in requirements.txt. If you use conda, you can run the following commands from the root of the project:

conda create --name cqs-gen python=3.9            # create the environment
conda activate cqs-gen                            # activate the environment
pip install --user -r requirements.txt            # install the required packages

We use the CQs-Gen dataset obtained from the CQs-Gen shared task repository. Data is in the data/ folder:

  • validation_all.json: the development data split (matching the original validation.json file).
  • validation.json: the development data split (with interventions used in few-shot prompts removed for fair comparison with the zero-shot setting).
  • test.json: the test data split.
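
For a quick sanity check, the snippet below (a minimal sketch, assuming the files are standard JSON; the inner structure follows the shared task format) loads a split and reports how many entries it contains:

import json

# Load a development split and report how many entries it contains.
# This only assumes the file is valid JSON; the inner structure follows
# the CQs-Gen shared task format.
with open("data/validation.json", encoding="utf-8") as f:
    data = json.load(f)

print(f"Loaded {len(data)} entries from data/validation.json")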

Generation of critical questions

The generation phase is conducted by prompting an LLM to obtain a raw output containing $N candidate CQs (i.e., either 3 or 5) for a given argumentative text. First, define the parameters in pred_eval_$N.sh (i.e., the model(s), the prompt(s), the zero/few-shot setting(s), and the seed(s)), then run the following:

sh pred_eval_$N.sh

where $N represents the number of CQs to generate (i.e., either 3 or 5). The outputs will be created in the results/ folder. Specifically, the following files (with prefix corresponding to the defined parameters) will be created:

  • *.log file: the LLM's raw output (with logs and associated prompt).
  • *.json file: the postprocessed output in .json format (i.e., with the predicted CQs extracted from the raw output and associated with the relevant interventions); see the sketch after this list.
  • *_results-similarity-06.json file: the same postprocessed output in .json format, with labels determined using the official shared task evaluation script.
  • *_results-similarity-06.txt file: the quantitative results in terms of overall punctuation score as well as label and punctuation distributions.
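
As a rough illustration of what the postprocessing step does (the actual parsing logic lives in this repository's scripts), the sketch below pulls numbered questions out of a raw LLM output with a simple regular expression; the example output string is hypothetical:

import re

# Hypothetical raw LLM output listing candidate CQs as a numbered list.
raw_output = """1. What evidence supports the claim that the policy reduces costs?
2. Are there alternative explanations for the observed reduction?
3. Is the cited source an authority on public spending?"""

# Extract one question per numbered line ending with a question mark.
# This is an illustrative heuristic, not the repository's actual parser.
candidate_cqs = re.findall(r"^\s*\d+\.\s*(.+\?)\s*$", raw_output, flags=re.MULTILINE)

for cq in candidate_cqs:
    print(cq)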

Usefulness-based question selection

The CQ selection phase leverages a pretrained model fine-tuned on a dataset of Useful and Not useful CQs (i.e., unhelpful and invalid CQs merged together; see src/machamp/data/ for the training data flavors and the paper for details on how we assemble them). The fine-tuned model is a binary classifier that provides a confidence score for each predicted label. We use the classifier's confidence score for the label Useful to rank the candidate CQs by decreasing "usefulness", then select the top-k (k=3) CQs as the final output.
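
A minimal sketch of this ranking step, assuming each candidate CQ already carries a classifier confidence for the Useful label (the field names and values below are hypothetical; the repository implements the selection in src/filtering.py):

# Candidate CQs with hypothetical confidences for the label "Useful".
candidates = [
    {"cq": "What evidence supports the main claim?", "useful_confidence": 0.91},
    {"cq": "Is the speaker an expert on this topic?", "useful_confidence": 0.67},
    {"cq": "Could there be exceptions to this generalization?", "useful_confidence": 0.84},
    {"cq": "Is this question relevant to the argument at all?", "useful_confidence": 0.12},
    {"cq": "Are there counterexamples to the stated rule?", "useful_confidence": 0.78},
]

# Rank by decreasing usefulness confidence and keep the top-k (k=3) CQs.
k = 3
selected = sorted(candidates, key=lambda c: c["useful_confidence"], reverse=True)[:k]

for c in selected:
    print(f"{c['useful_confidence']:.2f}  {c['cq']}")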

Training

To fine-tune the model, first define the parameters in src/scripts/train.sh (i.e., the model(s) and the data setting(s)), then run the following:

sh src/scripts/train.sh

The fine-tuned model will be created at logs/$MODEL_NAME/$DATETIME/model.pt, where $MODEL_NAME is a string corresponding to the defined parameters and $DATETIME is the datetime of the training run.

Prediction

To predict the usefulness of candidate CQs using the fine-tuned model, first convert the .json file $JSON_FILE obtained in the Generation of critical questions step to .tsv:

python src/scripts/json-to-tsv.py --input_filepath $JSON_FILE
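
The conversion is handled by the script above; as a hedged sketch of the general idea only (the field names and filenames below are hypothetical placeholders, not the repository's actual schema), a JSON-to-TSV step flattens each candidate CQ into one tab-separated row:

import csv
import json

# Illustrative flattening of candidate CQs into TSV rows; "intervention_id"
# and "cqs" are hypothetical placeholders, not the actual schema.
with open("predictions.json", encoding="utf-8") as f:
    data = json.load(f)

with open("predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for intervention_id, entry in data.items():
        for cq in entry.get("cqs", []):
            writer.writerow([intervention_id, cq])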

Then define the parameters in src/scripts/predict.py (i.e., the filepath of the resulting .tsv file(s), and the model name(s) and datetime(s) to be used), then run the following:

sh src/scripts/predict.py

You will find the predictions in logs/$MODEL_NAME/$DATETIME/CQfilter.out.

Selection and evaluation

Now run the selection of the top-k (k=3) CQs:

python src/filtering.py --input_filepath $JSON_FILE --strategy model

Finally, run the evaluation script:

sh eval_3.sh

The outputs will be created in the results/ folder.

Citation

If you use or build on top of this work, please cite our paper as follows:

@inproceedings{ramponi-etal-2025-arg2st,
    title = "ARG2ST at CQs-Gen 2025: Critical Questions Generation through LLMs and Usefulness-based Selection",
    author = "Ramponi, Alan  and
      Genoni, Gaudenzia  and
      Tonelli, Sara",
    booktitle = "Proceedings of the 12th Workshop on Argument Mining (ArgMining 2025)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
