Critical Questions Generation through LLMs and Usefulness-based Selection

This repository contains materials associated with the paper:

Alan Ramponi, Gaudenzia Genoni, and Sara Tonelli. 2025. ARG2ST at CQs-Gen 2025: Critical Questions Generation through LLMs and Usefulness-based Selection. In Proceedings of the 12th Workshop on Argument Mining (ArgMining 2025), Vienna, Austria. Association for Computational Linguistics. [cite] [paper]

Getting started

Clone this repository to a path of your choice:

git clone https://github.com/dhfbk/cqs-gen.git

Create an environment with your preferred package manager. We used Python 3.9 and the dependencies listed in requirements.txt. If you use conda, you can run the following commands from the root of the project:

conda create --name cqs-gen python=3.9            # create the environment
conda activate cqs-gen                            # activate the environment
pip install --user -r requirements.txt            # install the required packages

We use the CQs-Gen dataset obtained from the CQs-Gen shared task repository. Data is in the data/ folder:

  • validation_all.json: the development data split (matching the original validation.json file).
  • validation.json: the development data split (with interventions used in few-shot prompts removed for fair comparison with the zero-shot setting).
  • test.json: the test data split.
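
For a quick sanity check, the snippet below (a minimal sketch, assuming the files are standard JSON; the inner structure follows the shared task format) loads a split and reports how many entries it contains:

import json

# Load a development split and report how many entries it contains.
# This only assumes the file is valid JSON; the inner structure follows
# the CQs-Gen shared task format.
with open("data/validation.json", encoding="utf-8") as f:
    data = json.load(f)

print(f"Loaded {len(data)} entries from data/validation.json")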

Generation of critical questions

The generation phase is conducted by prompting an LLM to obtain a raw output containing $N candidate CQs (i.e., either 3 or 5) for a given argumentative text. First, define the parameters in pred_eval_$N.sh (i.e., the model(s), the prompt(s), the zero/few-shot setting(s), and the seed(s)), then run the following:

sh pred_eval_$N.sh

where $N represents the number of CQs to generate (i.e., either 3 or 5). The outputs will be created in the results/ folder. Specifically, the following files (with prefix corresponding to the defined parameters) will be created:

  • *.log file: the LLM's raw output (with logs and associated prompt).
  • *.json file: the postprocessed output in .json format (i.e., with the predicted CQs extracted from the raw output and associated with the relevant interventions); see the sketch after this list.
  • *_results-similarity-06.json file: the same postprocessed output in .json format, with labels determined using the official shared task evaluation script.
  • *_results-similarity-06.txt file: the quantitative results in terms of overall punctuation score as well as label and punctuation distributions.
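
As a rough illustration of what the postprocessing step does (the actual parsing logic lives in this repository's scripts), the sketch below pulls numbered questions out of a raw LLM output with a simple regular expression; the example output string is hypothetical:

import re

# Hypothetical raw LLM output listing candidate CQs as a numbered list.
raw_output = """1. What evidence supports the claim that the policy reduces costs?
2. Are there alternative explanations for the observed reduction?
3. Is the cited source an authority on public spending?"""

# Extract one question per numbered line ending with a question mark.
# This is an illustrative heuristic, not the repository's actual parser.
candidate_cqs = re.findall(r"^\s*\d+\.\s*(.+\?)\s*$", raw_output, flags=re.MULTILINE)

for cq in candidate_cqs:
    print(cq)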

Usefulness-based question selection

The CQ selection phase leverages a pretrained model fine-tuned on a dataset of Useful and Not useful CQs (i.e., unhelpful and invalid CQs merged together; see src/machamp/data/ for the training data flavors and the paper for details on how we assemble them). The fine-tuned model is a binary classifier that provides a confidence score for each predicted label. We use the classifier's confidence score for the label Useful to rank the candidate CQs by decreasing "usefulness", then select the top-k (k=3) CQs as the final output.
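
A minimal sketch of this ranking step, assuming each candidate CQ already carries a classifier confidence for the Useful label (the field names and values below are hypothetical; the repository implements the selection in src/filtering.py):

# Candidate CQs with hypothetical confidences for the label "Useful".
candidates = [
    {"cq": "What evidence supports the main claim?", "useful_confidence": 0.91},
    {"cq": "Is the speaker an expert on this topic?", "useful_confidence": 0.67},
    {"cq": "Could there be exceptions to this generalization?", "useful_confidence": 0.84},
    {"cq": "Is this question relevant to the argument at all?", "useful_confidence": 0.12},
    {"cq": "Are there counterexamples to the stated rule?", "useful_confidence": 0.78},
]

# Rank by decreasing usefulness confidence and keep the top-k (k=3) CQs.
k = 3
selected = sorted(candidates, key=lambda c: c["useful_confidence"], reverse=True)[:k]

for c in selected:
    print(f"{c['useful_confidence']:.2f}  {c['cq']}")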

Training

To fine-tune the model, first define the parameters in src/scripts/train.sh (i.e., the model(s) and the data setting(s)), then run the following:

sh src/scripts/train.sh

The fine-tuned model will be created at logs/$MODEL_NAME/$DATETIME/model.pt, where $MODEL_NAME is a string corresponding to the defined parameters and $DATETIME is the datetime of the training run.

Prediction

To predict the usefulness of candidate CQs using the fine-tuned model, first convert the .json file $JSON_FILE obtained in the Generation of critical questions step to .tsv:

python src/scripts/json-to-tsv.py --input_filepath $JSON_FILE
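
The conversion is handled by the script above; as a hedged sketch of the general idea only (the field names and filenames below are hypothetical placeholders, not the repository's actual schema), a JSON-to-TSV step flattens each candidate CQ into one tab-separated row:

import csv
import json

# Illustrative flattening of candidate CQs into TSV rows; "intervention_id"
# and "cqs" are hypothetical placeholders, not the actual schema.
with open("predictions.json", encoding="utf-8") as f:
    data = json.load(f)

with open("predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for intervention_id, entry in data.items():
        for cq in entry.get("cqs", []):
            writer.writerow([intervention_id, cq])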

Then define the parameters in src/scripts/predict.py (i.e., the filepath of the resulting .tsv file(s), and the model name(s) and datetime(s) to be used), then run the following:

sh src/scripts/predict.py

You will find the predictions in logs/$MODEL_NAME/$DATETIME/CQfilter.out.

Selection and evaluation

Now run the selection of the top-k (k=3) CQs:

python src/filtering.py --input_filepath $JSON_FILE --strategy model

Finally, run the evaluation script:

sh eval_3.sh

The outputs will be created in the results/ folder.

Citation

If you use or build on top of this work, please cite our paper as follows:

@inproceedings{ramponi-etal-2025-arg2st,
    title = "ARG2ST at CQs-Gen 2025: Critical Questions Generation through LLMs and Usefulness-based Selection",
    author = "Ramponi, Alan  and
      Genoni, Gaudenzia  and
      Tonelli, Sara",
    booktitle = "Proceedings of the 12th Workshop on Argument Mining (ArgMining 2025)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
