This repository contains materials associated with the paper:
Agnese Daffara, Alan Ramponi, and Sara Tonelli. 2025. WorthIt: Check-worthiness Estimation of Italian Social Media Posts. In Proceedings of the 11th Italian Conference on Computational Linguistics (CLiC-it 2025), Cagliari, Italy. CEUR Workshop Proceedings. [cite] [paper]
- 📃 WorthIt dataset
- 🚀 Code for experiments
- 📌 Overlap with the Faina dataset
- 📖 Further information
- ✏️ Citation
WorthIt is the first dataset for factuality/verifiability and check-worthiness estimation of Italian social media posts. Previous efforts for other languages typically cover a single topic and a limited time frame, which limits models' generalizability to out-of-distribution data. To fill these gaps, WorthIt spans public discourse on migration, climate change, and public health issues over a six-year period. It also includes disaggregated annotations to embrace human label variation.
📃 Dataset request: please write us an e-mail to request the WorthIt dataset. The dataset can be used for non-commercial research purposes only, and the user must agree not to redistribute the dataset to third parties or in online repositories, not to attempt deanonymization by any means, and to prevent any misuse of the data.
The WorthIt dataset is released in an anonymized form (i.e., with [USER], [URL], [EMAIL], and [PHONE] placeholders), with no user information or original post identifiers, to preserve users' anonymity. We include individual factuality/verifiability and check-worthiness annotations (i.e., the labels assigned by each annotator) for all instances to encourage research on human label variation, as well as information about time and topics. We also include the aggregated labels used for the experiments in the paper (see the Data format section); however, label aggregation can be performed in multiple ways, and, if needed, users can rely on the individual annotations to apply their own aggregation strategy.
The WorthIt dataset consists of two data splits: one for training/development (data/train-dev.tsv) and one for testing (data/test.tsv).
- Training/development set [data/train-dev.tsv]: the split for training/development purposes (80% of the posts), with gold labels. Users are free to decide how to split this set into train and dev portions as part of their design decisions (see the sketch below).
- Test set [data/test.tsv]: the split for official testing purposes (20% of the posts), without labels. To obtain official evaluation scores, users have to submit their predictions (i.e., a file following the same format as data/test.tsv but including the predicted labels; see the Data format section) through the CodaBench benchmark page (link available soon!).
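Below is a minimal sketch of one possible way to derive train and dev portions from data/train-dev.tsv. The 90/10 stratified split, the random seed, and the label column name are assumptions for illustration, not the setup used in the paper; it requires pandas and scikit-learn.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the training/development split (tab-separated, with a header line).
posts = pd.read_csv("data/train-dev.tsv", sep="\t")

# Stratify on the aggregated check-worthiness label so both portions keep a
# similar label distribution; the column name is an assumption based on the
# placeholders in the Data format section.
train, dev = train_test_split(
    posts, test_size=0.1, random_state=42, stratify=posts["label_cw_aggregated"]
)

train.to_csv("data/train.tsv", sep="\t", index=False)
dev.to_csv("data/dev.tsv", sep="\t", index=False)
```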
The WorthIt dataset is in tab-separated format and contains a header line. Each line provides information about a single post (i.e., id, date, topic, text, and labels). Individual annotations for both factuality/verifiability and check-worthiness are provided along with the aggregated labels used for the experiments in the paper.
Specifically, a post in the WorthIt dataset is represented as follows:
$POST_ID $POST_DATE $POST_TOPIC_KEYWORDS $POST_TEXT $LABEL_FV_BY_ANN_A $LABEL_FV_BY_ANN_B $LABEL_CW_BY_ANN_A $LABEL_CW_BY_ANN_B $LABEL_FV_AGGREGATED $LABEL_CW_AGGREGATED
where:
- $POST_ID: the identifier of the post (integer);
- $POST_DATE: the date of the post (YYYY-MM);
- $POST_TOPIC_KEYWORDS: the topic set to which the keyword that led to the post's selection belongs (migration, climate change, or public health);
- $POST_TEXT: the text of the post (anonymized with [USER], [URL], [EMAIL], and [PHONE] placeholders);
- $LABEL_FV_BY_ANN_j: the factuality/verifiability label assigned to the post by annotator j. The label can be 0 (not factual/verifiable) or 1 (factual/verifiable);
- $LABEL_CW_BY_ANN_j: the check-worthiness label assigned to the post by annotator j. The label can be 0 (definitely not check-worthy), 1 (probably not check-worthy), 2 (neither not check-worthy nor check-worthy), 3 (probably check-worthy), or 4 (definitely check-worthy);
- $LABEL_FV_AGGREGATED: the (aggregated) factuality/verifiability label for the post, obtained as described in the paper (i.e., 1 if all annotators labeled the post as 1 (factual/verifiable), 0 otherwise). Possible labels are therefore 0 (not factual/verifiable) or 1 (factual/verifiable);
- $LABEL_CW_AGGREGATED: the (aggregated) check-worthiness label for the post, obtained as described in the paper (i.e., 1 if all annotators labeled the post with at least 3 (probably check-worthy), 0 otherwise). Possible labels are therefore 0 (not check-worthy) or 1 (check-worthy).
Please note that the test set does not include gold labels (i.e., it has empty $LABEL_FV_BY_ANN_j, $LABEL_CW_BY_ANN_j, $LABEL_FV_AGGREGATED, and $LABEL_CW_AGGREGATED columns) because it serves for official evaluation only (see Data splits section).
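As a reference, here is a minimal sketch of how the aggregation rules described above can be recomputed from the individual annotations. The column names are assumptions based on the placeholders in the Data format section, not the actual TSV header; it requires pandas.

```python
import pandas as pd

posts = pd.read_csv("data/train-dev.tsv", sep="\t")

# Factuality/verifiability: 1 only if *all* annotators labeled the post as 1.
fv_cols = ["label_fv_by_ann_a", "label_fv_by_ann_b"]  # assumed column names
posts["fv_agg_recomputed"] = (posts[fv_cols] == 1).all(axis=1).astype(int)

# Check-worthiness: 1 only if *all* annotators gave a score of at least 3.
cw_cols = ["label_cw_by_ann_a", "label_cw_by_ann_b"]  # assumed column names
posts["cw_agg_recomputed"] = (posts[cw_cols] >= 3).all(axis=1).astype(int)
```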
Instructions on how to run classification models are provided in the src/classification/ folder.
Instructions on how to run generation models are provided in the src/generation/ folder.
The WorthIt dataset includes the same posts from 2019 to 2022 that appear in Faina, a previously released dataset for fine-grained fallacy detection (Ramponi et al., 2025), and additionally includes posts from 2017 and 2018. To use both datasets together, the posts in the two datasets can be merged by their $POST_ID (see the sketch below).
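A minimal sketch of such a merge with pandas is shown below; the Faina file path and the join column name are assumptions for illustration and may need to be adapted to the actual files.

```python
import pandas as pd

worthit = pd.read_csv("data/train-dev.tsv", sep="\t")
faina = pd.read_csv("path/to/faina.tsv", sep="\t")  # assumed path to the Faina data

# An inner join keeps only the 2019-2022 posts that appear in both datasets;
# "post_id" is an assumed column name corresponding to $POST_ID.
merged = worthit.merge(faina, on="post_id", how="inner", suffixes=("_worthit", "_faina"))
```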
If you need further information, do not hesitate to get in touch with us by writing an email.
If you use or build on top of this work, please cite our paper as follows:
@inproceedings{daffara-etal-2025-worthit,
title = "WorthIt: Check-worthiness Estimation of Italian Social Media Posts",
author = "Daffara, Agnese and
Ramponi, Alan and
Tonelli, Sara",
booktitle = "Proceedings of the 11th Italian Conference on Computational Linguistics (CLiC-it 2025)",
month = sep,
year = "2025",
address = "Cagliari, Italy",
publisher = "CEUR Workshop Proceedings"
}