
Ask the experts: sourcing a high-quality nutrition counseling dataset through Human-AI collaboration

Large Language Models (LLMs) are being employed by end-users for various tasks, including sensitive ones such as health counseling, disregarding potential safety concerns. It is thus necessary to understand how adequately LLMs perform in such domains. We conduct a case study on ChatGPT in nutrition counseling, a popular use-case where the model supports a user with their dietary struggles. We crowd-source real-world diet-related struggles, then work with nutrition experts to generate supportive text using ChatGPT. Finally, experts evaluate the safety and text quality of ChatGPT’s output. The result is the HAI-coaching dataset, containing ~2.4K crowdsourced dietary struggles and ~97K corresponding ChatGPT-generated and expert-annotated supportive texts. We analyse ChatGPT’s performance, discovering potentially harmful behaviours, especially for sensitive topics like mental health. Finally, we use HAI-coaching to test open LLMs on various downstream tasks, showing that even the latest models struggle to achieve good performance.

Citing us

You can find the published paper on the ACL Anthology.

If you use HAI-Coaching, please cite it as:

@inproceedings{balloccu-etal-2024-ask,
    title = "Ask the experts: sourcing a high-quality nutrition counseling dataset through Human-{AI} collaboration",
    author = "Balloccu, Simone  and
      Reiter, Ehud  and
      Li, Karen Jia-Hui  and
      Sargsyan, Rafael  and
      Kumar, Vivek  and
      Reforgiato, Diego  and
      Riboni, Daniele  and
      Dusek, Ondrej",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.674",
    pages = "11519--11545",
    abstract = "Large Language Models (LLMs) are being employed by end-users for various tasks, including sensitive ones such as health counseling, disregarding potential safety concerns. It is thus necessary to understand how adequately LLMs perform in such domains. We conduct a case study on ChatGPT in nutrition counseling, a popular use-case where the model supports a user with their dietary struggles. We crowd-source real-world diet-related struggles, then work with nutrition experts to generate supportive text using ChatGPT. Finally, experts evaluate the safety and text quality of ChatGPT{'}s output. The result is the HAI-coaching dataset, containing {\textasciitilde}2.4K crowdsourced dietary struggles and {\textasciitilde}97K corresponding ChatGPT-generated and expert-annotated supportive texts. We analyse ChatGPT{'}s performance, discovering potentially harmful behaviours, especially for sensitive topics like mental health. Finally, we use HAI-coaching to test open LLMs on various downstream tasks, showing that even the latest models struggle to achieve good performance. HAI-coaching is available at https://github.com/uccollab/hai-coaching/",
}

Dataset file structure

The HAI-coaching dataset is in the dataset.xlsx file and has the following structure:

  • TAB "DATASET": the actual dataset, containing the following columns:

    • Columns related to the struggles and the topic they cover:

      • doc_no: Document number for that specific annotator
      • annotator: The annotator (by number) who worked on the given struggle. If "ALL", the struggle was used for inter-annotator agreement (IAA) and was therefore annotated by every annotator.
      • struggle: The struggle, after typo correction.
      • cluster_auto: Coarse clustering, obtained automatically, with hyperparameters set to capture the main topics. It was used as an aid during topic modelling with experts.
      • cluster_expert: Fine-grained clustering, obtained manually in collaboration with experts. It contains more specific topics that are useful for qualitative analysis.
      • cluster_expert_merged: More general clustering, where smaller topics have been merged into bigger ones.
      • struggle_original: The struggle as it was written by the crowdworker, before typo correction.
      • full_embeddings: Embeddings for the struggle, from the all-mpnet model. These can be re-calculated at any time and are included here only for quick plotting.
      • reduced_embeddings: Embeddings after PCA, for 3D plotting purposes.
    • Columns related to the ChatGPT-generated candidates and their annotations. For IAA struggles, labels were aggregated via majority voting.

      • OT: Whether the struggle is off-topic or not
      • reflection_candidates: Reflection generated by ChatGPT, divided by the "###" separator
      • reflection_annotation: Whether each candidate is safe or not, divided by the "###" separator
      • reflection_from_expert: Optional candidates written by experts, divided by the "###" separator

      The same structure is repeated for each kind of supportive text (reframing, comfort, and suggestion).

  • TAB "STATS": presents some basic counts based on the clustering

  • TAB "INFO": recaps dataset structure (like this README) and also shows the merging logic for clusters

  Note: columns related to the demographics of each crowdworker have been removed for data privacy. We may share such data, at our discretion, with interested researchers for non-commercial purposes only.
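As described above, each candidate column (e.g. reflection_candidates) packs several texts into one cell joined by "###", with a parallel *_annotation column holding one safety label per candidate, and IAA rows carry one label per annotator. The helpers below are a minimal sketch of how such cells can be unpacked and aggregated; the function names are illustrative and not part of the released code.

```python
from collections import Counter

def split_candidates(candidates_cell, annotations_cell, sep="###"):
    """Split a '###'-joined candidates cell and its matching annotation cell,
    pairing each candidate text with its safety label."""
    candidates = [c.strip() for c in candidates_cell.split(sep) if c.strip()]
    labels = [a.strip() for a in annotations_cell.split(sep) if a.strip()]
    return list(zip(candidates, labels))

def majority_vote(labels):
    """Aggregate per-annotator labels for an IAA struggle by majority voting."""
    return Counter(labels).most_common(1)[0][0]

# Example with made-up cell contents:
pairs = split_candidates("Try candidate A ### Try candidate B", "safe ### unsafe")
# → [("Try candidate A", "safe"), ("Try candidate B", "unsafe")]
label = majority_vote(["safe", "safe", "unsafe"])
# → "safe"
```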

Working with the dataset

The Jupyter notebook dataset_parsing.ipynb contains basic code to read, parse, and work with the dataset.
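The typical loading pattern is a sketch along these lines (assuming pandas with an xlsx engine such as openpyxl is installed; the toy DataFrame below only mimics the column names so the example is self-contained):

```python
import pandas as pd

# In practice the DATASET tab would be read with:
# df = pd.read_excel("dataset.xlsx", sheet_name="DATASET")
# Here we build a tiny stand-in frame with the same column names.
df = pd.DataFrame({
    "doc_no": [1, 2],
    "annotator": ["1", "ALL"],
    "struggle": ["I snack late at night", "I skip breakfast"],
    "reflection_candidates": ["cand A ### cand B", "cand C"],
    "reflection_annotation": ["safe ### unsafe", "safe"],
})

# Explode each row into one row per (candidate, label) pair.
rows = []
for _, r in df.iterrows():
    cands = [c.strip() for c in r["reflection_candidates"].split("###")]
    labels = [a.strip() for a in r["reflection_annotation"].split("###")]
    for cand, label in zip(cands, labels):
        rows.append({"struggle": r["struggle"], "candidate": cand, "label": label})
long_df = pd.DataFrame(rows)
# long_df now has 3 rows: two candidates for the first struggle, one for the second.
```

The same explode step applies unchanged to the reframing, comfort, and suggestion columns.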

Other files

We also release all the relevant documents and materials used in our experiments for struggle collection, clustering, prompt engineering, prompting ChatGPT, and safety annotation. Each step has its own directory and README file.

The code to reproduce our NLP baselines can be found in the "evaluation" folder.

Acknowledgements

This work was funded by the European Commission under the H2020 Marie Skłodowska-Curie PhilHumans project (contract no. 812882) and by the European Research Council (Grant agreement No. 101039303 NG-NLG).


About

Official repository for HAI-coaching, the first expert-annotated dataset for nutritional counselling, sourced through human-AI collaboration.
