
KevinMeisel/fine-tune-experiment


MeetingBank Summarization with BART

Fine-tuning facebook/bart-large-xsum on the MeetingBank dataset to generate concise meeting summaries. This repo contains a simple, script-based pipeline from preprocessing to training and evaluation.

(Presentation slides 1–8 are available as screenshots in the Slides/ directory.)

Repository Structure

  • 01_data_preprocessing.py: Loads MeetingBank, tokenizes transcripts and summaries, and saves a tokenized dataset.
  • 02_fine_tuning_BART.py: Fine-tunes BART on the tokenized dataset and saves checkpoints and logs.
  • 03_evaluation_and_comparison.py: Compares the fine-tuned model with the base model using ROUGE and exports side-by-side outputs.
  • Slides/: Presentation screenshots for context and visualization.
  • results/: Saved experiment outputs (loss logs and evaluation exports).

Setup

  • Python 3.9+ recommended
  • GPU strongly recommended for training and evaluation

Install dependencies:

pip install -r requirements.txt
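The exact contents of requirements.txt are not reproduced here; based on the scripts described in this README, it likely includes at least the following packages (versions and extras are assumptions, not pinned by this document):

```
# Assumed core dependencies for the pipeline (unverified against the actual file)
torch           # model training
transformers    # facebook/bart-large-xsum model and tokenizer
datasets        # loading huuuyeah/meetingbank from the Hub
evaluate        # metric wrappers
rouge_score     # ROUGE backend for evaluation
```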

Workflow

  1. Preprocess and tokenize the dataset:

     python 01_data_preprocessing.py

  2. Fine-tune BART (the default configuration uses training_args_v2_2):

     python 02_fine_tuning_BART.py

  3. Evaluate and compare against the base model:

     python 03_evaluation_and_comparison.py
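Step 3 scores both models with ROUGE. The actual script presumably relies on an off-the-shelf ROUGE implementation; as a rough illustration of what the metric measures, here is a minimal ROUGE-1 F1 (unigram-overlap) sketch, not the library's algorithm:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Toy ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the council approved the budget",
                      "council approved budget today"), 3))  # 0.667
```

Production evaluation should use the rouge_score or evaluate packages, which add stemming and ROUGE-2/ROUGE-L variants this sketch omits.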

Outputs

  • Tokenized dataset saved to data/processed/tokenized_meetingbank
  • Models saved to models/ (path configured in 02_fine_tuning_BART.py)
  • Training logs and CSV exports in results/

Notes

  • Make sure fine_tuned_model_path in 03_evaluation_and_comparison.py matches the model output path used in training.
  • The evaluation script runs on the full test split by default and can take a while on CPU.
  • If you want smaller, faster experiments, use the provided half_train_data, tenth_train_data, or hundredth_train_data in 02_fine_tuning_BART.py.
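The reduced splits mentioned above are presumably built by keeping a fixed fraction of the training set. A minimal sketch of that slicing (fraction_subset is a hypothetical helper, not a name from the repo; with Hugging Face datasets the equivalent would be dataset.select(range(len(dataset) // k))):

```python
def fraction_subset(examples, k):
    """Keep the first 1/k of the examples (at least one), mirroring
    half (k=2), tenth (k=10), and hundredth (k=100) training splits."""
    return examples[: max(1, len(examples) // k)]

train = list(range(1000))  # stand-in for the tokenized training split
print(len(fraction_subset(train, 2)),     # 500
      len(fraction_subset(train, 10)),    # 100
      len(fraction_subset(train, 100)))   # 10
```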

Dataset

  • MeetingBank (Hugging Face Hub): huuuyeah/meetingbank

Citations

  1. MeetingBank: A Benchmark Dataset for Meeting Summarization
    Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada.

  2. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

License

  • MIT (see LICENSE).
