Fine-tuning facebook/bart-large-xsum on the MeetingBank dataset to generate concise meeting summaries. This repo contains a simple, script-based pipeline from preprocessing to training and evaluation.
## Repository Structure
- `01_data_preprocessing.py`: Loads MeetingBank, tokenizes transcripts and summaries, and saves a tokenized dataset.
- `02_fine_tuning_BART.py`: Fine-tunes BART on the tokenized dataset and saves checkpoints and logs.
- `03_evaluation_and_comparison.py`: Compares the fine-tuned model with the base model using ROUGE and exports side-by-side outputs.
- `Slides/`: Presentation screenshots for context and visualization.
- `results/`: Saved experiment outputs (loss logs and evaluation exports).
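As context for the preprocessing step, the core of what `01_data_preprocessing.py` does — truncating tokenized transcripts and summaries to fixed maximum lengths and pairing them as model inputs and labels — can be sketched in plain Python. The length limits and field names below are assumptions for illustration, not values read from the script:

```python
MAX_SOURCE_LEN = 1024  # BART's encoder limit; assumed setting
MAX_TARGET_LEN = 128   # assumed summary-length budget

def prepare_example(transcript_ids, summary_ids):
    # Truncate token-id sequences the way tokenizer(..., truncation=True,
    # max_length=...) would, then pair encoder inputs with decoder labels.
    return {
        "input_ids": transcript_ids[:MAX_SOURCE_LEN],
        "labels": summary_ids[:MAX_TARGET_LEN],
    }
```

In the real script this mapping would be applied over every example in the dataset before saving the tokenized copy to disk.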
## Setup
- Python 3.9+ recommended
- GPU strongly recommended for training and evaluation
Install dependencies:
```bash
pip install -r requirements.txt
```

## Workflow
- Preprocess and tokenize the dataset:

  ```bash
  python 01_data_preprocessing.py
  ```

- Fine-tune BART (the default configuration uses `training_args_v2_2`):

  ```bash
  python 02_fine_tuning_BART.py
  ```

- Evaluate and compare against the base model:

  ```bash
  python 03_evaluation_and_comparison.py
  ```

## Outputs
- Tokenized dataset saved to `data/processed/tokenized_meetingbank`
- Models saved to `models/` (path configured in `02_fine_tuning_BART.py`)
- Training logs and CSV exports in `results/`
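For intuition about what the evaluation step compares, here is a self-contained ROUGE-1 F1 computed from clipped unigram overlap. The actual script presumably relies on an established ROUGE implementation, so treat this as an illustration of the metric, not the project's evaluation code:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two whitespace-tokenized texts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat"))  # → 0.8
```

A higher score for the fine-tuned model than for base `facebook/bart-large-xsum` on the same references is what the side-by-side CSV export makes visible.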
## Notes
- Make sure `fine_tuned_model_path` in `03_evaluation_and_comparison.py` matches the model output path used in training.
- The evaluation script runs on the full test split by default and can take a while on CPU.
- For smaller, faster experiments, use the provided `half_train_data`, `tenth_train_data`, or `hundredth_train_data` subsets in `02_fine_tuning_BART.py`.
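The exact construction of those subsets lives in `02_fine_tuning_BART.py`; the names are from the script, but the logic below is an assumption — a deterministic prefix slice, sketched in plain Python:

```python
def take_fraction(examples, denominator):
    # Keep a deterministic 1/denominator prefix of the training examples.
    # The actual half/tenth/hundredth split in 02_fine_tuning_BART.py may
    # differ (e.g. it could sample after shuffling).
    n = max(1, len(examples) // denominator)
    return examples[:n]

train_data = list(range(1000))  # stand-in for tokenized training examples
half_train_data = take_fraction(train_data, 2)         # 500 examples
tenth_train_data = take_fraction(train_data, 10)       # 100 examples
hundredth_train_data = take_fraction(train_data, 100)  # 10 examples
```

With Hugging Face `datasets`, the equivalent operation would be `dataset.select(range(n))`, which likewise keeps a fixed subset so runs stay reproducible.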
## Dataset
- MeetingBank (Hugging Face Hub): `huuuyeah/meetingbank`
## Citations
- *MeetingBank: A Benchmark Dataset for Meeting Summarization*. Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu. In the main conference of the Association for Computational Linguistics (ACL'23), Toronto, Canada.
- *BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension*. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer.
## License

- MIT (see `LICENSE`).