
KevinMeisel/fine-tune-experiment


MeetingBank Summarization with BART

Fine-tuning facebook/bart-large-xsum on the MeetingBank dataset to generate concise meeting summaries. This repo contains a simple, script-based pipeline from preprocessing to training and evaluation.

(Presentation slides 1–8 are available as screenshots in the Slides/ directory.)

Repository Structure

  • 01_data_preprocessing.py: Loads MeetingBank, tokenizes transcripts and summaries, and saves a tokenized dataset.
  • 02_fine_tuning_BART.py: Fine-tunes BART on the tokenized dataset and saves checkpoints and logs.
  • 03_evaluation_and_comparison.py: Compares the fine-tuned model with the base model using ROUGE and exports side-by-side outputs.
  • Slides/: Presentation screenshots for context and visualization.
  • results/: Saved experiment outputs (loss logs and evaluation exports).

Setup

  • Python 3.9+ recommended
  • GPU strongly recommended for training and evaluation

Install dependencies:

pip install -r requirements.txt
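The exact contents of requirements.txt are not reproduced here; based on the scripts described in this README, it likely includes at least the following packages (versions and extras are assumptions, not pinned by this document):

```
# Assumed core dependencies for the pipeline (unverified against the actual file)
torch           # model training
transformers    # facebook/bart-large-xsum model and tokenizer
datasets        # loading huuuyeah/meetingbank from the Hub
evaluate        # metric wrappers
rouge_score     # ROUGE backend for evaluation
```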

Workflow

  1. Preprocess and tokenize the dataset:

     python 01_data_preprocessing.py

  2. Fine-tune BART (the default configuration uses training_args_v2_2):

     python 02_fine_tuning_BART.py

  3. Evaluate and compare against the base model:

     python 03_evaluation_and_comparison.py
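Step 3 scores both models with ROUGE. The actual script presumably relies on an off-the-shelf ROUGE implementation; as a rough illustration of what the metric measures, here is a minimal ROUGE-1 F1 (unigram-overlap) sketch, not the library's algorithm:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Toy ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the council approved the budget",
                      "council approved budget today"), 3))  # 0.667
```

Production evaluation should use the rouge_score or evaluate packages, which add stemming and ROUGE-2/ROUGE-L variants this sketch omits.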

Outputs

  • Tokenized dataset saved to data/processed/tokenized_meetingbank
  • Models saved to models/ (path configured in 02_fine_tuning_BART.py)
  • Training logs and CSV exports in results/

Notes

  • Make sure fine_tuned_model_path in 03_evaluation_and_comparison.py matches the model output path used in training.
  • The evaluation script runs on the full test split by default and can take a while on CPU.
  • If you want smaller, faster experiments, use the provided half_train_data, tenth_train_data, or hundredth_train_data in 02_fine_tuning_BART.py.
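The reduced splits mentioned above are presumably built by keeping a fixed fraction of the training set. A minimal sketch of that slicing (fraction_subset is a hypothetical helper, not a name from the repo; with Hugging Face datasets the equivalent would be dataset.select(range(len(dataset) // k))):

```python
def fraction_subset(examples, k):
    """Keep the first 1/k of the examples (at least one), mirroring
    half (k=2), tenth (k=10), and hundredth (k=100) training splits."""
    return examples[: max(1, len(examples) // k)]

train = list(range(1000))  # stand-in for the tokenized training split
print(len(fraction_subset(train, 2)),     # 500
      len(fraction_subset(train, 10)),    # 100
      len(fraction_subset(train, 100)))   # 10
```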

Dataset

  • MeetingBank (Hugging Face Hub): huuuyeah/meetingbank

Citations

  1. MeetingBank: A Benchmark Dataset for Meeting Summarization
    Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada.

  2. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

License

  • MIT (see LICENSE).
