CSE153 music generation assignment with two sub-tasks:
- Task 1: Unconditioned symbolic generation — LSTM models for pitch and duration trained on classical piano MIDI.
- Task 2: Conditioned symbolic generation — builds on Task 1 with arpeggio-focused filtering and constrained decoding.
- Place classical piano MIDI files under `dataset/` (the notebook reads `dataset/**/*.mid`).
- Adjust preprocessing constants in the notebook if needed (e.g., `MIN_SEQ_LEN`, `MAX_SEQ_LEN`, `TRANSPOSE_RANGE`).
- Preprocessing (Task 1 section in the notebook)
- Parse MIDI, quantize to 1/16 notes, drop short sequences, split at length 128, augment with ±3/±6 semitone transposition.
- Build the `pitch_to_id`/`duration_to_id` vocabularies; save to `Task1_preprocessed/`.
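The transposition augmentation and vocabulary-building steps above might look like the following sketch (function names are hypothetical, not the notebook's):

```python
# Hypothetical sketch of the +/-3 and +/-6 semitone transposition
# augmentation and vocab construction described above.
def augment_with_transpositions(sequences, shifts=(-6, -3, 3, 6)):
    """Return the original pitch sequences plus copies shifted by each
    interval, keeping only copies that stay in the MIDI range 0-127."""
    augmented = list(sequences)
    for seq in sequences:
        for shift in shifts:
            shifted = [p + shift for p in seq]
            if all(0 <= p <= 127 for p in shifted):
                augmented.append(shifted)
    return augmented

def build_vocab(sequences):
    """Map each distinct token to an integer id (e.g., pitch_to_id)."""
    tokens = sorted({t for seq in sequences for t in seq})
    return {t: i for i, t in enumerate(tokens)}

seqs = [[60, 64, 67], [62, 65, 69]]   # two toy C-major-ish fragments
aug = augment_with_transpositions(seqs)
vocab = build_vocab(aug)
```

The same `build_vocab` helper would be run once over pitches and once over durations to produce the two id maps.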
- EDA
- Run EDA cells for pitch-class distribution, duration distribution, entropy, length histograms, and summary stats.
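The entropy statistic in the EDA could be computed along these lines (a rough sketch, not the notebook's exact cell):

```python
import math
from collections import Counter

def pitch_class_entropy(pitches):
    """Shannon entropy (bits) of the pitch-class (mod 12) distribution.
    Uniform use of all 12 classes gives the maximum, log2(12) ~= 3.58."""
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Low values indicate a piece concentrated on a few pitch classes; values near 3.58 indicate chromatic spread.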
- Training & Generation
- Train separate pitch/duration LSTMs (embed 128, hidden 256, 2 layers, dropout 0.3, label smoothing 0.1).
- Best models (by validation perplexity) are saved to `Task1_outputs/`; generation writes `generated_music.mid`.
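The stated hyperparameters correspond to a model roughly like this (assuming PyTorch; the class name and training harness are illustrative):

```python
import torch
import torch.nn as nn

class NoteLSTM(nn.Module):
    """Sketch of the pitch/duration LSTM described above:
    embedding 128, hidden 256, 2 layers, dropout 0.3."""
    def __init__(self, vocab_size, embed=128, hidden=256, layers=2, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, num_layers=layers,
                            dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = NoteLSTM(vocab_size=88)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing 0.1
x = torch.randint(0, 88, (4, 32))                     # (batch, seq_len)
logits, _ = model(x)
# next-token prediction: shift targets by one position
loss = criterion(logits[:, :-1].reshape(-1, 88), x[:, 1:].reshape(-1))
```

Separate instances would be trained for the pitch and duration vocabularies.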
- Baselines & Metrics
- Trigram baseline vs. LSTM: LSTM pitch perplexity ~10 vs. ~498 for the trigram, showing much stronger sequence modeling.
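For reference, an add-alpha smoothed trigram baseline and its perplexity can be sketched as follows (a hedged stand-in for the notebook's baseline, not its exact code):

```python
import math
from collections import Counter, defaultdict

def trigram_perplexity(train, test, vocab_size, alpha=1.0):
    """Train an add-alpha smoothed trigram model on `train` and return
    its perplexity exp(mean negative log-likelihood) on `test`."""
    counts = defaultdict(Counter)
    for seq in train:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            counts[(a, b)][c] += 1
    log_prob, n = 0.0, 0
    for seq in test:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            ctx = counts[(a, b)]
            p = (ctx[c] + alpha) / (sum(ctx.values()) + alpha * vocab_size)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)
```

A nearly deterministic corpus yields perplexity close to 1, while an unseen context falls back to roughly uniform (perplexity near the vocabulary size), which is why trigrams blow up on held-out classical pitch data.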
- Preprocessing
- Keep only arpeggio-like sequences: durations in {1,2,4}, intervals ≤24 semitones, min length 32; apply the same transposition augmentation.
- Save processed data and vocab to `Task2_preprocessed/`.
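The arpeggio filter above reduces to a simple predicate; a minimal sketch (the function name is hypothetical), assuming durations are expressed in sixteenth-note multiples:

```python
def is_arpeggio_like(pitches, durations,
                     allowed_durs=frozenset({1, 2, 4}),
                     max_interval=24, min_len=32):
    """Keep a sequence only if all durations are in {1, 2, 4}, every
    consecutive pitch interval is within 24 semitones, and the sequence
    is at least 32 notes long."""
    if len(pitches) < min_len:
        return False
    if any(d not in allowed_durs for d in durations):
        return False
    return all(abs(b - a) <= max_interval
               for a, b in zip(pitches, pitches[1:]))
```

Sequences passing the predicate would then go through the same transposition augmentation as in Task 1.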
- Training & Generation
- Train pitch/duration LSTMs (same hyperparams as Task 1) with constrained decoding:
  - nucleus/top-k sampling (`top_p=0.9`, `top_k=50`), n-gram repeat blocking, and repeat-threshold control;
  - progressive relaxation of the duration set and top-p if sampling fails.
- Outputs `Task2_outputs/generated_music.mid`.
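One sampling step of the constrained decoder could look like this NumPy sketch (an illustration of the technique, not the notebook's code; returning `None` is the signal for the caller to relax constraints and retry):

```python
import numpy as np

def constrained_sample(logits, history, top_k=50, top_p=0.9, no_repeat_ngram=3):
    """Block tokens that would repeat an n-gram already in `history`,
    then sample from the top-k / nucleus-truncated distribution."""
    logits = logits.astype(float).copy()
    # n-gram repeat blocking: forbid any token that completes a seen n-gram.
    if len(history) >= no_repeat_ngram - 1:
        prefix = tuple(history[-(no_repeat_ngram - 1):])
        for i in range(len(history) - no_repeat_ngram + 1):
            if tuple(history[i:i + no_repeat_ngram - 1]) == prefix:
                logits[history[i + no_repeat_ngram - 1]] = -np.inf
    if not np.isfinite(logits).any():
        return None  # everything blocked: caller relaxes constraints
    # top-k truncation
    if top_k < len(logits):
        logits[logits < np.sort(logits)[-top_k]] = -np.inf
    # nucleus (top-p) truncation over the renormalized distribution
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    probs[~np.isfinite(logits)] = 0.0
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cutoff = order[: np.searchsorted(np.cumsum(probs[order]), top_p) + 1]
    nucleus = np.zeros_like(probs)
    nucleus[cutoff] = probs[cutoff]
    nucleus /= nucleus.sum()
    return int(np.random.choice(len(nucleus), p=nucleus))
```

The progressive-relaxation loop would call this with widening `top_p` (and a widening duration set for the duration model) whenever it returns `None`.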
- Baselines & Metrics
- Against n-gram baselines: typical test perplexity ~9.6 (pitch) and ~1.2 (duration), outperforming unstructured baselines.
- Faster runs: lower `NUM_EPOCHS`, `HIDDEN_SIZE`, batch size, or subsample MIDI files.
- More diversity: raise `TEMPERATURE`, `TOP_P`, or `TOP_K`; if quality drops, tighten the n-gram/repeat constraints.
- Longer pieces: raise `MAX_GENERATION_LENGTH`, noting stability limits on very long sequences.
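The knobs in these tips could live in a single config cell. `TOP_P`/`TOP_K`/`HIDDEN_SIZE` below match values stated in this README; the other defaults are illustrative placeholders, not the notebook's actual settings:

```python
# Tuning knobs referenced in the tips above. Values marked "placeholder"
# are illustrative guesses; adjust per the guidance in this README.
CONFIG = {
    "NUM_EPOCHS": 20,              # placeholder; lower for faster runs
    "HIDDEN_SIZE": 256,            # lower for faster runs
    "BATCH_SIZE": 64,              # placeholder; lower for faster runs
    "TEMPERATURE": 1.0,            # placeholder; raise for more diversity
    "TOP_P": 0.9,                  # raise for more diversity
    "TOP_K": 50,                   # raise for more diversity
    "MAX_GENERATION_LENGTH": 512,  # placeholder; raise for longer pieces
}
```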
- Classical piano MIDI dataset (Kaggle, sourced from piano-midi.de, 19 composers).
- Related work: Melissa Jalali Monfared’s LSTM example; Sulun et al. (2022) transformer with continuous emotion conditioning.