
ML_Music / Assignment 2

CSE153 music generation assignment with two sub-tasks:

  • Task 1: Unconditioned symbolic generation — LSTM models for pitch and duration trained on classical piano MIDI.
  • Task 2: Conditioned symbolic generation — builds on Task 1 with arpeggio-focused filtering and constrained decoding.

Data Setup

  1. Place classical piano MIDI files under dataset/ (notebook reads dataset/**/*.mid).
  2. Adjust preprocessing constants in the notebook if needed (e.g., MIN_SEQ_LEN, MAX_SEQ_LEN, TRANSPOSE_RANGE).
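The file discovery and 1/16-note grid snapping described above can be sketched as follows. This is an illustrative sketch, not the notebook's actual code: `find_midi_files` and `quantize_to_sixteenths` are assumed helper names, and the quantizer assumes note times are already expressed in beats.

```python
from pathlib import Path

# Collect MIDI files recursively, matching the notebook's dataset/**/*.mid glob.
def find_midi_files(root="dataset"):
    return sorted(Path(root).glob("**/*.mid"))

# Snap a note time (in beats) to a 1/16-note grid: one grid step is 1/4 beat,
# so multiply by 4 and round to the nearest integer step index.
def quantize_to_sixteenths(time_in_beats):
    return round(time_in_beats * 4)
```

For example, an onset at 0.26 beats lands on grid step 1 (`quantize_to_sixteenths(0.26)` returns 1), and a note at beat 1.0 lands on step 4.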

Task 1: Unconditioned Workflow

  1. Preprocessing (Task1 section in notebook)
    • Parse MIDI, quantize to 1/16 notes, drop short sequences, split long sequences into length-128 chunks, and augment with ±3/±6-semitone transposition.
    • Build pitch_to_id / duration_to_id; save to Task1_preprocessed/.
  2. EDA
    • Run EDA cells for pitch-class distribution, duration distribution, entropy, length histograms, and summary stats.
  3. Training & Generation
    • Train separate pitch/duration LSTMs (embed 128, hidden 256, 2 layers, dropout 0.3, label smoothing 0.1).
    • Best models (by validation perplexity) are saved to Task1_outputs/; generation writes generated_music.mid.
  4. Baselines & Metrics
    • Trigram baseline vs. LSTM: LSTM pitch perplexity ~10 vs. ~498 for the trigram, showing much stronger sequence modeling.
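The trigram baseline in step 4 can be sketched as a count-based model with add-one smoothing. This is a minimal illustration, not the notebook's implementation; the class and method names are assumptions, and the notebook's smoothing scheme may differ.

```python
import math
from collections import Counter, defaultdict

class TrigramModel:
    """Count-based trigram model with Laplace (add-one) smoothing."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)  # (w1, w2) -> Counter over w3

    def fit(self, sequences):
        for seq in sequences:
            for a, b, c in zip(seq, seq[1:], seq[2:]):
                self.counts[(a, b)][c] += 1

    def prob(self, a, b, c):
        ctx = self.counts[(a, b)]
        total = sum(ctx.values())
        return (ctx[c] + 1) / (total + self.vocab_size)  # add-one smoothing

    def perplexity(self, seq):
        # Perplexity = exp(mean negative log-likelihood over trigrams).
        nll = [-math.log(self.prob(a, b, c))
               for a, b, c in zip(seq, seq[1:], seq[2:])]
        return math.exp(sum(nll) / len(nll))
```

The LSTM's perplexity is computed the same way, as exp of the mean token-level cross-entropy, which is what makes the ~10 vs. ~498 comparison apples-to-apples.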

Task 2: Arpeggio-Conditioned Workflow

  1. Preprocessing
    • Keep only arpeggio-like sequences: durations in {1,2,4}, intervals ≤24 semitones, min length 32; apply the same transposition augmentation.
    • Save processed data and vocab to Task2_preprocessed/.
  2. Training & Generation
    • Train pitch/duration LSTMs (same hyperparams as Task 1) with constrained decoding:
      • nucleus/top-k sampling (top_p=0.9, top_k=50), n-gram repeat blocking, repeat-threshold control;
      • progressive relaxation of duration set and top-p if sampling fails.
    • Outputs Task2_outputs/generated_music.mid.
  3. Baselines & Metrics
    • Against n-gram baselines: the LSTMs reach typical test perplexities of ~9.6 (pitch) and ~1.2 (duration), outperforming the unstructured n-gram baselines.
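The combined top-k/top-p filtering used in constrained decoding can be sketched as below. The parameter names mirror the TOP_K/TOP_P knobs, but the function itself is an illustrative assumption, not the notebook's code; the progressive-relaxation and n-gram-blocking logic wraps around a filter like this one.

```python
import numpy as np

def filter_probs(probs, top_k=50, top_p=0.9):
    """Zero out tokens outside the top-k AND outside the top-p nucleus,
    then renormalize the survivors."""
    probs = np.asarray(probs, dtype=float)
    keep = np.zeros_like(probs, dtype=bool)

    # Top-k: keep only the k most probable tokens.
    order = np.argsort(probs)[::-1]
    keep[order[:top_k]] = True

    # Top-p: restrict to the smallest prefix whose cumulative mass >= top_p.
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    nucleus = np.zeros_like(keep)
    nucleus[order[:cutoff]] = True
    keep &= nucleus

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()
```

Duration constraints fit the same shape: mask disallowed durations (e.g. those outside {1,2,4}) to zero before renormalizing, and relax the allowed set only if every candidate is masked out.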

Tunable Knobs

  • Faster runs: lower NUM_EPOCHS, HIDDEN_SIZE, batch size, or subsample MIDI files.
  • More diversity: raise TEMPERATURE, TOP_P, or TOP_K; if quality drops, tighten n-gram/repeat constraints.
  • Longer pieces: raise MAX_GENERATION_LENGTH, noting stability limits on very long sequences.
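The TEMPERATURE knob works by rescaling logits before the softmax: values above 1 flatten the distribution (more diversity), values below 1 sharpen it. A minimal sketch, with the helper name being an assumption:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Numerically stable softmax over logits scaled by 1/temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For logits [2.0, 1.0, 0.0], temperature 0.5 concentrates mass on the top token while temperature 2.0 spreads it out, which is why raising TEMPERATURE increases diversity at some cost in coherence.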

References & Data

  • Classical piano MIDI dataset (Kaggle, sourced from piano-midi.de, 19 composers).
  • Related work: Melissa Jalali Monfared’s LSTM example; Sulun et al. (2022) transformer with continuous emotion conditioning.
