Shouting Voice Detection

Frame-level shouting detection for music or speech audio using librosa for feature extraction and PyTorch Lightning for training. The project targets Apple Silicon (MPS) but works on CPU-only setups as well.

Features

  • 16 kHz resampling and mel-spectrogram features via ShoutingVoiceFrameDataset.
  • Lightweight CNN LightningModule (ShoutingVoiceFrameCNN) with BCE loss and accuracy logging.
  • Training CLI (app/train.py) with configurable frame settings, hyperparameters, and deterministic splits.
  • Inference CLI (app/predict.py) that loads checkpoints and exports [time, probability] arrays.
  • Visualization CLI (app/visualize.py) overlaying shouting spans on the waveform.
  • Unit tests covering dataset behavior and CNN forward/training steps.

Project Layout

.
├─ app/
│  ├─ data/                # sample WAV + labels
│  ├─ model/
│  │  ├─ dataset.py        # ShoutingVoiceFrameDataset implementation
│  │  └─ model.py          # ShoutingVoiceFrameCNN LightningModule
│  ├─ train.py             # training entry point
│  ├─ predict.py           # checkpoint-driven inference
│  └─ visualize.py         # waveform + prediction overlay
├─ tests/                  # pytest suite
├─ docker/                 # Dockerfile and container assets
├─ environment.yml         # conda environment (Apple Silicon-friendly)
├─ requirements.txt        # pip-based dependency lock
├─ Makefile                # setup/format/lint/test helpers
├─ IMPLEMENTATION_PLAN.md  # progress checklist
└─ AGENTS.md               # contributor guidelines
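
The heart of the project is the LightningModule in app/model/model.py. For orientation, here is a minimal sketch of what a frame-level classifier of this shape can look like; layer sizes and names are illustrative, not the repository's exact implementation:

import torch
import torch.nn as nn
import pytorch_lightning as pl

class FrameCNNSketch(pl.LightningModule):
    """Illustrative frame classifier: mel-spectrogram patch in, shouting logit out."""

    def __init__(self, n_mels: int = 64, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse the time/frequency axes
            nn.Flatten(),
            nn.Linear(32, 1),         # one shouting logit per frame
        )
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, x):
        # x: (batch, 1, n_mels, time_steps)
        return self.net(x).squeeze(-1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss_fn(logits, y.float())
        acc = ((torch.sigmoid(logits) > 0.5) == y.bool()).float().mean()
        self.log("train_loss", loss)
        self.log("train_acc", acc)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)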

Environment Setup

Option 1: Conda (Recommended for Apple Silicon)

git clone <repo-url>
cd ShoutingVoiceDetection
conda env create -f environment.yml
conda activate svd
python -m pip install --upgrade pip

Option 2: Virtualenv via Makefile

git clone <repo-url>
cd ShoutingVoiceDetection
make setup
source .venv/bin/activate  # after setup completes

Both paths install PyTorch, PyTorch Lightning, librosa, matplotlib, pytest, black, and ruff. The Makefile targets:

  • make format: black app tests
  • make lint: ruff check app tests
  • make test: pytest (PYTHONPATH configured via pytest.ini)

Option 3: Docker (Reproducible Everywhere)

The repo ships with docker/Dockerfile, which creates a slim CPU-only image that already contains Python, system audio libraries, and every dependency from requirements.txt. Use it when you want a guaranteed-clean environment or to run training in CI without managing conda.

Build once from the repo root:

docker build -t svd:cpu -f docker/Dockerfile .

Kick off training inside the container (all flags pass through to app.train):

docker run --rm svd:cpu --max_epochs=5 --batch_size=8

Need an interactive shell for debugging? Override the entrypoint:

docker run -it --entrypoint /bin/bash svd:cpu
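
Nothing written inside the container survives the run, so mount the data and log directories from the host if you want artifacts to persist. The container paths below assume the image sets WORKDIR /app; adjust them to match the actual Dockerfile:

# container paths assume WORKDIR /app; adjust to your Dockerfile
docker run --rm \
  -v "$(pwd)/app/data:/app/app/data" \
  -v "$(pwd)/lightning_logs:/app/lightning_logs" \
  svd:cpu --max_epochs=5 --batch_size=8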

Data Requirements

Place short WAV clips under app/data/audio/ and create app/data/labels.csv with:

file,start,end,label
example.wav,0.0,2.0,shouting
example.wav,2.0,4.0,non_vocal

Times are in seconds; audio is resampled to 16 kHz before feature extraction. Labels accept string values such as shouting and non_vocal, or numeric 0/1.
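
Internally, the dataset has to turn these labeled spans into per-frame targets. One common rule marks a frame positive when more than half of it overlaps a shouting span; the sketch below implements that rule and is illustrative only, since the repository's exact mapping may differ:

import csv

def frame_labels(csv_path, wav_name, clip_dur, frame_dur=1.0, hop_dur=0.5):
    """Per-frame 0/1 targets from span labels, majority-overlap rule."""
    spans = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["file"] == wav_name and row["label"] in ("shouting", "1"):
                spans.append((float(row["start"]), float(row["end"])))
    labels = []
    t = 0.0
    while t + frame_dur <= clip_dur:
        # total overlap between this frame and all shouting spans
        overlap = sum(max(0.0, min(t + frame_dur, e) - max(t, s)) for s, e in spans)
        labels.append(1 if overlap > frame_dur / 2 else 0)
        t += hop_dur
    return labels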

Dataset Diagnostics

  • Inspect class balance with the same frame settings you use for training:
    python -m app.utils.report_class_balance \
      --labels_csv app/data/labels.csv \
      --audio_dir app/data/audio \
      --frame_duration 1.0 \
      --hop_duration 0.5
    This prints how many positive vs. negative frames exist overall and per file.
  • Visualize the ground-truth spans from labels.csv on top of the waveform:
    python -m app.utils.plot_labels \
      --audio app/data/audio/example.wav \
      --labels_csv app/data/labels.csv \
      --output outputs/example_labels.png
    The plot highlights shouting intervals (red) and non-vocal intervals (green).

Training

Run on CPU or Apple MPS:

python -m app.train \
  --labels_csv app/data/labels.csv \
  --audio_dir app/data/audio \
  --batch_size 4 \
  --max_epochs 5 \
  --frame_duration 1.0 \
  --hop_duration 0.5

Key flags:

  • --default_root_dir <dir> (Lightning) if you want checkpoints somewhere other than lightning_logs/svd/.
  • --sample_rate, --n_mels, --n_fft, --spec_hop_length to tweak the feature extractor (see the sketch after this list).
  • --num_workers for DataLoaders (set >0 when running outside notebooks).
  • --log_dir to relocate TensorBoard events and checkpoints (default lightning_logs).
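
The feature flags map directly onto librosa's mel-spectrogram parameters. For reference, here is a sketch of the equivalent extraction for a single frame; the librosa calls are standard, but the numeric values are placeholders rather than the project's defaults:

import librosa
import numpy as np

y, sr = librosa.load("app/data/audio/example.wav", sr=16000)  # --sample_rate
frame = y[: sr * 1]  # one 1.0 s frame (--frame_duration)
mel = librosa.feature.melspectrogram(
    y=frame, sr=sr, n_fft=1024, hop_length=256, n_mels=64
)  # --n_fft, --spec_hop_length, --n_mels
log_mel = librosa.power_to_db(mel, ref=np.max)  # model input for this frame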

Lightning checkpoints land under lightning_logs/svd/.../checkpoints/epoch=*-step=*.ckpt. Copy or symlink a checkpoint to a stable location (e.g., checkpoints/last.ckpt) for inference.

Visualize Training with TensorBoard

TensorBoard is included in requirements.txt/environment.yml. After any training run, Lightning writes logs under lightning_logs/svd/. Launch TensorBoard from the repo root:

tensorboard --logdir lightning_logs --port 6006

Open http://localhost:6006 to inspect loss curves, metrics, and learning-rate schedules across runs.

Inference

Generate frame probabilities for any WAV:

python -m app.predict \
  checkpoints/last.ckpt \
  app/data/audio/example.wav \
  --output outputs/example_preds.npy \
  --frame_duration 1.0 \
  --hop_duration 0.5 \
  --threshold 0.6

Outputs a NumPy array with shape (num_frames, 2) containing [start_time_sec, probability].
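
The array is straightforward to post-process. For example, this sketch thresholds the probabilities and merges consecutive positive frames into [start, end] spans (the frame duration must match the --frame_duration used at prediction time):

import numpy as np

preds = np.load("outputs/example_preds.npy")  # (num_frames, 2): [start_sec, prob]
frame = 1.0  # must match --frame_duration
spans, current = [], None
for start, prob in preds:
    if prob >= 0.6:
        if current is None:
            current = [start, start + frame]   # open a new span
        else:
            current[1] = start + frame         # extend the current span
    elif current is not None:
        spans.append(tuple(current))
        current = None
if current is not None:
    spans.append(tuple(current))
print(spans)  # e.g. [(0.0, 2.0), ...]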

Visualization

Overlay shouting spans on the waveform using the saved predictions:

python -m app.visualize \
  --audio app/data/audio/example.wav \
  --predictions outputs/example_preds.npy \
  --threshold 0.6 \
  --output outputs/example_plot.png

If --output is omitted, the plot displays interactively.
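
app/visualize.py covers the common case, but the overlay itself is only a few lines of matplotlib if you want to customize it. A sketch, reusing spans in the [start, end] format produced by the merging example above:

import os
import librosa
import matplotlib.pyplot as plt
import numpy as np

spans = [(0.0, 2.0)]  # example; use the merged spans from the sketch above
y, sr = librosa.load("app/data/audio/example.wav", sr=16000)
t = np.arange(len(y)) / sr
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(t, y, linewidth=0.5)
for start, end in spans:
    ax.axvspan(start, end, color="red", alpha=0.3)  # shade predicted shouting
ax.set_xlabel("time (s)")
os.makedirs("outputs", exist_ok=True)
plt.savefig("outputs/example_plot.png", dpi=150)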

Example Output

(Figure: shouting-voice visualization, with predicted shouting spans highlighted on the waveform.)

Testing & Quality

  • pytest tests/model -q validates dataset and model components.
  • make format / make lint keep code style consistent (black + ruff).
  • For coverage-oriented runs: pytest --cov=app --cov-report=term-missing.

Implementation Progress

Track ongoing work in IMPLEMENTATION_PLAN.md. Major milestones already complete:

  1. Environment setup (conda + Makefile).
  2. Repository skeleton and sample data.
  3. Dataset/model implementations with unit tests.
  4. Training CLI and smoke test.
  5. Inference + visualization pipeline.

Remaining tasks include README screenshots/examples and CI hooks.

Contributing

See AGENTS.md for contributor expectations:

  • Use 4-space indentation, snake_case, PascalCase classes.
  • Run make format lint test before opening a PR.
  • Keep commits scoped (feat:, fix:, etc.) and link issues with Closes #<id>.
  • Do not commit large audio datasets or secrets; store them outside git-tracked paths.

Troubleshooting

  • ModuleNotFoundError: app → ensure you run commands via python -m app.train or set PYTHONPATH=$(pwd).
  • MPS/Metal errors → rerun with --accelerator cpu or set PYTORCH_ENABLE_MPS_FALLBACK=1.
  • librosa import issues → confirm the active environment is the one you created via conda/Make.

For more background, refer to shouting_voice_detection_tutorial.md, which mirrors the end-to-end workflow described above. Happy experimenting!
