A deep learning project for detecting anomalous sounds (e.g., footsteps, breaking glass, screams) in audio streams using transformer-based models.
Develop a robust system to identify unusual or dangerous sound events in real-world environments using state-of-the-art audio models.
Target Anomalies:
- Footsteps in restricted areas
- Breaking glass
- Human screams/shouts
- Other context-specific anomalies
Kaggle Dataset: AudioAnomalyDataset
- Contains labeled audio samples of:
  - Normal sounds (background noise, conversations)
  - Anomalous events (breaking glass, screams, footsteps)
- Format: .wav files at 16 kHz sampling rate
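Since the dataset is described as 16 kHz .wav audio, it can help to confirm that a given file matches before passing it to a model. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_sample_rate` and the file name in the usage note are hypothetical and not part of the repository.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate stated for the dataset


def check_sample_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """Return True if the PCM .wav file at `path` uses the expected sampling rate."""
    # wave.open reads the header of an uncompressed PCM WAV file
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == expected_hz
```

For example, `check_sample_rate("clip.wav")` returns `False` for a file recorded at a different rate, which would need resampling first. Note that the `wave` module only reads uncompressed PCM files and raises `wave.Error` for other encodings.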
Before running the project, ensure you have the following installed:
- Python 3.10 or higher
- CUDA and cuDNN (if using GPU)
- Git
- Conda (Optional)
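The prerequisites above can be checked with a short script before installation. This is only a sketch: the helper name `check_prerequisites` is hypothetical, and the GPU check assumes the project uses PyTorch, which the list above does not state explicitly.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the prerequisites


def check_prerequisites():
    """Report which prerequisites appear to be available on this machine."""
    status = {
        "python": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,
        "conda": shutil.which("conda") is not None,  # optional
    }
    try:
        import torch  # assumption: the project depends on PyTorch

        status["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        # PyTorch not installed yet, so GPU support cannot be checked this way
        status["cuda"] = False
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" result for `conda` or `cuda` is not an error, since both are optional; the GPU check is only meaningful after the dependencies have been installed.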
Clone the project repository to your local machine:
git clone https://github.com/AlinaShapiro/Sound_Anomaly_Detection.git
cd Sound_Anomaly_Detection
If you use Conda (optional), create and activate an environment:
conda create -n sound_anomaly_detection python=3.10
conda activate sound_anomaly_detection
Install the dependencies:
pip install -r requirements.txt
To run inference on a test subset of Audio-Anomaly-Dataset, use:
python inference.py --model_name wav2vec2
This project can be applied to both security systems and smart home automation, because sound anomaly detection extends the capabilities of traditional security systems. Such systems often rely on motion sensors or magnetic contacts to detect intruders; adding sound anomaly detection lets them pick up sounds such as breaking glass, footsteps, or quiet speech, which may indicate an intruder. This is particularly useful in large spaces where conventional sensors may not detect intruders effectively.