The dataset used for training, validating, and testing the deep neural network has the following fields:
- file: Path to the audio file.
- audio['array']: The actual audio data as a numpy array.
- audio['sampling_rate']: The sampling rate of the audio file (16 kHz).
- label: The label for the audio file (e.g. a specific command like "yes", "no", etc.).
- is_unknown: Boolean flag indicating whether the sample is unknown.
- speaker_id: The ID of the speaker who recorded the sample.
- utterance_id: The ID of the specific utterance of that speaker.
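
To illustrate the layout above, here is a minimal sketch of how one sample could be read. The sample is hand-built and hypothetical: the path, speaker ID, and audio values are made up, a plain list stands in for the NumPy array so the snippet has no dependencies, and the helpers `duration_seconds` and `describe` are not part of the project.

```python
# Hypothetical sample, hand-built to mirror the fields listed above.
# A plain list stands in for the NumPy array; all values are made up.
sample = {
    "file": "yes/example_nohash_0.wav",
    "audio": {"array": [0.0] * 16000, "sampling_rate": 16000},
    "label": "yes",
    "is_unknown": False,
    "speaker_id": "example",
    "utterance_id": 0,
}


def duration_seconds(sample: dict) -> float:
    """Clip length in seconds: number of audio values / sampling rate."""
    audio = sample["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def describe(sample: dict) -> str:
    """One-line summary of a sample (illustrative helper)."""
    return (
        f"{sample['label']} "
        f"({duration_seconds(sample):.2f} s, speaker {sample['speaker_id']})"
    )


if __name__ == "__main__":
    print(describe(sample))
```

At 16 kHz, a clip with 16000 values is one second long, which is what `duration_seconds` computes for this sample.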

Follow these steps to set up the project locally:
- Clone the repository:
    git clone https://github.com/petterr-n/255-Project.git
- Navigate to the project directory:
    cd 255-Project
- Set up a virtual environment:
    python3 -m venv venv
- Activate the virtual environment:
  - On macOS/Linux:
      source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt

To run the project:
- Activate the virtual environment if it is not already active (step 4 of the setup above).
- Run the main script:
    python main.py
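
If it is unclear whether the virtual environment is active before running, a small check like the one below may help. This is only a sketch and not part of the project; `in_virtualenv` is a hypothetical helper name, and the check assumes an environment created with `python3 -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run `source venv/bin/activate` first.")
```

Saved under any file name and run with the same `python` command used for `main.py`, it reports which interpreter prefix is in use.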