Demooder

Android support Compose Model 1.0

Android application that recognizes emotions from voice input.

Release

v1.0

Preview

[Image: application preview]

Technologies

Application

  • Kotlin 2.1.21
  • Java 11
  • Android SDK 34
  • Gradle 8.9.3
  • Kotlin Multiplatform 2.1.21
  • Jetpack Compose 1.6.10
  • LiteRT 1.3.0

Machine Learning / Scripts

  • Python 3.8.20
  • NumPy 1.23.3
  • TensorFlow 2.11.0
  • Keras 2.13.1
  • CUDA 11.2
  • cuDNN 8.1.0

Project structure

[Image: project structure diagram]

Data processing execution

  1. Clone the repository:
https://github.com/ExaggeratedRumors/demooder.git
  2. Download the AudioWav data: Download from Kaggle.
  3. Unzip the WAV files into the processing/data_audio directory.
  4. [optional] Run the data augmentation task (use gradlew.bat on Windows):
./gradlew :processing:dataAugmentation
  5. Run the spectrogram creation task (use gradlew.bat on Windows):
./gradlew :processing:createSpectrograms

Output spectrograms are saved in the data/spectrograms directory.

Sound data

Audio data augmentation

  1. Audio data augmentation: about audio data augmentation.
  2. Gaussian noise.
  3. Time stretching.
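
The two augmentations listed above can be sketched in plain Python. This is a minimal illustration and not the project's implementation: the function names and the `noise_std`, `seed`, and `rate` parameters are assumptions, and the time stretch uses naive linear interpolation, which also shifts pitch (a pitch-preserving method such as a phase vocoder would be a different technique).

```python
import random


def add_gaussian_noise(samples, noise_std=0.005, seed=None):
    """Return a copy of `samples` with zero-mean Gaussian noise added.

    `noise_std` and `seed` are hypothetical parameters for this sketch.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def time_stretch(samples, rate):
    """Naively stretch a signal in time by linear interpolation.

    rate > 1 shortens the signal, rate < 1 lengthens it.
    Note: this simple resampling also changes the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    out_len = max(1, int(round(len(samples) / rate)))
    out = []
    for i in range(out_len):
        pos = i * rate                # fractional position in the input
        left = int(pos)
        if left >= len(samples) - 1:  # past the last interpolable pair
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

For example, `time_stretch(signal, 0.5)` roughly doubles the signal length, while `add_gaussian_noise(signal, seed=1)` gives a reproducible noisy copy.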

Sound signal processing

  1. Read WAV files according to the header scheme: wav file format.
  2. Audio signal resampling (check whether the signal is big-endian or little-endian): about resampling.
  3. Optional Gaussian noise reduction.
  4. Convert byte data to complex.
  5. Signal windowing: about windowing.
  6. Use Short-Time Fourier Transform (STFT): about STFT, about FFT.
  7. Optional filtering by A-weighting or C-weighting: about weighting.
  8. Optional conversion of the signal to decibels and lowering of the spectrum.
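
The core of the steps above (decoding bytes, windowing, STFT) can be sketched as follows. This is a hedged illustration only: it assumes 16-bit PCM samples, a Hann window, and hypothetical `frame_size` and `hop` values, and it uses a naive O(N²) DFT for readability where a real pipeline would use an FFT.

```python
import cmath
import math
import struct


def pcm16_to_floats(raw, little_endian=True):
    """Decode 16-bit PCM bytes into floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    fmt = ("<" if little_endian else ">") + str(count) + "h"
    return [v / 32768.0 for v in struct.unpack(fmt, raw[:count * 2])]


def hann_window(size):
    """Periodic Hann window with `size` points."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / size) for n in range(size)]


def dft(frame):
    """Naive discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stft_magnitudes(samples, frame_size=256, hop=128):
    """Return one magnitude spectrum (bins 0..frame_size/2) per frame."""
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        windowed = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = dft(windowed)
        frames.append([abs(c) for c in spectrum[:frame_size // 2 + 1]])
    return frames
```

The `little_endian` flag corresponds to the endianness check in step 2; standard RIFF WAV files store PCM samples as little-endian.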

Predicting in JVM

  1. Read classifier model.
  2. Record voice signal.
  3. Save as WAV file.
  4. Downsample the signal from 48000 Hz to 16000 Hz: about resampling.
  5. Convert byte data to complex.
  6. Apply signal windowing and weighting filter.
  7. Predict.
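
Because 48000 / 16000 = 3, the downsampling step reduces to decimation by an integer factor. The sketch below is an assumption about one simple way to do it: it averages each group of three samples, which acts as a crude low-pass filter before decimation. The function names are hypothetical, and a production resampler would normally use a proper anti-aliasing filter.

```python
def downsample_by_factor(samples, factor=3):
    """Average each group of `factor` samples (boxcar low-pass + decimation).

    Trailing samples that do not fill a complete group are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


def downsample(samples, src_rate=48000, dst_rate=16000):
    """Downsample between rates whose ratio is an integer."""
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only supports integer rate ratios")
    return downsample_by_factor(samples, src_rate // dst_rate)
```

One second of audio recorded at 48000 Hz therefore yields 16000 output samples.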

Predicting in Android

  1. Convert the TF model to the LiteRT (TensorFlow Lite) format, which matches the LiteRT runtime listed under Technologies.
  2. Record voice signal.
  3. Downsample signal from 48000Hz to 16000Hz.
  4. Convert byte data to complex.
  5. Convert signal to spectrogram bitmap.
  6. Predict.
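
The spectrogram-to-bitmap step can be illustrated with a small grayscale mapping. This is a sketch in Python for consistency with the project's scripts, not the Android code itself: the -80 dB floor, the normalization to the peak magnitude, and the row order (highest frequency in the top row) are all assumptions.

```python
import math


def magnitudes_to_db(frames, floor_db=-80.0):
    """Convert magnitudes to dB relative to the peak, clamped at `floor_db`."""
    peak = max((m for frame in frames for m in frame), default=0.0)
    if peak <= 0.0:
        return [[floor_db for _ in frame] for frame in frames]
    return [
        [max(floor_db, 20.0 * math.log10(m / peak)) if m > 0.0 else floor_db
         for m in frame]
        for frame in frames
    ]


def db_to_grayscale(db_frames, floor_db=-80.0):
    """Map [floor_db, 0] dB to 0..255 pixels.

    Input is a list of time frames; output rows are frequency bins
    (highest frequency first) and columns are time frames.
    """
    if not db_frames:
        return []
    bins = len(db_frames[0])
    image = []
    for b in range(bins - 1, -1, -1):
        row = [int(round(255.0 * (frame[b] - floor_db) / -floor_db))
               for frame in db_frames]
        image.append(row)
    return image
```

Each value in the resulting matrix could then be written into a single-channel bitmap of the size the classifier expects.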

Visualizing

  1. Read data.
  2. Use FFT.
  3. Convert FFT to spectral amplitude.
  4. Convert to octave/third-octave bands: about octave to third conversion.
  5. Filter by A-weighting or C-weighting.
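
The weighting and band-grouping steps above can be sketched as follows. The A- and C-weighting curves use the standard analog formulas (normalized to about 0 dB at 1 kHz). The third-octave grouping is an assumption about how the bands might be formed: base-2 centers at 1000 · 2^(n/3) Hz with edges at ±1/6 octave, and the function names and frequency range are hypothetical.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB for a frequency in Hz (freq_hz > 0)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00


def c_weighting_db(freq_hz):
    """C-weighting gain in dB for a frequency in Hz (freq_hz > 0)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2
    den = (f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2)
    return 20.0 * math.log10(num / den) + 0.06


def third_octave_centers(low_hz=25.0, high_hz=8000.0):
    """Base-2 third-octave center frequencies within [low_hz, high_hz]."""
    centers = []
    for n in range(-30, 31):
        fc = 1000.0 * 2.0 ** (n / 3.0)
        if low_hz <= fc <= high_hz:
            centers.append(fc)
    return centers


def band_powers(amplitudes, sample_rate, fft_size, centers):
    """Sum the power of the FFT bins that fall inside each third-octave band."""
    edge = 2.0 ** (1.0 / 6.0)
    powers = []
    for fc in centers:
        lo, hi = fc / edge, fc * edge
        powers.append(sum(
            a * a for k, a in enumerate(amplitudes)
            if lo <= k * sample_rate / fft_size < hi
        ))
    return powers
```

A weighted band level in dB could then be obtained as `10 * log10(power) + a_weighting_db(fc)` for each band with non-zero power.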

Additional requirements

  1. CUDA for training model on GPU (Nvidia graphics cards): download CUDA.
  2. NNAPI for hardware acceleration on mobile devices: about inference on Android.
  3. If the converter module fails with a DLL initialization error, update the Conda environment:
conda install conda-forge::vs2015_runtime

About

Mobile application and model for emotion recognition.
