Audio Converter for Custom Acoustic Model

This program provides a few functions that help us prepare audio data for a custom acoustic model. The audio data recommendations covered in this project are listed below (see Full Audio Recommendations):

1. All audio files in the data set should be stored in the WAV (RIFF) audio format.
2. The audio must have a sampling rate of 8 kHz or 16 kHz, and the sample values should be stored as uncompressed PCM 16-bit signed integers (shorts).
3. Only single-channel (mono) audio files are supported.
4. The audio files must be between 100 ms and 1 minute long. Each audio file should ideally start and end with at least 100 ms of silence; somewhere between 500 ms and 1 second is common.
5. If your data contains background noise, it is recommended to also include some examples with longer segments of silence (e.g. a few seconds) before and/or after the speech content.
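The format and length recommendations can be checked before a file is uploaded. The following is a minimal sketch, not part of this project, that reads a WAV header with plain `System.IO` (no NAudio) and reports the first recommendation a file fails. The class `WavChecks` and the method `MeetsRecommendations` are hypothetical names; the sketch assumes a seekable stream and the plain PCM format tag (files written with `WAVE_FORMAT_EXTENSIBLE` are rejected).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the AudioConverter project: inspects a WAV
// header and checks it against recommendations 1 to 4.
public static class WavChecks
{
    public static bool MeetsRecommendations(Stream stream, out string reason)
    {
        using (var br = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            // Recommendation 1: a RIFF container whose form type is WAVE.
            if (Encoding.ASCII.GetString(br.ReadBytes(4)) != "RIFF") { reason = "not a RIFF file"; return false; }
            br.ReadUInt32(); // total RIFF size, not needed here
            if (Encoding.ASCII.GetString(br.ReadBytes(4)) != "WAVE") { reason = "not a WAVE file"; return false; }

            ushort formatTag = 0, channels = 0, bitsPerSample = 0;
            uint sampleRate = 0;
            long dataBytes = -1;

            // Walk the chunk list until the "data" chunk header is reached.
            while (dataBytes < 0 && stream.Position + 8 <= stream.Length)
            {
                string id = Encoding.ASCII.GetString(br.ReadBytes(4));
                long size = br.ReadUInt32();
                if (id == "data")
                {
                    dataBytes = size;
                }
                else if (id == "fmt " && size >= 16)
                {
                    formatTag = br.ReadUInt16();     // 1 = uncompressed PCM
                    channels = br.ReadUInt16();
                    sampleRate = br.ReadUInt32();
                    br.ReadUInt32();                  // average bytes per second
                    br.ReadUInt16();                  // block align
                    bitsPerSample = br.ReadUInt16();
                    // Skip any extra format bytes (e.g. an 18-byte fmt chunk) and the pad byte.
                    stream.Seek(size - 16 + (size & 1), SeekOrigin.Current);
                }
                else
                {
                    stream.Seek(size + (size & 1), SeekOrigin.Current); // chunks are padded to an even size
                }
            }

            // Recommendations 2 and 3: 16-bit PCM, 8 or 16 kHz, mono.
            if (formatTag != 1 || bitsPerSample != 16) { reason = "samples must be uncompressed 16-bit PCM"; return false; }
            if (sampleRate != 8000 && sampleRate != 16000) { reason = "sampling rate must be 8 kHz or 16 kHz"; return false; }
            if (channels != 1) { reason = "audio must be mono"; return false; }
            if (dataBytes < 0) { reason = "no data chunk found"; return false; }

            // Recommendation 4: mono 16-bit audio uses 2 bytes per sample.
            double seconds = dataBytes / (sampleRate * 2.0);
            if (seconds < 0.1 || seconds > 60.0) { reason = "length must be between 100 ms and 1 minute"; return false; }

            reason = "ok";
            return true;
        }
    }
}
```

Because it only reads the header, the check is cheap enough to run over a whole data set, for example on every file produced by the conversion and trimming steps.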

The C# project that helps us with these recommendations uses NAudio, an open-source .NET audio library.

The method ConvertMP3toWAV helps us with recommendations 1 to 3: it creates a mono (single-channel) WAV file with a 16 kHz sampling rate and 16-bit PCM samples. The parameters are:

- mp3File: a valid path to the MP3 file (`C:\Path\mp3File.mp3`).
- outputFile: the path where the WAV file will be written (`C:\Path\wavFile.wav`).

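```csharp
using NAudio.Wave;

// Decodes an MP3 file and writes it as a 16 kHz, 16-bit, mono PCM WAV file.
public static void ConvertMP3toWAV(string mp3File, string outputFile)
{
    using (Mp3FileReader reader = new Mp3FileReader(mp3File))
    {
        // Target format: 16,000 Hz sample rate, 16 bits per sample, 1 channel.
        var newFormat = new WaveFormat(16000, 16, 1);

        // WaveFormatConversionStream uses the Windows ACM codecs to convert
        // the decoded PCM audio into the target format.
        using (var conversionStream = new WaveFormatConversionStream(newFormat, reader))
        {
            WaveFileWriter.CreateWaveFile(outputFile, conversionStream);
        }
    }
}
```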

The method TrimWavFile helps us with recommendation 4: it trims an audio file to the segment between a given start and end time. Both times are measured from the beginning of the input file. The parameters are:

- inPath: a valid path to the WAV file we want to trim (`C:\Path\wavFile.wav`).
- outPath: a valid path where the trimmed WAV file will be stored (`C:\Path\wavFileTrim.wav`).
- cutFromStart: the TimeSpan at which the trimmed audio starts, e.g. `new TimeSpan(0, 0, 0)` (hours, minutes, seconds).
- cutFromEnd: the TimeSpan at which the trimmed audio ends, e.g. `new TimeSpan(0, 1, 0)` (hours, minutes, seconds).

```csharp
using System;
using NAudio.Wave;

public static void TrimWavFile(string inPath, string outPath, TimeSpan cutFromStart, TimeSpan cutFromEnd)
{
    using (WaveFileReader reader = new WaveFileReader(inPath))
    using (WaveFileWriter writer = new WaveFileWriter(outPath, reader.WaveFormat))
    {
        int blockAlign = reader.WaveFormat.BlockAlign;
        long bytesPerSecond = reader.WaveFormat.AverageBytesPerSecond;

        // Convert the times to byte offsets (integer tick arithmetic avoids
        // truncating the byte rate to whole milliseconds), then round down
        // to a whole sample frame.
        long startPos = cutFromStart.Ticks * bytesPerSecond / TimeSpan.TicksPerSecond;
        startPos -= startPos % blockAlign;

        long endPos = cutFromEnd.Ticks * bytesPerSecond / TimeSpan.TicksPerSecond;
        endPos -= endPos % blockAlign;
        endPos = Math.Min(endPos, reader.Length); // never read past the end of the audio data

        TrimWavFile(reader, writer, startPos, endPos);
    }
}

// Copies the audio bytes between startPos and endPos from reader to writer.
private static void TrimWavFile(WaveFileReader reader, WaveFileWriter writer, long startPos, long endPos)
{
    reader.Position = startPos;
    byte[] buffer = new byte[1024];
    while (reader.Position < endPos)
    {
        int bytesToRead = (int)Math.Min(endPos - reader.Position, buffer.Length);
        int bytesRead = reader.Read(buffer, 0, bytesToRead);
        if (bytesRead == 0)
        {
            break; // end of data reached; without this the loop would never finish
        }
        writer.Write(buffer, 0, bytesRead);
    }
}
```
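Converting a TimeSpan into a byte offset is the subtle part of trimming. A per-millisecond byte rate computed as `AverageBytesPerSecond / 1000` is exact for 8 kHz and 16 kHz mono audio (16,000 and 32,000 bytes per second), but it truncates for rates such as 44.1 kHz stereo 16-bit (176,400 bytes per second, i.e. 176.4 bytes per millisecond), so offsets drift early. The sketch below isolates that arithmetic in a hypothetical helper, `WavOffsets.ToBlockAlignedOffset`, which stays in 100 ns ticks and then rounds down to a whole sample frame (`BlockAlign` bytes) so a trim never splits a sample.

```csharp
using System;

// Hypothetical helper, not part of the AudioConverter project: converts a time
// offset into a byte offset that lands on a sample-frame boundary.
public static class WavOffsets
{
    public static long ToBlockAlignedOffset(TimeSpan time, int averageBytesPerSecond, int blockAlign)
    {
        // Ticks are 100 ns units, so the whole computation is integer arithmetic.
        long bytes = time.Ticks * averageBytesPerSecond / TimeSpan.TicksPerSecond;

        // Round down to a whole frame (all channels of one sample).
        return bytes - bytes % blockAlign;
    }
}
```

For 10 s of 44.1 kHz stereo audio this returns 1,764,000, while `(int)t.TotalMilliseconds * (176400 / 1000)` gives 1,760,000, about 23 ms early. For the 16 kHz mono files produced by ConvertMP3toWAV both approaches give the same offsets.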

The method Concatenate helps us with recommendation 5: it adds sample noise at the beginning and the end of our audio by joining WAV files. The parameters are:

- outputFile: a valid path where the concatenated WAV file will be stored.
- sourceFiles: an IEnumerable<string> with the paths of the audio files. All files must share the same WAV format, and they should be passed in the following order:
  1. Audio noise file
  2. The audio file for the model
  3. Audio noise file

In this way, we make sure noise is added at the beginning and the end.

```csharp
using System;
using System.Collections.Generic;
using NAudio.Wave;

public static void Concatenate(string outputFile, IEnumerable<string> sourceFiles)
{
    byte[] buffer = new byte[1024];
    WaveFileWriter waveFileWriter = null;

    try
    {
        foreach (string sourceFile in sourceFiles)
        {
            using (WaveFileReader reader = new WaveFileReader(sourceFile))
            {
                if (waveFileWriter == null)
                {
                    // First file: create the writer using its format.
                    waveFileWriter = new WaveFileWriter(outputFile, reader.WaveFormat);
                }
                else if (!reader.WaveFormat.Equals(waveFileWriter.WaveFormat))
                {
                    throw new InvalidOperationException("Can't concatenate WAV files that don't share the same format");
                }

                // Append this file's audio data to the output.
                int read;
                while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    waveFileWriter.Write(buffer, 0, read);
                }
            }
        }
    }
    finally
    {
        // Disposing the writer finalizes the WAV header lengths.
        if (waveFileWriter != null)
        {
            waveFileWriter.Dispose();
        }
    }
}
```
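Recommendation 4 also asks for at least 100 ms of silence at the start and end of each file. If no recorded noise clip is at hand, a padding file of digital silence can be generated and passed as the first and last entry of sourceFiles, provided its format matches the other files. The sketch below is an assumption-laden illustration, not part of this project: the class `SilencePadding` and method `WriteWav` are hypothetical, and it writes the canonical 44-byte PCM header by hand with `BinaryWriter` (16-bit mono, 16 kHz by default) followed by zero-valued samples.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the AudioConverter project: writes a WAV
// file of digital silence as 16-bit signed PCM, mono.
public static class SilencePadding
{
    public static void WriteWav(Stream output, TimeSpan duration, int sampleRate = 16000)
    {
        const int blockAlign = 2;                       // 1 channel x 2 bytes per sample
        int byteRate = sampleRate * blockAlign;
        int sampleCount = (int)(duration.Ticks * sampleRate / TimeSpan.TicksPerSecond);
        int dataBytes = sampleCount * blockAlign;

        using (var w = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataBytes);                    // remaining RIFF size (int32)
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                                // fmt chunk size for plain PCM
            w.Write((short)1);                          // format tag 1 = PCM
            w.Write((short)1);                          // channels: mono
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write((short)blockAlign);
            w.Write((short)16);                         // bits per sample
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataBytes);
            w.Write(new byte[dataBytes]);               // zero-valued samples = silence
        }
    }
}
```

For example, `SilencePadding.WriteWav(stream, TimeSpan.FromMilliseconds(500))` produces 8,000 samples (16,000 data bytes), which falls in the common 500 ms to 1 second range for leading and trailing silence.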

References:

Custom Speech Service

vianeyja/AudioConverter

Folders and files

Latest commit

History

Repository files navigation

Audio Converter for Custom Acoustic Model

References:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages