Skip to content

Learn how to use Speech To Text APIs form GCloud, AWS and Assembly

Notifications You must be signed in to change notification settings

EddyArellanes/stt-cloud-learning

Repository files navigation

STT Cloud Learning

🎯 Objective: Learn how to record audio in Web and send to Machine Learning Cloud Approachs in order to transform Speech to text 📖 TO DO:

  • Grab the Record from the index.html and send to GCloud Speech to text and receive the Text, then Show it
  • Grab the Record from the index.html and send to AWS Transcribe and receive the Text, then Show it
  • Make a little refactor and use Vue3 for the Interface to show the notes.

Getting Started

Run

npm i
npm run start:dev

to launch index.html and go to http://localhost:3000/*

The Snippet about how to Get Permission of Mic and Start/Stop was taken from: CodePen

The Approach about how to send in correct way the BLOB to Backend was taken from here: Fetch API upload file

The Documentation to Remember how to use the Multer Interceptor from Express and File Interceptor from Nest File Upload Nest

Enabling GCloud API

You must have an GCloud Account with billing enabled Enter here: Enable API

Then go to Cloud Speech-to-Text API

And you should be the api enabled: API Cloud

Also Remember the prices Free Tier: Till 60 minutes/month Level 1: USD 0.024/Minute

Create ServiceAccount and get a JSON Credential

API Cloud

To Test locally, export as Variable:

export GOOGLE_APPLICATION_CREDENTIALS=/home/eddyalien/webpage-9a255-c6e8df3d5c82.json

For this Test I have tested with 'es-MX' language. The response of the API is the following:

{
   "results":[
      {
         "alternatives":[
            {
               "words":[
                  
               ],
               "transcript":"Okay vamos a probar esto no tiene más de 5 segundos",
               "confidence":0.9308170676231384
            }
         ],
         "channelTag":0,
         "resultEndTime":{
            "seconds":"5",
            "nanos":50000000
         },
         "languageCode":"es-us"
      }
   ],
   "totalBilledTime":{
      "seconds":"15",
      "nanos":0
   }
}

As I can see, minimum Billing GCloud takes is 15 seconds yet if the audio is 5, for example.

Finally, Google Cloud Documentation is a little bit spreat for many places. Most useful to fix my issues was here: google.cloud.speech.v1

And a little introduction but useful Converting speech to text with Node.js

About

Learn how to use Speech To Text APIs form GCloud, AWS and Assembly

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published