# multi_speech_recognition
Every robot that interacts with humans needs speech recognition for natural interaction. This module aims to provide flexibility and to facilitate the use of speech recognition.
First of all, the currently available ASRs are:
- Google ASR (performs really well and returns a list of transcripts with a confidence value, but requires internet connection)
- Picovoice's "Cheetah" ASR (works offline)
- CMU Sphinx (works offline)
Besides these ASR methods, others (such as the vocon ASR or Mythun's grammar-based work) can be added to this interface. To do so, create a node for the ASR (preferably located in .../isr_monarch_robot/mbot_speech_recognition/[name of your ASR]) that exposes an interface through topics or a ROS service, and then register it in the Multiple Automatic Speech Recognition Interface (MASRI) by adding it to the mbot_speech_recognition package, in node_speech_node.py. This should be straightforward.
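To make the registration step concrete, here is a rough Python sketch of the dispatch pattern such an interface can use. All names, handlers, and return values here are illustrative, not the package's actual code: requests carry an `asr_method` string, and the interface routes them to the matching backend.

```python
# Hypothetical sketch of multi-ASR dispatch; the handler functions stand
# in for the topic/service calls each real ASR node would expose.

def recognize_google(audio_path):
    # Placeholder: the real node would forward the audio to Google ASR
    # and return its list of (transcript, confidence) pairs.
    return [("hello robot", 0.92)]

def recognize_cheetah(audio_path):
    # Placeholder: the real node would run Picovoice Cheetah offline.
    return [("hello robot", 0.80)]

# Registering a new ASR amounts to adding one entry to this table.
ASR_BACKENDS = {
    "google": recognize_google,
    "cheetah": recognize_cheetah,
}

def recognize_speech(audio_path, asr_method):
    """Route a recognition request to the selected backend."""
    try:
        backend = ASR_BACKENDS[asr_method]
    except KeyError:
        raise ValueError("unknown asr_method: %s" % asr_method)
    return backend(audio_path)
```

With this shape, adding support for a new ASR node only touches the dispatch table, not the callers.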
The MASRI is a node that, when launched, gives you access to all the ASRs (note that they must also be running) through a ROS service:
```bash
rosservice call /mbot_speech_recognition/recognize_speech "{audio_path: '', asr_method: '', time: 0.0, delete: false, timeout: 0.0}"
```
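The request fields (`audio_path`, `asr_method`, `time`, `delete`, `timeout`) can also be filled in programmatically. The helper below is an illustrative sketch (not part of the package) that builds the same YAML argument string the `rosservice call` above expects:

```python
def build_recognize_args(audio_path="", asr_method="", time=0.0,
                         delete=False, timeout=0.0):
    """Format a recognize_speech request as the YAML dict rosservice expects."""
    return ("{audio_path: '%s', asr_method: '%s', time: %s, "
            "delete: %s, timeout: %s}"
            % (audio_path, asr_method, time,
               str(delete).lower(), timeout))

# Example: ask the Google ASR backend, listening for 5 seconds.
print(build_recognize_args(asr_method="google", time=5.0))
# -> {audio_path: '', asr_method: 'google', time: 5.0, delete: false, timeout: 0.0}
```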
Alternatively, for state machines and a more user-friendly interface, you can use Mbot_Class to request a speech recognition: a function named speech_to_text was added to the HRI component.
First of all, you need to launch the ASR you want. To launch Google or Cheetah, run:
```bash
roslaunch mbot_speech_recognition mbot_asr.launch google:=true cheetah:=true
```
where the google and cheetah arguments control whether those ASRs are launched; if an argument is false, the corresponding ASR is not started. By default, if no argument is given, this launch file only starts the Google ASR. You can also launch the ASRs directly from their individual packages. CMU Sphinx does not require any node to be running.
When you have your ASR running and awaiting requests, you can launch the MASRI:
```bash
roslaunch mbot_speech_recognition mbot_speech_recognition.launch
```
or, to launch the MASRI and the ASR you want at the same time:

```bash
roslaunch mbot_speech_recognition mbot_speech_recognition.launch google:=true
```

or

```bash
roslaunch mbot_speech_recognition mbot_speech_recognition.launch cheetah:=true
```
You can even launch both by setting both arguments to true.
Now you can use the ROS service or the mbot class interface to request a speech recognition. To use the mbot class interface, type in the terminal:
```bash
mbot_class
```
This opens the mbot_class interface, where you can request a speech recognition with:

```python
mbot.hri.speech_to_text()
```
This can also be used on mbot class instances when writing your state or state machine, and you can alter the function's arguments as mentioned in the description.
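As a rough illustration of how this reads inside a state machine, the stubs below mimic the call shape described above. The class names, the `asr_method` parameter, and the return value are assumptions for the sketch, not the real mbot_class API:

```python
# Illustrative stubs only: the real mbot_class wraps the
# /mbot_speech_recognition/recognize_speech service; these fakes merely
# mirror the mbot.hri.speech_to_text() call shape used above.
class FakeHRI:
    def speech_to_text(self, asr_method="google"):
        # A real call would block on the service and return the transcript.
        return "bring me the coffee"

class FakeMbot:
    def __init__(self):
        self.hri = FakeHRI()

# Inside a state's execute() one would then write something like:
mbot = FakeMbot()
transcript = mbot.hri.speech_to_text(asr_method="google")
```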