Enable STT for #315 #330

Open

ynakaga wants to merge 1 commit into intel:master from jb-balaji:ros2_stt_for_315

ynakaga commented Mar 14, 2025

No description provided.


          Enable STT for intel#315

86fa9cc

jb-balaji requested a review from Copilot

October 7, 2025 13:56

Copilot AI reviewed


Copilot AI left a comment

Pull Request Overview

This PR adds a new ROS2 audio processing node that enables Speech-to-Text (STT) functionality using OpenVINO. The implementation processes audio files (MP4 and WAV) and converts them to text using a wav2vec2 model.

Adds complete ROS2 package structure with setup files and linting tests
Implements audio processor node with OpenVINO-based STT processing
Supports multiple audio formats with preprocessing pipeline
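The preprocessing step can be sketched as turning decoded PCM samples into the mono float array that a wav2vec2-style model consumes. This is a minimal sketch and not the PR's code. The helper name `pcm_to_mono_float` is hypothetical. It assumes signed integer samples in interleaved channel order, such as the samples returned by pydub's `AudioSegment.get_array_of_samples()`.

```python
import numpy as np


def pcm_to_mono_float(samples, channels, sample_width):
    """Convert interleaved signed-integer PCM samples to mono float32 in [-1, 1].

    samples: flat sequence, e.g. from pydub's AudioSegment.get_array_of_samples()
    channels: number of interleaved channels (1 = mono, 2 = stereo)
    sample_width: bytes per sample (2 for 16-bit PCM)
    """
    data = np.asarray(samples, dtype=np.float32)
    if channels > 1:
        # Frames are interleaved as (ch0, ch1, ...); average them to downmix.
        data = data.reshape(-1, channels).mean(axis=1)
    # Full-scale magnitude of signed PCM at this width, e.g. 32768 for 16-bit.
    full_scale = float(2 ** (8 * sample_width - 1))
    return data / full_scale
```

In this sketch, pydub handles the resampling. The model likely expects 16 kHz input, since that is the rate wav2vec2 checkpoints are usually trained on. Under that assumption, a caller could load either format with `seg = AudioSegment.from_file(path).set_frame_rate(16000)`. It would then pass `seg.get_array_of_samples()`, `seg.channels` and `seg.sample_width` to the helper. Decoding MP4 through pydub requires ffmpeg to be installed.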

Reviewed Changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 6 comments.


File	Description
audio_processor/audio_processor/audio_processor_node.py	Main STT processing node with OpenVINO integration
audio_processor/setup.py	Package configuration with dependencies and entry points
audio_processor/package.xml	ROS2 package metadata with build and runtime dependencies
audio_processor/setup.cfg	Package installation configuration
audio_processor/test/test_pep257.py	PEP 257 docstring compliance test
audio_processor/test/test_flake8.py	Code style linting test
audio_processor/test/test_copyright.py	Copyright header compliance test


audio_processor/audio_processor/audio_processor_node.py

+              from pydub import AudioSegment
+              from openvino.runtime import Core
+              class AudioProcessorNode(Node):

Copilot AI Oct 7, 2025

The class is missing a docstring. Add a docstring to describe the purpose of this audio processing node and its functionality.

Suggested change

      
- class AudioProcessorNode(Node):
+ class AudioProcessorNode(Node):
+     """
+     A ROS2 node for audio processing that loads an OpenVINO speech-to-text model,
+     processes audio files (WAV or MP4), performs inference, and publishes the
+     transcribed text to a ROS2 topic.
+     Functionality includes:
+         - Extracting audio from MP4 or reading WAV files.
+         - Preprocessing audio for model input.
+         - Running inference using OpenVINO.
+         - Postprocessing model output to text.
+         - Publishing results to the 'stt_output' topic.
+     """

Copilot uses AI. Check for mistakes.

audio_processor/audio_processor/audio_processor_node.py

Comment on lines +12 to +16

+                      self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+                      self.ie = Core()
+                      # Load the converted OpenVINO model
+                      # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
+                      self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')

Copilot AI Oct 7, 2025

Hard-coded absolute path makes the code non-portable. Consider using a relative path or making this configurable through a ROS parameter.

Suggested change

      
-         self.publisher_ = self.create_publisher(String, 'stt_output', 10)
-         self.ie = Core()
-         # Load the converted OpenVINO model
-         # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
-         self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
+         self.declare_parameter('model_path', 'wav2vec2-base/wav2vec2-base.xml')
+         model_path = self.get_parameter('model_path').get_parameter_value().string_value
+         self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+         self.ie = Core()
+         self.model = self.ie.read_model(model=model_path)

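Copilot's suggested default, `'wav2vec2-base/wav2vec2-base.xml'`, is still a relative path. As a result, `read_model` would resolve it against whatever directory the node is launched from. One way to keep the parameter without that dependency is to anchor relative values at a known base directory. The helper below is a hypothetical sketch. In the node, `base_dir` might be the package's share directory from `ament_index_python.packages.get_package_share_directory('audio_processor')`. That choice assumes the model is installed there via `data_files` in setup.py.

```python
import os


def resolve_model_path(model_path, base_dir):
    """Return an absolute path for the model file.

    Absolute values (e.g. a -p model_path:=/abs/path override) are kept;
    relative values are joined to base_dir rather than left to be resolved
    against the process's current working directory.
    """
    path = os.path.expanduser(model_path)
    if not os.path.isabs(path):
        path = os.path.join(base_dir, path)
    return os.path.normpath(path)
```

The node would then call `self.ie.read_model(model=resolve_model_path(model_path, base_dir))`. An absolute value supplied with `-p model_path:=...` passes through unchanged.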

audio_processor/audio_processor/audio_processor_node.py

+                  def preprocess_audio(self, audio_data):
+                      # Normalize audio data
+                      audio_data = audio_data / np.max(np.abs(audio_data))

Copilot AI Oct 7, 2025

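Division by zero will occur if audio_data contains only zeros. In that case, np.max(np.abs(audio_data)) is 0, and NumPy fills the result with NaN values (emitting a RuntimeWarning) instead of raising an error. Add a check to prevent division by zero.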

Suggested change

      
-         audio_data = audio_data / np.max(np.abs(audio_data))
+         max_val = np.max(np.abs(audio_data))
+         if max_val == 0:
+             self.get_logger().warning("Audio data contains only zeros; skipping normalization to avoid division by zero.")
+         else:
+             audio_data = audio_data / max_val

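The same guard can also be written as a standalone function, which makes it easy to unit-test without a running node. This sketch is hypothetical: `normalize_peak` is not in the PR. It replaces the logger warning with a silent pass-through. It also handles an empty array, for which `np.max` raises `ValueError` rather than returning 0.

```python
import numpy as np


def normalize_peak(audio_data):
    """Scale audio so its largest absolute sample is 1.0.

    Silent (all-zero) or empty input is returned unscaled as float32, so it
    never produces NaN from 0/0 or a ValueError from np.max on an empty array.
    """
    audio = np.asarray(audio_data, dtype=np.float32)
    if audio.size == 0:
        return audio
    max_val = np.max(np.abs(audio))
    if max_val == 0:
        return audio
    return audio / max_val
```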

audio_processor/audio_processor/audio_processor_node.py

+                  def postprocess_result(self, result):
+                      # Implement postprocessing logic to convert model output to text
+                      return "example text"

Copilot AI Oct 7, 2025

The postprocess_result method returns a placeholder string ("example text") instead of converting the model output to text, so the node reports the same result for every input. This method should be implemented to properly decode the model output.

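For a wav2vec2 model fine-tuned with CTC, greedy CTC decoding is a common way to turn the output into text. It takes the most likely token per frame, collapses consecutive repeats, drops the blank token, and maps the remaining IDs to characters. The sketch below assumes the exported model outputs logits shaped `(1, time, vocab)`. The bare pretrained `wav2vec2-base` checkpoint has no CTC head, so it is worth checking which checkpoint was converted. The defaults `blank_id=0` and the `"|"` word delimiter follow the Hugging Face wav2vec2 CTC vocabularies. However, the `vocab` list itself must come from the tokenizer files that match the model.

```python
import numpy as np


def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding of (time, vocab) or (1, time, vocab) logits."""
    logits = np.asarray(logits)
    if logits.ndim == 3:
        logits = logits[0]  # drop the batch dimension
    best_ids = np.argmax(logits, axis=-1)  # most likely token per frame
    chars = []
    prev_id = None
    for token_id in best_ids.tolist():
        # Keep a token only when it differs from the previous frame's token
        # (collapsing repeats) and is not the CTC blank.
        if token_id != prev_id and token_id != blank_id:
            chars.append(vocab[token_id])
        prev_id = token_id
    # The word delimiter token marks spaces between words.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

Assuming `result` has already been reduced to the logits array, `postprocess_result` could then `return ctc_greedy_decode(result, self.vocab)`. Here, `self.vocab` is a hypothetical attribute loaded from the model's vocabulary file.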

audio_processor/audio_processor/audio_processor_node.py

+                  rclpy.init(args=args)
+                  node = AudioProcessorNode()
+                  # Example: Process an audio file
+                  node.process_audio_file('/root/ros2_ws/audio_processor/audio_processor/1089-134686-0001.wav')

Copilot AI Oct 7, 2025

Hard-coded absolute path in main function makes the code non-portable. Consider making this configurable or removing the hard-coded test call.

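One way to drop the hard-coded call is to take the file from the command line and skip processing when none is given. This is a hypothetical sketch: the `--audio-file` flag is not part of the PR. It uses `parse_known_args` so that ROS-specific arguments such as `--ros-args ...` end up in the ignored extras instead of causing an error. Declaring an `audio_file` ROS parameter, in the same way as the `model_path` suggestion, would be an equally valid alternative.

```python
import argparse


def parse_audio_file(argv):
    """Return the value of a hypothetical --audio-file flag, or None.

    parse_known_args() puts arguments it does not recognize (for example
    "--ros-args -p name:=value") into a separate list instead of exiting.
    """
    parser = argparse.ArgumentParser(description="audio_processor STT node")
    parser.add_argument(
        "--audio-file",
        default=None,
        help="WAV or MP4 file to transcribe once at startup",
    )
    args, _ros_and_other_args = parser.parse_known_args(argv)
    return args.audio_file
```

In `main`, the hard-coded line could then become `audio_file = parse_audio_file(sys.argv[1:])`, followed by `if audio_file: node.process_audio_file(audio_file)`.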

audio_processor/setup.py

+                  maintainer='yuki',
+                  maintainer_email='yuki.nakagawa@intel.com',
+                  description='Audio processing node for STT using OpenVINO',
+                  license='License declaration',

Copilot AI Oct 7, 2025

Generic license declaration should be replaced with the actual license name (e.g., 'Apache-2.0' to match package.xml).

Suggested change

      
-     license='License declaration',
+     license='Apache-2.0',



Labels

None yet