Pull Request Overview
This PR adds a new ROS2 audio processing node that enables Speech-to-Text (STT) functionality using OpenVINO. The implementation processes audio files (MP4 and WAV) and converts them to text using a wav2vec2 model.
- Adds complete ROS2 package structure with setup files and linting tests
- Implements audio processor node with OpenVINO-based STT processing
- Supports multiple audio formats with preprocessing pipeline
Reviewed Changes
Copilot reviewed 7 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| audio_processor/audio_processor/audio_processor_node.py | Main STT processing node with OpenVINO integration |
| audio_processor/setup.py | Package configuration with dependencies and entry points |
| audio_processor/package.xml | ROS2 package metadata with build and runtime dependencies |
| audio_processor/setup.cfg | Package installation configuration |
| audio_processor/test/test_pep257.py | PEP 257 docstring compliance test |
| audio_processor/test/test_flake8.py | Code style linting test |
| audio_processor/test/test_copyright.py | Copyright header compliance test |
```python
from pydub import AudioSegment
from openvino.runtime import Core


class AudioProcessorNode(Node):
```
The class is missing a docstring. Add a docstring to describe the purpose of this audio processing node and its functionality.
```diff
 class AudioProcessorNode(Node):
+    """
+    A ROS2 node for audio processing that loads an OpenVINO speech-to-text model,
+    processes audio files (WAV or MP4), performs inference, and publishes the
+    transcribed text to a ROS2 topic.
+    Functionality includes:
+    - Extracting audio from MP4 or reading WAV files.
+    - Preprocessing audio for model input.
+    - Running inference using OpenVINO.
+    - Postprocessing model output to text.
+    - Publishing results to the 'stt_output' topic.
+    """
```
```python
        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
        self.ie = Core()
        # Load the converted OpenVINO model
        # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
        self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
```
Hard-coded absolute path makes the code non-portable. Consider using a relative path or making this configurable through a ROS parameter.
```diff
-        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
-        self.ie = Core()
-        # Load the converted OpenVINO model
-        # self.model = self.ie.read_model(model='wav2vec2-base/wav2vec2-base.xml')
-        self.model = self.ie.read_model(model='/root/ros2_ws/audio_processor/audio_processor/wav2vec2-base/wav2vec2-base.xml')
+        self.declare_parameter('model_path', 'wav2vec2-base/wav2vec2-base.xml')
+        model_path = self.get_parameter('model_path').get_parameter_value().string_value
+        self.publisher_ = self.create_publisher(String, 'stt_output', 10)
+        self.ie = Core()
+        self.model = self.ie.read_model(model=model_path)
```
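A relative default such as `'wav2vec2-base/wav2vec2-base.xml'` is opened relative to the process's current working directory, so it only works when the node is launched from the directory that contains the model. A minimal sketch of one way to make a relative `model_path` parameter independent of the launch directory, assuming the model ships alongside the node; the helper `resolve_model_path` is hypothetical and not part of the PR:

```python
from pathlib import Path


def resolve_model_path(model_path: str, base_dir: str) -> str:
    """Return model_path unchanged if absolute, else resolve it against base_dir.

    Hypothetical helper (not in the PR): keeps a relative 'model_path'
    parameter from depending on the directory the node is launched from.
    """
    path = Path(model_path)
    if path.is_absolute():
        # An explicit absolute path from the parameter always wins.
        return str(path)
    return str(Path(base_dir) / path)
```

Inside `__init__`, this could be applied as `resolve_model_path(model_path, os.path.dirname(os.path.abspath(__file__)))` before calling `read_model`. For an installed package, `ament_index_python.packages.get_package_share_directory('audio_processor')` is a common alternative base directory, provided `setup.py` installs the model files under the package's share directory.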
```python
    def preprocess_audio(self, audio_data):
        # Normalize audio data
        audio_data = audio_data / np.max(np.abs(audio_data))
```
If `audio_data` contains only zeros, `np.max(np.abs(audio_data))` is 0 and the division produces NaNs (NumPy emits a RuntimeWarning rather than raising). Add a check to avoid dividing by zero.
```diff
-        audio_data = audio_data / np.max(np.abs(audio_data))
+        max_val = np.max(np.abs(audio_data))
+        if max_val == 0:
+            self.get_logger().warning(
+                "Audio data contains only zeros; "
+                "skipping normalization to avoid division by zero.")
+        else:
+            audio_data = audio_data / max_val
```
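The same guard can be written as a standalone function that is easy to unit-test. This sketch also casts to `float32` before taking the absolute value: if the samples come from pydub's `get_array_of_samples()` they are integer PCM, and `np.abs` on an `int16` value of -32768 overflows back to -32768. The name `normalize_peak` is hypothetical; returning silent input unchanged mirrors the suggestion above rather than raising.

```python
import numpy as np


def normalize_peak(audio_data) -> np.ndarray:
    """Scale audio so its largest absolute sample is 1.0 (hypothetical helper).

    Empty or all-zero input is returned unchanged (as float32) instead of
    producing NaNs from a 0/0 division.
    """
    # Cast first: abs() of int16 -32768 overflows, and the model needs floats anyway.
    audio = np.asarray(audio_data, dtype=np.float32)
    if audio.size == 0:
        return audio  # np.max would raise ValueError on an empty array
    peak = float(np.max(np.abs(audio)))
    if peak == 0.0:
        return audio  # silence: nothing to scale
    return audio / peak
```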
```python
    def postprocess_result(self, result):
        # Implement postprocessing logic to convert model output to text
        return "example text"
```
The postprocess_result method returns a placeholder string instead of implementing actual text conversion logic. This should be implemented to properly decode the model output.
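For a wav2vec2 checkpoint fine-tuned with a CTC head, the output is a logits array of shape (batch, time, vocab), and a common baseline decoder is greedy CTC decoding: take the argmax per frame, collapse consecutive repeats, drop the blank token, and map the word-delimiter token to spaces. Note that if the exported model is the pretrained-only `facebook/wav2vec2-base` checkpoint, it has no CTC head and its output cannot be decoded to text this way. A minimal sketch; the function name, the `vocab` list, `blank_id=0`, and the `'|'` delimiter are assumptions that must match the tokenizer the model was trained with:

```python
import numpy as np


def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding of wav2vec2-style logits (hypothetical helper).

    logits: array of shape (time, vocab) or (1, time, vocab).
    vocab:  list mapping token id -> string; assumed to match the model's tokenizer.
    """
    logits = np.asarray(logits)
    if logits.ndim == 3:
        logits = logits[0]  # drop the batch dimension (batch size 1 assumed)
    ids = np.argmax(logits, axis=-1)

    tokens = []
    prev = None
    for token_id in ids.tolist():
        # Emit only when the id changes and is not blank; a blank between two
        # equal ids resets `prev`, so genuine double letters survive.
        if token_id != prev and token_id != blank_id:
            tokens.append(vocab[token_id])
        prev = token_id
    # Turn word delimiters into spaces and collapse any repeated whitespace.
    return " ".join("".join(tokens).replace(word_delimiter, " ").split())
```

`postprocess_result` could then return `ctc_greedy_decode(result, vocab)`, where `result` is the first output tensor of the compiled model. Greedy decoding is the simplest option; beam search with a language model usually improves accuracy at extra cost.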
```python
    rclpy.init(args=args)
    node = AudioProcessorNode()
    # Example: Process an audio file
    node.process_audio_file('/root/ros2_ws/audio_processor/audio_processor/1089-134686-0001.wav')
```
Hard-coded absolute path in main function makes the code non-portable. Consider making this configurable or removing the hard-coded test call.
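One way to drop the hard-coded call while keeping a quick manual test is to take the file path from the command line. A minimal sketch, where the hypothetical helper `parse_audio_path` parses only the arguments left after ROS-specific ones have been stripped:

```python
import argparse


def parse_audio_path(argv):
    """Return the optional audio file path from non-ROS CLI args (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run STT on one audio file.")
    # Optional positional: with no argument the node just starts and waits.
    parser.add_argument("audio_file", nargs="?", default=None,
                        help="WAV or MP4 file to transcribe")
    return parser.parse_args(argv).audio_file
```

In `main()`, this could be fed `rclpy.utilities.remove_ros_args(args=sys.argv)[1:]` so that `--ros-args ...` never reaches argparse, calling `node.process_audio_file(path)` only when a path was given. Declaring an `audio_file` ROS parameter, as the `model_path` suggestion does, is an equally reasonable alternative.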
```python
    maintainer='yuki',
    maintainer_email='yuki.nakagawa@intel.com',
    description='Audio processing node for STT using OpenVINO',
    license='License declaration',
```
Generic license declaration should be replaced with the actual license name (e.g., 'Apache-2.0' to match package.xml).
```diff
-    license='License declaration',
+    license='Apache-2.0',
```
No description provided.