Conversation
|
Warning Rate limit exceeded@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 14 minutes and 28 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThe codebase refactors dialog normalization by delegating all normalization logic from the main transformer class to a new utility module. The new module implements comprehensive, language-aware normalization, including number, date, time, contraction, title, and unit expansion. Dependencies are updated to support these features. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DialogNormalizerTransformer
participant normalize (util.py)
participant External Libraries
User->>DialogNormalizerTransformer: transform(dialog, context)
DialogNormalizerTransformer->>normalize (util.py): normalize(dialog, lang)
normalize (util.py)->>External Libraries: parse/pronounce numbers, dates, times, units
normalize (util.py)-->>DialogNormalizerTransformer: normalized_text
DialogNormalizerTransformer-->>User: normalized_text, context
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Assessment against linked issues
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Docstrings generation was requested by @JarbasAl. * #2 (comment) The following files were modified: * `ovos_dialog_normalizer_plugin/__init__.py` * `ovos_dialog_normalizer_plugin/util.py`
|
Note Generated docstrings for this pull request at #3 |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
requirements.txt (1)
2-5: Consider pinning versions for new dependenciesThe newly added dependencies
langcodesandunicode_rbnfdon't have version constraints. To ensure reproducible builds and avoid unexpected breaking changes, consider adding version constraints.ovos-plugin-manager -langcodes +langcodes>=3.3.0 ovos-number-parser>=0.4.0 ovos-date-parser>=0.6.4a1 -unicode_rbnf +unicode_rbnf>=1.1.0ovos_dialog_normalizer_plugin/util.py (3)
14-186: Add contractions for other supported languagesThe CONTRACTIONS dictionary only includes English entries, but the module claims to support multiple languages (pt, es, fr, de, etc.). Consider adding common contractions for these languages to ensure consistent normalization across all supported languages.
Would you like me to help generate common contractions for Portuguese, Spanish, French, and other supported languages?
500-506: Document year disambiguation logicThe 2-digit year disambiguation logic assumes years 00-29 map to 2000-2029 and 30-99 map to 1930-1999. This assumption may not be appropriate for all use cases and could lead to incorrect date parsing.
Consider making this configurable or documenting this behavior clearly in the function docstring:
def _normalize_dates_and_times(text: str, full_lang: str, date_format: str = "DMY") -> str: """ Helper function to normalize dates and times using regular expressions. This prepares the strings for pronunciation. + + Note: 2-digit years are expanded as follows: + - 00-29 -> 2000-2029 + - 30-99 -> 1930-1999 """
1-719: Consider future modularization for maintainabilityThis utility module is comprehensive but quite large (700+ lines). As the normalization features grow, consider splitting it into smaller, focused modules:
contractions.py- Language-specific contraction mappingsnumbers.py- Number and fraction normalizationdatetime.py- Date and time normalizationunits.py- Unit conversion and normalizationThis would improve maintainability and make it easier to add language-specific features.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ovos_dialog_normalizer_plugin/__init__.py(2 hunks)ovos_dialog_normalizer_plugin/util.py(1 hunks)requirements.txt(1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
ovos_dialog_normalizer_plugin/util.py
150-150: Dictionary key literal "shan't" repeated
Remove repeated key literal "shan't"
(F601)
712-712: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (1)
ovos_dialog_normalizer_plugin/__init__.py (1)
6-27: Clean refactoring with good separation of concerns!The extraction of normalization logic to a dedicated utility module improves maintainability and makes the transformer class focused on its plugin responsibilities. Error handling and logging are properly preserved.
Docstrings generation was requested by @JarbasAl. * #2 (comment) The following files were modified: * `ovos_dialog_normalizer_plugin/__init__.py` * `ovos_dialog_normalizer_plugin/util.py` Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
ovos_dialog_normalizer_plugin/util.py (3)
12-12: Use a more specific logger nameThe logger name "normalize" is too generic and could conflict with other modules. Consider using a name that includes the package namespace.
150-150: Remove duplicate dictionary keyThe key
"shan't"is already defined at line 51. This duplicate at line 150 will override the previous value.
772-772: Remove unnecessary f-string prefixThis string doesn't contain any placeholders, so the
fprefix is not needed.
🧹 Nitpick comments (3)
ovos_dialog_normalizer_plugin/util.py (3)
362-378: Consider using locale information for separator detectionWhile the current implementation works for common languages, consider using Python's
localemodule or thebabellibrary for more comprehensive and accurate locale-specific number formatting.Example using babel:
from babel import Locale def _get_number_separators(full_lang: str) -> tuple[str, str]: try: locale = Locale.parse(full_lang.replace('-', '_')) decimal_separator = locale.number_symbols.get('decimal', '.') thousands_separator = locale.number_symbols.get('group', ',') return decimal_separator, thousands_separator except Exception: # Fallback to current implementation lang_code = full_lang.split("-")[0] if lang_code in ["pt", "es", "fr", "de"]: return ',', '.' return '.', ','
531-534: Document the year expansion logicThe 2-digit year expansion logic assumes years 00-29 map to 2000-2029 and 30-99 map to 1930-1999. This assumption should be documented and might need to be configurable in the future.
# Expand 2-digit year to 4-digit year if year < 100: - # Assume years 00-29 are 2000-2029, 30-99 are 1930-1999 + # Assume years 00-29 are 2000-2029, 30-99 are 1930-1999 + # TODO: Consider making this cutoff configurable or date-aware year = 2000 + year if year < 30 else 1900 + year
1-779: Successfully ported normalization with improvementsThis implementation successfully ports the normalization functionality from the phoonnx project while adding comprehensive error handling, multi-language support, and modular design. The code is well-structured and maintains good separation of concerns.
Consider adding unit tests to ensure the normalization behavior remains consistent across updates, especially for edge cases like 2-digit year expansion and locale-specific number formatting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ovos_dialog_normalizer_plugin/__init__.py(2 hunks)ovos_dialog_normalizer_plugin/util.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- ovos_dialog_normalizer_plugin/init.py
🧰 Additional context used
🪛 Ruff (0.12.2)
ovos_dialog_normalizer_plugin/util.py
150-150: Dictionary key literal "shan't" repeated
Remove repeated key literal "shan't"
(F601)
772-772: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (5)
ovos_dialog_normalizer_plugin/util.py (5)
188-359: Well-structured multi-language dictionariesThe TITLES and UNITS dictionaries provide comprehensive coverage for multiple languages with clear organization.
381-436: Robust number normalization with proper error handlingThe implementation correctly handles various number formats, locale-specific separators, and provides appropriate fallback mechanisms with error logging.
440-467: Clean date/time pronunciation interfacesBoth functions provide appropriate abstractions with good error handling, especially the fallback mechanism in
pronounce_time.
563-572: Simple and effective hyphen-digit normalizationThe regex pattern correctly handles word-hyphen-digit combinations.
693-729: Well-orchestrated normalization pipelineThe main normalize function properly coordinates all normalization steps in a logical order with appropriate error handling for RBNF engine initialization.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
Note Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ovos_dialog_normalizer_plugin/util.py (1)
484-484: Remove unnecessary f-string prefixThis string doesn't contain any placeholders, so the
fprefix is not needed.- print(f"\n--- Other Examples ---") + print("\n--- Other Examples ---")
🧹 Nitpick comments (3)
setup.py (1)
69-70: Consider being more selective with package data inclusionIncluding all files from the plugin directory might package unnecessary files (e.g.,
__pycache__,.pycfiles). Consider filtering by file extensions or using a MANIFEST.in file.def package_files(directory): paths = [] for (path, _, filenames) in os.walk(directory): for filename in filenames: - paths.append(os.path.join(path, filename)) + # Only include JSON files and other necessary resources + if filename.endswith(('.json', '.txt', '.yml', '.yaml')): + paths.append(os.path.join(path, filename)) return pathsAlternatively, use a MANIFEST.in file for better control over included files.
ovos_dialog_normalizer_plugin/util.py (2)
79-83: Consider implementing the TODO: Move separator logic to locale JSON filesThe hardcoded language-specific separator logic could be moved to JSON files for better maintainability and extensibility.
Would you like me to help implement this by:
- Creating a JSON structure for number format configurations
- Updating the LocaleDataManager to load this data
- Refactoring
_get_number_separatorsto use the JSON dataThis would make it easier to add support for new languages without modifying code.
303-337: Cache compiled regex patterns for better performanceThe regex patterns are compiled on every function call. For better performance, especially when processing multiple texts, consider caching the compiled patterns.
# Add to LocaleDataManager or create a separate cache class RegexCache: def __init__(self): self._cache = {} def get_units_regex(self, lang_code, separator_info): cache_key = (lang_code, separator_info) if cache_key not in self._cache: # Build and compile patterns self._cache[cache_key] = self._build_units_patterns(lang_code, separator_info) return self._cache[cache_key]This would significantly improve performance when normalizing multiple texts in the same language.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
.github/workflows/unit_tests.yml(1 hunks)ovos_dialog_normalizer_plugin/locale/ca/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/de/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/de/units.json(1 hunks)ovos_dialog_normalizer_plugin/locale/en/contractions.json(1 hunks)ovos_dialog_normalizer_plugin/locale/en/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/en/units.json(1 hunks)ovos_dialog_normalizer_plugin/locale/es/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/es/units.json(1 hunks)ovos_dialog_normalizer_plugin/locale/fr/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/fr/units.json(1 hunks)ovos_dialog_normalizer_plugin/locale/gl/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/it/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/nl/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/pt/titles.json(1 hunks)ovos_dialog_normalizer_plugin/locale/pt/units.json(1 hunks)ovos_dialog_normalizer_plugin/util.py(1 hunks)requirements.txt(1 hunks)setup.py(2 hunks)tests/__init__.py(1 hunks)
✅ Files skipped from review due to trivial changes (17)
- ovos_dialog_normalizer_plugin/locale/ca/titles.json
- tests/init.py
- ovos_dialog_normalizer_plugin/locale/es/titles.json
- ovos_dialog_normalizer_plugin/locale/de/titles.json
- ovos_dialog_normalizer_plugin/locale/en/titles.json
- ovos_dialog_normalizer_plugin/locale/nl/titles.json
- ovos_dialog_normalizer_plugin/locale/it/titles.json
- ovos_dialog_normalizer_plugin/locale/es/units.json
- ovos_dialog_normalizer_plugin/locale/fr/units.json
- ovos_dialog_normalizer_plugin/locale/de/units.json
- ovos_dialog_normalizer_plugin/locale/fr/titles.json
- ovos_dialog_normalizer_plugin/locale/gl/titles.json
- ovos_dialog_normalizer_plugin/locale/pt/units.json
- ovos_dialog_normalizer_plugin/locale/en/units.json
- ovos_dialog_normalizer_plugin/locale/pt/titles.json
- .github/workflows/unit_tests.yml
- ovos_dialog_normalizer_plugin/locale/en/contractions.json
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements.txt
🧰 Additional context used
🪛 Ruff (0.12.2)
setup.py
53-53: Loop control variable directories not used within loop body
Rename unused directories to _directories
(B007)
ovos_dialog_normalizer_plugin/util.py
484-484: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (1)
ovos_dialog_normalizer_plugin/util.py (1)
236-240: Review 2-digit year expansion logicThe current logic assumes years 00-29 map to 2000-2029 and 30-99 map to 1930-1999. Since we're in 2025, this might misinterpret recent dates.
Consider making the cutoff year configurable or using a sliding window approach. For example, dates within 50 years in the past and 30 years in the future from the current year.
# Example sliding window approach current_year = datetime.now().year if year < 100: # Assume within -50 to +30 years from current year century = (current_year // 100) * 100 candidate_year = century + year if candidate_year > current_year + 30: candidate_year -= 100 elif candidate_year < current_year - 50: candidate_year += 100 year = candidate_year
|
✅ UTG Post-Process Complete No new issues were detected in the generated code and all check runs have completed. The unit test generation process has completed successfully. |
|
Creating a PR to put the unit tests in... The changes have been created in this pull request: View PR |
|
Note Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes. |
|
✅ UTG Post-Process Complete No new issues were detected in the generated code and all check runs have completed. The unit test generation process has completed successfully. |
|
Creating a PR to put the unit tests in... The changes have been created in this pull request: View PR |
closes #1
Summary by CodeRabbit
New Features
Chores