Enhancements
- Refactored the metrics computation architecture to eliminate `np.vectorize()` overhead by replacing it with a C-level batch processing loop (`_metrics_batch()`). This gives a cleaner code structure and a foundation for future performance optimizations, without introducing any performance regression.
- Fixed a double computation bug where `error_handler()` was calling `metrics()` for validation, after which the wrapper functions computed the metrics again. `error_handler()` is now validation-only, and all metric calculations go through a single unified `metrics()` entry point, improving efficiency and maintainability.
- Standardized the internal metrics return format to a row-based `(n, 9)` array instead of a columnar format. This simplifies DataFrame construction in `summary()` and `summaryp()` by eliminating the transpose operations they previously needed.
- Improved code organization with a unified `metrics()` router function that dispatches to either the single-pair `calculations()` or the batch `_metrics_batch()` path.
- Updated the Pylint configuration to suppress import errors for Cython modules during static analysis and to exclude benchmark and development directories from linting. This resolves CI/CD build failures while maintaining code quality standards for the core package.
- Optimized the Levenshtein distance algorithm in `calculations()` with C-level improvements: replaced `np.zeros()` with `np.empty()` to eliminate redundant initialization, moved boundary condition initialization outside the main DP loop to remove conditional branches from the hot path, and replaced Python's `min()` with manual C-level sequential comparisons.
- Implemented a dual-path architecture with a fast path for functions that do not require word tracking. Three new functions (`calculations_fast()`, `_metrics_batch_fast()`, `metrics_fast()`) skip word list construction and return float64 arrays instead of object arrays. `wer()`, `wers()`, `werp()`, and `werps()` now use the fast path, which improved performance on synthetic benchmarks. Functions that require word tracking (`summary()` and `summaryp()`) continue to use the full path.

Bug Fixes
- Expanded the try/except scope in all wrapper functions (`wer.py`, `wers.py`, `werp.py`, `werps.py`, `summary.py`, `summaryp.py`) to catch exceptions from both validation (`error_handler()`) and computation (`metrics()` / `metrics_fast()`). This fixes 6 pre-existing test failures where invalid input types (e.g., lists of integers) would crash instead of returning None with an error message.
- Added division-by-zero guards in `calculations_fast()` (`wer = (<double>ld) / m if m > 0 else 0.0`) and in the corpus-level wrapper functions (`wer.py`, `werp.py`) to prevent crashes on empty input. Also added per-row masked division in `werps.py` to handle cases where individual samples have zero reference length.
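
To make the zero-length guard and the Levenshtein changes concrete, here is a minimal pure-Python sketch. It is not the package's Cython code: the names `edit_distance`, `wer_guarded`, and `wers_guarded` are hypothetical, tokenization by whitespace is an assumption, and returning 0.0 for an empty reference simply mirrors the `else 0.0` branch quoted above.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (hypothetical helper, for illustration only)."""
    m, n = len(ref_words), len(hyp_words)
    # Table allocated without meaningful initial values: every cell is written
    # before it is read (a plain-Python stand-in for np.empty()).
    table = [[None] * (n + 1) for _ in range(m + 1)]
    # Boundary conditions are filled once, outside the main DP loop,
    # so the inner loop needs no "is this the first row/column?" branch.
    for i in range(m + 1):
        table[i][0] = i
    for j in range(n + 1):
        table[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            # Sequential comparisons in place of min().
            best = table[i - 1][j] + 1            # deletion
            insertion = table[i][j - 1] + 1
            if insertion < best:
                best = insertion
            substitution = table[i - 1][j - 1] + cost
            if substitution < best:
                best = substitution
            table[i][j] = best
    return table[m][n]


def wer_guarded(reference, hypothesis):
    """WER for one pair; an empty reference yields 0.0 instead of dividing by zero."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    hyp_words = hypothesis.split()
    ld = edit_distance(ref_words, hyp_words)
    m = len(ref_words)
    return ld / m if m > 0 else 0.0


def wers_guarded(references, hypotheses):
    """Per-row WER; rows with zero reference length are guarded individually."""
    return [wer_guarded(r, h) for r, h in zip(references, hypotheses)]
```

Applying the guard per row means that one empty reference in a batch only affects its own row, which is the same intent as the masked division described for `werps.py`.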