Release v3.2.0 #38

rossarmstrong · 2025-12-15T11:03:51Z

Enhancements

Refactored the metrics computation architecture to remove np.vectorize() overhead, replacing it with a C-level batch processing loop (_metrics_batch()). The result is a cleaner code structure and a foundation for future performance optimizations, with no performance regression.
Fixed a double-computation bug in which error_handler() called metrics() for validation and the wrapper functions then computed the metrics a second time. error_handler() is now validation-only, and all metric calculations go through a single metrics() entry point, so each input is computed once.
Standardized the internal metrics return format to a row-based (n, 9) array instead of a columnar layout. This simplifies DataFrame construction in summary() and summaryp() by removing the transpose operations that the columnar layout required.
Improved code organization with a unified metrics() router function that dispatches single-pair inputs to calculations() and batch inputs to _metrics_batch(), giving metric computation one maintainable entry point.
Updated Pylint configuration to suppress import errors for Cython modules during static analysis and exclude benchmark/development directories from linting. This resolves CI/CD build failures while maintaining code quality standards for the core package.
Optimized the Levenshtein distance algorithm in calculations() at the C level: replaced np.zeros() with np.empty() to skip redundant initialization, moved boundary-condition initialization outside the main DP loop to remove conditional branches from the hot path, and replaced Python's min() with manual sequential comparisons. The corresponding commit reports a 7-14% speedup.
Implemented a dual-path architecture with a fast path for functions that do not need word tracking. Three new functions (calculations_fast(), _metrics_batch_fast(), metrics_fast()) skip word list construction and return float64 arrays instead of object arrays. wer(), wers(), werp(), and werps() now use the fast path, which improved performance on synthetic benchmarks. Functions that require word tracking (summary() and summaryp()) continue to use the full path.
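The three Levenshtein optimizations listed above can be illustrated in plain Python with NumPy. This is only a sketch of the technique, not the package's Cython code: the function name edit_distance_sketch() is hypothetical, and the real calculations() and calculations_fast() run as compiled C-level loops, so this version will not show the same speedup.

```python
import numpy as np


def edit_distance_sketch(ref_words, hyp_words):
    """Word-level Levenshtein distance (hypothetical illustration only)."""
    m, n = len(ref_words), len(hyp_words)

    # np.empty skips the zero-fill that np.zeros performs. This is safe
    # because every cell is written before it is read.
    d = np.empty((m + 1, n + 1), dtype=np.int64)

    # Boundary conditions are set once, outside the main DP loop, so the
    # inner loop needs no "if i == 0" / "if j == 0" branches.
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1

            # Manual sequential comparisons in place of min(a, b, c).
            best = d[i - 1, j - 1] + cost  # match or substitution
            deletion = d[i - 1, j] + 1
            if deletion < best:
                best = deletion
            insertion = d[i, j - 1] + 1
            if insertion < best:
                best = insertion
            d[i, j] = best

    return int(d[m, n])
```

Because the sketch returns only the distance and builds no word lists, it mirrors the idea behind the fast path: callers that need only a rate can skip the bookkeeping that summary() and summaryp() depend on.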

Bug Fixes

Expanded the try/except scope in all wrapper functions (wer.py, wers.py, werp.py, werps.py, summary.py, summaryp.py) to catch exceptions from both validation (error_handler()) and computation (metrics()/metrics_fast()). This fixes 6 pre-existing test failures in which invalid input types (e.g., lists of integers) crashed instead of returning None with an error message.
Added division-by-zero guards in calculations_fast() (wer = (<double>ld) / m if m > 0 else 0.0) and in the corpus-level wrapper functions (wer.py, werp.py) to prevent crashes on empty input. Also added per-row masked division in werps.py to handle individual samples with zero reference length.
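The two fixes above can be sketched together as follows. All names here (validate(), per_row_rate(), corpus_rate(), safe_rates(), and the compute callback) are hypothetical stand-ins rather than the package's actual functions, and the exception types caught are an assumption. The sketch shows one try block covering both validation and computation, a scalar zero guard, and per-row masked division with NumPy.

```python
import numpy as np


def validate(reference, hypothesis):
    """Hypothetical validation-only check, standing in for error_handler()."""
    for name, value in (("reference", reference), ("hypothesis", hypothesis)):
        if not isinstance(value, (list, tuple)) or not all(
            isinstance(item, str) for item in value
        ):
            raise TypeError(f"{name} must be a list of strings")
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")


def per_row_rate(distances, ref_lengths):
    """Masked division: rows with zero reference length stay at 0.0."""
    distances = np.asarray(distances, dtype=np.float64)
    ref_lengths = np.asarray(ref_lengths, dtype=np.float64)
    out = np.zeros_like(distances)
    np.divide(distances, ref_lengths, out=out, where=ref_lengths > 0)
    return out


def corpus_rate(distances, ref_lengths):
    """Corpus-level rate with a guard against an empty reference."""
    total = float(np.sum(ref_lengths))
    return float(np.sum(distances)) / total if total > 0 else 0.0


def safe_rates(reference, hypothesis, compute):
    """Wrapper whose try block spans validation and computation."""
    try:
        validate(reference, hypothesis)
        distances, ref_lengths = compute(reference, hypothesis)
        return per_row_rate(distances, ref_lengths)
    except (TypeError, ValueError) as err:
        print(f"Error: {err}")
        return None
```

With the try block limited to validation, an exception raised during computation would propagate to the caller; widening it lets the wrapper return None with a message in both cases.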

rossarmstrong added 7 commits December 15, 2025 09:40

update version for new developmet branch

cb1dd7d

Add local testing scripts

6973432

Refactor metrics computation to eliminate np.vectorize overhead and fix double computation bug

9c91210

Update Pylint configuration to suppress Cython import errors and ignore benchmark directories

8c3e334

perf(metrics): optimize Levenshtein DP loop for 7-14% speedup

1abc587

perf(metrics): add fast path without word tracking for speedup in wer/wers/werp/werps functions

784072e

docs(changelog): document dual-path architecture, performance improvements, and exception handling fixes

15a99ea

rossarmstrong merged commit dbd5c50 into main Dec 15, 2025
11 of 12 checks passed

rossarmstrong deleted the development branch December 15, 2025 11:08
