
Commit dbd5c50

Release v3.2.0 (#38)
* update version for new development branch
* Add local testing scripts
* Refactor metrics computation to eliminate np.vectorize overhead and fix double computation bug
* Update Pylint configuration to suppress Cython import errors and ignore benchmark directories
* perf(metrics): optimize Levenshtein DP loop for 7-14% speedup
* perf(metrics): add fast path without word tracking for speedup in wer/wers/werp/werps functions
* docs(changelog): document dual-path architecture, performance improvements, and exception handling fixes
1 parent c6455be commit dbd5c50


15 files changed: +376 -320 lines changed


.gitignore

Lines changed: 4 additions & 0 deletions
@@ -185,3 +185,7 @@ cython_debug/
 # Meson build directories
 build/
 builddir/
+
+# Personal development workspace (local testing only)
+development/
+!development/README.md

.pylintrc

Lines changed: 10 additions & 2 deletions
@@ -2,7 +2,15 @@
 max-line-length = 120
 
 [pylint]
-good-names=i,j,m,n,ld,df
+good-names=i,j,m,n,ld,df,ref,hyp
 
 [MESSAGES CONTROL]
-disable=too-many-locals
+disable=
+    too-many-locals,
+    import-error,
+    duplicate-code
+
+[MASTER]
+# Ignore benchmark and development files that use optional dependencies
+ignore-paths=
+    ^(benchmarks|development|docs)[\\/].*$

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -4,6 +4,33 @@
 This changelog file outlines a chronologically ordered list of the changes made on this project.
 It is organized by version and release date followed by a list of Enhancements, New Features, Bug Fixes, and/or Breaking Changes.
 
+## Version 3.2.0
+
+**Released:** December 15, 2025
+**Tag:** v3.2.0
+
+### Enhancements
+
+- Refactored metrics computation architecture to eliminate `np.vectorize()` overhead by replacing it with a C-level batch processing loop (`_metrics_batch()`). This provides cleaner code structure and establishes a foundation for future performance optimizations without introducing any performance regression.
+
+- Fixed double computation bug where `error_handler()` was calling `metrics()` for validation, then wrapper functions were computing metrics again. The `error_handler()` function is now validation-only, and all metric calculations happen through a single unified `metrics()` entry point, improving efficiency and code maintainability.
+
+- Standardized internal metrics return format to row-based `(n, 9)` array structure instead of columnar format. This simplifies DataFrame construction in `summary()` and `summaryp()` functions by eliminating complex transpose operations and reducing code complexity.
+
+- Improved code organization with unified `metrics()` router function that dispatches to either single-pair `calculations()` or batch `_metrics_batch()` processing, providing a cleaner and more maintainable architecture for metric computation.
+
+- Updated Pylint configuration to suppress import errors for Cython modules during static analysis and exclude benchmark/development directories from linting. This resolves CI/CD build failures while maintaining code quality standards for the core package.
+
+- Optimized Levenshtein distance algorithm in `calculations()` function with C-level performance improvements: replaced `np.zeros()` with `np.empty()` to eliminate redundant initialization, moved boundary condition initialization outside the main DP loop to remove conditional branches from the hot path, and replaced Python's `min()` function with manual C-level sequential comparisons.
+
+- Implemented dual-path architecture with fast path optimization for functions that don't require word tracking. Added three new functions (`calculations_fast()`, `_metrics_batch_fast()`, `metrics_fast()`) that skip word list construction and return float64 arrays instead of object arrays. Updated `wer()`, `wers()`, `werp()`, and `werps()` functions to use the fast path, achieving performance improvement on synthetic benchmarks. Functions requiring word tracking (`summary()` and `summaryp()`) continue using the full path.
+
+### Bug Fixes
+
+- Expanded try/except scope in all wrapper functions (`wer.py`, `wers.py`, `werp.py`, `werps.py`, `summary.py`, `summaryp.py`) to properly catch exceptions from both validation (`error_handler()`) and computation (`metrics()`/`metrics_fast()`). This fixes 6 pre-existing test failures where invalid input types (e.g., lists of integers) would crash instead of returning None with an error message.
+
+- Added division-by-zero guards in `calculations_fast()` function (`wer = (<double>ld) / m if m > 0 else 0.0`) and corpus-level wrapper functions (`wer.py`, `werp.py`) to prevent crashes on empty input. Also added per-row masked division in `werps.py` to handle cases where individual samples have zero reference length.
+
 ## Version 3.1.1
 
 **Released:** December 14, 2025
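
The Levenshtein changes described in the 3.2.0 changelog entry can be illustrated with a pure-Python sketch. This is not the package's Cython code, which is not part of this commit view; the function name `levenshtein_words` is hypothetical, and plain lists stand in for the NumPy table, so the `np.zeros()` to `np.empty()` swap is only noted in a comment. What the sketch does show is the loop structure the changelog describes: boundary cells filled before the main loop, and sequential comparisons in place of `min()`.

```python
def levenshtein_words(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists (illustrative only)."""
    m, n = len(ref_words), len(hyp_words)

    # Allocate the (m+1) x (n+1) table. Every cell is written before it is read,
    # which is why an uninitialised allocation (np.empty in the changelog) is safe.
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # Boundary conditions are filled once, outside the main loop,
    # so the inner loop needs no "if i == 0 / j == 0" branches.
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1

            # Sequential comparisons instead of min(): start with the
            # substitution/match cost, then check deletion and insertion.
            best = d[i - 1][j - 1] + cost
            deletion = d[i - 1][j] + 1
            if deletion < best:
                best = deletion
            insertion = d[i][j - 1] + 1
            if insertion < best:
                best = insertion
            d[i][j] = best

    return d[m][n]
```

In pure Python the sequential comparisons are unlikely to matter much; the benefit the changelog claims applies to compiled Cython, where `min()` would otherwise go through a Python-level call.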

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 project = "werpy"
 copyright = f'{datetime.now().year} <a href="https://www.analyticsinmotion.com">Analytics in Motion</a>'
 author = "Ross Armstrong"
-release = "3.1.1"
+release = "3.2.0"
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

meson.build

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 project(
   'werpy',
   'c', 'cython',
-  version : '3.1.1',
+  version : '3.2.0',
   license: 'BSD-3',
   meson_version: '>= 1.1.0',
   default_options : [

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ requires = [
 
 [project]
 name = 'werpy'
-version = '3.1.1'
+version = '3.2.0'
 description = 'A powerful yet lightweight Python package to calculate and analyze the Word Error Rate (WER).'
 readme = 'README.md'
 requires-python = '>=3.10'

werpy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 The werpy package provides tools for calculating word error rates (WERs) and related metrics on text data.
 """
 
-__version__ = "3.1.1"
+__version__ = "3.2.0"
 
 from .errorhandler import error_handler
 from .normalize import normalize

werpy/errorhandler.py

Lines changed: 41 additions & 18 deletions
@@ -2,15 +2,19 @@
 # SPDX-License-Identifier: BSD-3-Clause
 
 """
-Responsible for defining custom exceptions and handling errors across the package.
+Input validation and consistent exceptions for werpy public functions.
 """
 
-from .metrics import metrics
+import numpy as np
 
 
 def error_handler(reference, hypothesis):
     """
-    This function provides the overall wrapper to handle exceptions within this package.
+    Validate inputs and raise consistent exceptions.
+
+    This function does not compute metrics. Computation is handled by:
+    - metrics.metrics (router) for strings and batches
+    - metrics.calculations for a single pair
 
     Parameters
     ----------
@@ -30,24 +34,43 @@ def error_handler(reference, hypothesis):
 
     Returns
     -------
-    np.ndarray
-        This function will return a ragged array containing the Word Error Rate, Levenshtein distance, the number of
-        words in the reference sequence, insertions count, deletions count, substitutions count, a list of inserted
-        words, a list of deleted words and a list of substituted words.
+    bool
+        True if validation passes.
     """
-    try:
-        word_error_rate_breakdown = metrics(reference, hypothesis)
-    except ValueError as exc:
-        raise ValueError(
-            "The Reference and Hypothesis input parameters must have the same number of elements."
-        ) from exc
-    except AttributeError as exc:
+    valid_types = (str, list, np.ndarray)
+
+    if not isinstance(reference, valid_types) or not isinstance(hypothesis, valid_types):
         raise AttributeError(
             "All text should be in a string format. Please check your input does not include any "
             "Numeric data types."
-        ) from exc
-    except ZeroDivisionError as exc:
+        )
+
+    ref_is_seq = isinstance(reference, (list, np.ndarray))
+    hyp_is_seq = isinstance(hypothesis, (list, np.ndarray))
+
+    if ref_is_seq != hyp_is_seq:
+        raise AttributeError(
+            "Reference and hypothesis must both be strings, or both be lists/arrays."
+        )
+
+    if ref_is_seq and hyp_is_seq:
+        if len(reference) != len(hypothesis):
+            raise ValueError(
+                "The Reference and Hypothesis input parameters must have the same number of elements."
+            )
+        return True
+
+    # At this point, both are strings (validated above)
+    ref_s = str(reference).strip()
+    hyp_s = str(hypothesis).strip()
+
+    if ref_s == "" and hyp_s == "":
         raise ZeroDivisionError(
             "Invalid input: reference must not be blank, and reference and hypothesis cannot both be empty."
-        ) from exc
-    return word_error_rate_breakdown
+        )
+    if ref_s == "":
+        raise ZeroDivisionError(
+            "Invalid input: reference must not be blank, and reference and hypothesis cannot both be empty."
+        )
+
+    return True
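
Because `error_handler()` now only validates, each wrapper has to run validation and computation inside the same try/except, as the Bug Fixes entry in the changelog describes. The wrapper files are not shown in this view, so the following is only a sketch of that pattern under stated assumptions: `validate`, `count_errors`, and `wer_sketch` are hypothetical names, plain lists stand in for NumPy arrays, and a simple pure-Python edit distance replaces `metrics_fast()`. Printing the message and returning `None` follows the changelog's description of the failure behaviour.

```python
def validate(reference, hypothesis):
    """Validation only: a simplified stand-in for the checks in the diff above."""
    valid_types = (str, list)
    if not isinstance(reference, valid_types) or not isinstance(hypothesis, valid_types):
        raise AttributeError("All text should be in a string format.")
    if isinstance(reference, list) != isinstance(hypothesis, list):
        raise AttributeError("Reference and hypothesis must both be strings, or both be lists.")
    if isinstance(reference, list):
        if len(reference) != len(hypothesis):
            raise ValueError("Reference and hypothesis must have the same number of elements.")
        return True
    if reference.strip() == "":
        raise ZeroDivisionError("Reference must not be blank.")
    return True


def count_errors(reference, hypothesis):
    """Hypothetical computation step: returns (edit distance, reference word count)."""
    ref, hyp = reference.split(), hypothesis.split()  # AttributeError for non-strings
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1], len(ref)


def wer_sketch(reference, hypothesis):
    """Corpus-level WER; returns None (after printing the error) on invalid input."""
    try:
        # Validation and computation share one try block, so an error raised
        # by either step is caught by the same handler.
        validate(reference, hypothesis)
        if isinstance(reference, str):
            pairs = [(reference, hypothesis)]
        else:
            pairs = zip(reference, hypothesis)
        total_ld = total_m = 0
        for ref, hyp in pairs:
            ld, m = count_errors(ref, hyp)
            total_ld += ld
            total_m += m
    except (ValueError, AttributeError, ZeroDivisionError) as err:
        print(f"{type(err).__name__}: {err}")
        return None
    # Division-by-zero guard for a corpus whose references are all empty
    return total_ld / total_m if total_m > 0 else 0.0
```

A list of integers passes the type and length checks in `validate` but fails in `count_errors`, which is the case a try block wrapped around validation alone would miss.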
