Merged
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,21 @@
This changelog file outlines a chronologically ordered list of the changes made on this project.
It is organized by version and release date followed by a list of Enhancements, New Features, Bug Fixes, and/or Breaking Changes.

## Version 3.3.0

**Released:** December 19, 2025
**Tag:** v3.3.0

### Enhancements

- Implemented a fast WER-only path using a space-optimized two-row dynamic programming algorithm with batch buffer reuse. Added four new functions (`calculations_wer_only()`, `_calculations_wer_only_reuse_ptr()`, `_metrics_batch_wer_only()`, `metrics_wer_only()`) that eliminate backtrace overhead and use O(n) memory instead of O(m×n). The optimization swaps row pointers instead of copying values and reuses DP buffers across an entire batch, yielding significant speedups for `wer()` and `wers()`, which need only the WER metric without error counts or word lists.

- Fixed portability issue in WER-only batch processing by replacing platform-dependent `int*` pointers with guaranteed 32-bit `cnp.int32_t*` pointers. This ensures correct behavior on all platforms where `sizeof(int)` may differ from 4 bytes, while also removing unnecessary type casts for cleaner code that follows NumPy/Cython best practices.

- Expanded benchmarking support by adding optional third-party WER libraries (`pywer`, `evaluate`, `universal-edit-distance`, `torchmetrics`) to `pyproject.toml` under the `benchmarks` extra. Updated `benchmark_synthetic_data_local.py` to safely import optional dependencies, ensure all benchmark functions are always defined, and enforce consistent numeric return types. This fixes static analysis warnings, prevents runtime errors when optional packages are missing, and enables more comprehensive and reliable cross-package performance comparisons.

- Standardized all Levenshtein dynamic programming buffers and memoryviews to use `cnp.int32_t` instead of platform-dependent `int`. This ensures strict dtype alignment with NumPy `int32` arrays, avoids buffer mismatches on platforms where `sizeof(int) != 4`, and improves type safety without impacting performance.
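The space-optimized two-row recurrence described in the first bullet can be sketched in plain Python (a simplified illustration of the idea, not the Cython implementation itself):

```python
def wer_two_row(reference: str, hypothesis: str) -> float:
    """Word error rate via Wagner-Fischer with a rolling 2-row buffer."""
    ref, hyp = reference.split(), hypothesis.split()
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))   # distances for the previous reference row
    curr = [0] * (n + 1)        # distances for the current reference row
    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev, curr = curr, prev  # swap rows instead of copying
    return prev[n] / m if m > 0 else 0.0
```

Only two rows of the (m+1)×(n+1) matrix are ever live, which is what reduces memory from O(m×n) to O(n); the trade-off is that no backtrace is possible, so the insertion/deletion/substitution breakdown is unavailable on this path.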

## Version 3.2.0

**Released:** December 15, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@
| | |
| --- | --- |
| Meta | [![Python Version](https://img.shields.io/badge/python-3.10%7C3.11%7C3.12%7C3.13-blue?logo=python&logoColor=ffdd54)](https://www.python.org/downloads/)   [![Black Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)   [![Documentation Status](https://readthedocs.org/projects/werpy/badge/?version=latest)](https://werpy.readthedocs.io/en/latest/?badge=latest)   [![Analytics in Motion](https://raw.githubusercontent.com/analyticsinmotion/.github/main/assets/images/analytics-in-motion-github-badge-rounded.svg)](https://www.analyticsinmotion.com) |
| License | [![werpy License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://github.com/analyticsinmotion/werpy/blob/main/LICENSE)   [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fanalyticsinmotion%2Fwerpy.svg?type=small)](https://app.fossa.com/projects/git%2Bgithub.com%2Fanalyticsinmotion/werpy?ref=badge_small)   [![REUSE status](https://api.reuse.software/badge/github.com/analyticsinmotion/werpy)](https://api.reuse.software/info/github.com/analyticsinmotion/werpy) |
| License | [![werpy License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://github.com/analyticsinmotion/werpy/blob/main/LICENSE)   [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fanalyticsinmotion%2Fwerpy.svg?type=small)](https://app.fossa.com/projects/git+github.com%2Fanalyticsinmotion%2Fwerpy?ref=badge_small)   [![REUSE status](https://api.reuse.software/badge/github.com/analyticsinmotion/werpy)](https://api.reuse.software/info/github.com/analyticsinmotion/werpy) |
| Security | [![CodeQL](https://github.com/analyticsinmotion/werpy/actions/workflows/codeql.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/codeql.yml)   [![Codacy Security Scan](https://github.com/analyticsinmotion/werpy/actions/workflows/codacy.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/codacy.yml)   [![Bandit](https://github.com/analyticsinmotion/werpy/actions/workflows/bandit.yml/badge.svg)](https://github.com/analyticsinmotion/werpy/actions/workflows/bandit.yml) |
| Testing | [![CodeFactor](https://www.codefactor.io/repository/github/analyticsinmotion/werpy/badge)](https://www.codefactor.io/repository/github/analyticsinmotion/werpy)   [![CircleCI](https://dl.circleci.com/status-badge/img/gh/analyticsinmotion/werpy/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/analyticsinmotion/werpy/tree/main)   [![codecov](https://codecov.io/gh/analyticsinmotion/werpy/graph/badge.svg?token=GGT823AVM8)](https://codecov.io/gh/analyticsinmotion/werpy) |
| Package | [![Pypi](https://img.shields.io/pypi/v/werpy?label=PyPI&color=blue)](https://pypi.org/project/werpy/)   [![PyPI Downloads](https://img.shields.io/pypi/dm/werpy?label=PyPI%20downloads)](https://pypi.org/project/werpy/)   [![Downloads](https://static.pepy.tech/badge/werpy)](https://pepy.tech/project/werpy)   [![PyPI - Trusted Publisher](https://img.shields.io/badge/PyPI-Trusted%20Publisher-blue)](https://pypi.org/project/werpy/) |
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -19,7 +19,7 @@
project = "werpy"
copyright = f'{datetime.now().year} <a href="https://www.analyticsinmotion.com">Analytics in Motion</a>'
author = "Ross Armstrong"
release = "3.2.0"
release = "3.3.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
2 changes: 1 addition & 1 deletion meson.build
@@ -1,7 +1,7 @@
project(
'werpy',
'c', 'cython',
version : '3.2.0',
version : '3.3.0',
license: 'BSD-3',
meson_version: '>= 1.1.0',
default_options : [
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -9,7 +9,7 @@ requires = [

[project]
name = 'werpy'
version = '3.2.0'
version = '3.3.0'
description = 'A powerful yet lightweight Python package to calculate and analyze the Word Error Rate (WER).'
readme = 'README.md'
requires-python = '>=3.10'
@@ -75,4 +75,8 @@ benchmarks = [
"datasets>=4.4.1",
"werx>=0.3.1",
"jiwer>=4.0.0",
]
"pywer>=0.1.1",
"evaluate>=0.4.6",
"universal-edit-distance>=0.4.3",
"torchmetrics",
]
2 changes: 1 addition & 1 deletion werpy/__init__.py
@@ -5,7 +5,7 @@
The werpy package provides tools for calculating word error rates (WERs) and related metrics on text data.
"""

__version__ = "3.2.0"
__version__ = "3.3.0"

from .errorhandler import error_handler
from .normalize import normalize
186 changes: 180 additions & 6 deletions werpy/metrics.pyx
@@ -55,13 +55,13 @@ cpdef cnp.ndarray calculations(object reference, object hypothesis):
# SAFETY: All cells are explicitly initialized below (row 0, col 0, then DP loop).
# Allocate the (m+1) x (n+1) DP matrix without zero-initialization to avoid
# redundant memory writes. Boundary conditions are initialized explicitly.
cdef int[:, :] ldm = np.empty((m + 1, n + 1), dtype=np.int32)
cdef cnp.int32_t[:, :] ldm = np.empty((m + 1, n + 1), dtype=np.int32)

# Initialize first column and first row (boundary conditions)
for i in range(m + 1):
ldm[i, 0] = <int>i
ldm[i, 0] = <cnp.int32_t>i
for j in range(n + 1):
ldm[0, j] = <int>j
ldm[0, j] = <cnp.int32_t>j

# Fill the Levenshtein distance matrix
# Compute edit distances using a branch-free inner loop and manual minimum
@@ -181,13 +181,13 @@ cpdef cnp.ndarray calculations_fast(object reference, object hypothesis):
cdef int cost, del_cost, ins_cost, sub_cost, best

# Allocate the (m+1) x (n+1) DP matrix without zero-initialization
cdef int[:, :] ldm = np.empty((m + 1, n + 1), dtype=np.int32)
cdef cnp.int32_t[:, :] ldm = np.empty((m + 1, n + 1), dtype=np.int32)

# Initialize first column and first row (boundary conditions)
for i in range(m + 1):
ldm[i, 0] = <int>i
ldm[i, 0] = <cnp.int32_t>i
for j in range(n + 1):
ldm[0, j] = <int>j
ldm[0, j] = <cnp.int32_t>j

# Fill the Levenshtein distance matrix
for i in range(1, m + 1):
@@ -268,3 +268,177 @@ cpdef object metrics_fast(object reference, object hypothesis):
if isinstance(reference, (list, np.ndarray)) and isinstance(hypothesis, (list, np.ndarray)):
return _metrics_batch_fast(list(reference), list(hypothesis))
return calculations_fast(reference, hypothesis)
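The motivation for the `int` → `cnp.int32_t` change in these memoryviews can be checked from Python (a small illustration; `ctypes` is used here only to inspect the platform's C `int`):

```python
import ctypes
import numpy as np

# NumPy's int32 is always exactly 4 bytes, so a buffer created with
# dtype=np.int32 has a fixed element size on every platform.
assert np.dtype(np.int32).itemsize == 4

# C's `int` (what a plain `int[:, :]` memoryview assumes) is only required
# by the C standard to be at least 16 bits; its actual size is platform-defined.
print("sizeof(int) on this platform:", ctypes.sizeof(ctypes.c_int))
```

On a platform where the two widths disagree, a memoryview typed `int[:, :]` over an `np.int32` array would fail with a buffer-format mismatch at runtime; typing it `cnp.int32_t[:, :]` pins both sides to the same width.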


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cnp.ndarray calculations_wer_only(object reference, object hypothesis):
"""
WER-only fast path - 2-row DP (O(n) memory), no backtrace.
Returns only [wer, ld, m] without error counts or word tracking.

This is the fastest path for pure WER calculation, using space-optimized
Wagner-Fischer algorithm with rolling 2-row buffer instead of full matrix.

Returns (3,) float64 array: [wer, ld, m]
"""
cdef list reference_word = reference.split()
cdef list hypothesis_word = hypothesis.split()

cdef Py_ssize_t m = len(reference_word)
cdef Py_ssize_t n = len(hypothesis_word)

cdef Py_ssize_t i, j
cdef int cost, del_cost, ins_cost, sub_cost, best, ld
cdef double wer

cdef cnp.ndarray prev_arr = np.empty(n + 1, dtype=np.int32)
cdef cnp.ndarray curr_arr = np.empty(n + 1, dtype=np.int32)

cdef cnp.int32_t[:] prev = prev_arr
cdef cnp.int32_t[:] curr = curr_arr

for j in range(n + 1):
prev[j] = <cnp.int32_t>j

for i in range(1, m + 1):
curr[0] = <cnp.int32_t>i
for j in range(1, n + 1):
cost = 0 if reference_word[i - 1] == hypothesis_word[j - 1] else 1

del_cost = prev[j] + 1
ins_cost = curr[j - 1] + 1
sub_cost = prev[j - 1] + cost

best = del_cost
if ins_cost < best:
best = ins_cost
if sub_cost < best:
best = sub_cost

curr[j] = best

prev, curr = curr, prev

ld = prev[n]
wer = (<double>ld) / m if m > 0 else 0.0

return np.array([wer, <double>ld, <double>m], dtype=np.float64)


@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline void _calculations_wer_only_reuse_ptr(
object reference,
object hypothesis,
cnp.int32_t* prev,
cnp.int32_t* curr,
double* out3,
) except *:
"""
Internal WER-only DP using caller-provided buffers and pointer swap (no copying).
Writes: out3[0]=wer, out3[1]=ld, out3[2]=m

This implementation uses true pointer swapping instead of copying values,
eliminating O(n) copy overhead per outer iteration.
"""
cdef list reference_word = reference.split()
cdef list hypothesis_word = hypothesis.split()

cdef Py_ssize_t m = len(reference_word)
cdef Py_ssize_t n = len(hypothesis_word)

cdef Py_ssize_t i, j
cdef int cost, del_cost, ins_cost, sub_cost, best, ld
cdef cnp.int32_t* tmp

# Initialize base row: prev[j] = j for j=0..n
for j in range(n + 1):
prev[j] = j

for i in range(1, m + 1):
curr[0] = i
for j in range(1, n + 1):
cost = 0 if reference_word[i - 1] == hypothesis_word[j - 1] else 1

del_cost = prev[j] + 1
ins_cost = curr[j - 1] + 1
sub_cost = prev[j - 1] + cost

best = del_cost
if ins_cost < best:
best = ins_cost
if sub_cost < best:
best = sub_cost

curr[j] = best

# Swap prev and curr pointers (zero-cost operation)
tmp = prev
prev = curr
curr = tmp

ld = prev[n]
out3[0] = (<double>ld) / m if m > 0 else 0.0
out3[1] = <double>ld
out3[2] = <double>m


@cython.boundscheck(False)
@cython.wraparound(False)
cdef cnp.ndarray _metrics_batch_wer_only(list references, list hypotheses):
"""
Fast batch processing for WER-only calculations with buffer reuse and pointer swapping.

Eliminates repeated buffer allocations by reusing prev/curr arrays across all pairs
in the batch, sized to the maximum hypothesis length. Uses true pointer swapping
instead of value copying for optimal performance.

Returns (n, 3) float64 array where each row contains:
[wer, ld, m]
"""
cdef Py_ssize_t n_pairs = len(references)
cdef Py_ssize_t idx

cdef cnp.ndarray out = np.empty((n_pairs, 3), dtype=np.float64)

# Find max hypothesis token length to size buffers once
cdef Py_ssize_t max_n = 0
cdef Py_ssize_t this_n
cdef object h
cdef list h_words
for idx in range(n_pairs):
h = hypotheses[idx]
h_words = h.split()
this_n = len(h_words)
if this_n > max_n:
max_n = this_n

# Allocate reusable DP buffers once for the entire batch
cdef cnp.ndarray prev_arr = np.empty(max_n + 1, dtype=np.int32)
cdef cnp.ndarray curr_arr = np.empty(max_n + 1, dtype=np.int32)

# Get raw pointers for zero-cost swapping
cdef cnp.int32_t* prev = <cnp.int32_t*>cnp.PyArray_DATA(prev_arr)
cdef cnp.int32_t* curr = <cnp.int32_t*>cnp.PyArray_DATA(curr_arr)

# Process each pair using shared buffers, writing directly to output rows
cdef double* out_row
for idx in range(n_pairs):
out_row = <double*>cnp.PyArray_DATA(out) + (idx * 3)
_calculations_wer_only_reuse_ptr(references[idx], hypotheses[idx], prev, curr, out_row)

return out


cpdef object metrics_wer_only(object reference, object hypothesis):
"""
WER-only metrics entry point (fastest path).

Returns:
- strings: (3,) float64 array [wer, ld, m]
- sequences: (n, 3) float64 array, one row per pair
"""
if isinstance(reference, (list, np.ndarray)) and isinstance(hypothesis, (list, np.ndarray)):
return _metrics_batch_wer_only(list(reference), list(hypothesis))
return calculations_wer_only(reference, hypothesis)
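The buffer-reuse pattern of `_metrics_batch_wer_only` can be mirrored in plain NumPy to show the idea (an illustrative sketch — the real implementation swaps raw C pointers and is far faster):

```python
import numpy as np

def batch_wer(references, hypotheses):
    """Batch WER with two DP rows sized once and reused for every pair."""
    max_n = max(len(h.split()) for h in hypotheses)
    prev = np.empty(max_n + 1, dtype=np.int32)
    curr = np.empty(max_n + 1, dtype=np.int32)
    out = np.empty((len(references), 3), dtype=np.float64)
    for idx, (r, h) in enumerate(zip(references, hypotheses)):
        ref, hyp = r.split(), h.split()
        m, n = len(ref), len(hyp)
        prev[: n + 1] = np.arange(n + 1, dtype=np.int32)  # base row
        for i in range(1, m + 1):
            curr[0] = i
            for j in range(1, n + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
            prev, curr = curr, prev  # reference swap, no copying
        ld = int(prev[n])
        out[idx] = (ld / m if m else 0.0, ld, m)  # [wer, ld, m]
    return out
```

Allocating `prev`/`curr` once for the longest hypothesis in the batch, rather than per pair, is what removes the repeated allocation overhead the changelog entry describes.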
12 changes: 6 additions & 6 deletions werpy/wer.py
@@ -12,7 +12,7 @@

import numpy as np
from .errorhandler import error_handler
from .metrics import metrics_fast
from .metrics import metrics_wer_only


def wer(reference, hypothesis) -> float | np.float64 | None:
@@ -57,17 +57,17 @@ def wer(reference, hypothesis) -> float | np.float64 | None:
"""
try:
error_handler(reference, hypothesis)
result = metrics_fast(reference, hypothesis)
result = metrics_wer_only(reference, hypothesis)
except (ValueError, AttributeError, ZeroDivisionError) as err:
print(f"{type(err).__name__}: {str(err)}")
return None

# Batch: (n, 6) float64
# Batch: (n, 3) float64, columns [wer, ld, m]
if isinstance(result, np.ndarray) and result.ndim == 2:
den = np.sum(result[:, 2])
return float(np.sum(result[:, 1]) / den) if den else 0.0
den = np.sum(result[:, 2]) # m column
return float(np.sum(result[:, 1]) / den) if den else 0.0 # ld column

# Single: (6,) float64, WER is at index 0
# Single: (3,) float64, WER is at index 0
if isinstance(result, np.ndarray) and getattr(result, "ndim", 0) == 0:
result = result.item()

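The batch branch in `wer()` pools edits across the whole batch (sum of `ld` over sum of `m`), which weights each sentence by its reference length; a plain mean of per-sentence WERs would give a different number, as two sentences of unequal length show (illustrative values only):

```python
import numpy as np

# Mimics the (n, 3) batch output of metrics_wer_only: columns [wer, ld, m].
result = np.array([
    [0.5, 1.0, 2.0],   # 1 error in a 2-word reference
    [0.1, 1.0, 10.0],  # 1 error in a 10-word reference
])

corpus_wer = float(np.sum(result[:, 1]) / np.sum(result[:, 2]))  # 2 / 12
naive_mean = float(result[:, 0].mean())                          # 0.3
```

The length-weighted form is the standard corpus-level WER: every reference word counts equally, so long sentences are not drowned out by short ones.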
10 changes: 5 additions & 5 deletions werpy/wers.py
@@ -11,7 +11,7 @@

import numpy as np
from .errorhandler import error_handler
from .metrics import metrics_fast
from .metrics import metrics_wer_only


def wers(reference, hypothesis):
@@ -50,16 +50,16 @@ def wers(reference, hypothesis):
"""
try:
error_handler(reference, hypothesis)
result = metrics_fast(reference, hypothesis)
result = metrics_wer_only(reference, hypothesis)
except (ValueError, AttributeError, ZeroDivisionError) as err:
print(f"{type(err).__name__}: {str(err)}")
return None

# Batch: (n, 6) float64
# Batch: (n, 3) float64, columns [wer, ld, m]
if isinstance(result, np.ndarray) and result.ndim == 2:
return result[:, 0].tolist()
return result[:, 0].tolist() # Return wer column

# Single: (6,) float64, WER is at index 0
# Single: (3,) float64, WER is at index 0
if isinstance(result, np.ndarray) and getattr(result, "ndim", 0) == 0:
result = result.item()
