Skip to content

feat: add Wasm components for outlier and changepoint detection; generate Go bindings#367

Open
sd2k wants to merge 8 commits intorand-0.9from
go-components
Open

feat: add Wasm components for outlier and changepoint detection; generate Go bindings#367
sd2k wants to merge 8 commits intorand-0.9from
go-components

Conversation

@sd2k
Copy link
Collaborator

@sd2k sd2k commented Oct 7, 2025

This is a proof of concept of using gravity to generate Go bindings
for some of augurs' functionality. There are two stages:

  1. Build Wasm components which wraps some of augurs key methods
  2. Generate Go bindings from the Wasm component using gravity.

This results in a Go package with some subpackages that can be used to
call augurs code.

I'm using a fork of gravity for now to get a few more features, which
will hopefully be merged upstream soon.

Summary by CodeRabbit

  • New Features

    • Added WebAssembly-based changepoint and outlier detectors with Go bindings.
    • Supported algorithms: Argpcp and BOCPD (changepoint), DBSCAN and MAD (outlier).
    • Provided factories to instantiate and run detectors in-process.
  • Documentation

    • Added guides for Rust components and Go bindings, including build and generation steps.
  • Tests

    • Introduced Go tests validating detection results for all algorithms.
  • Chores/CI

    • Added GitHub Actions workflow to build WASM components, generate/test Go bindings, and verify no diffs.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 7, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds Rust WebAssembly components for changepoint and outlier detection with WIT interfaces, implements their logic and error handling, generates Go bindings and tests via Gravity/wazero, updates Rust workspace and dependencies, introduces a CI workflow to build/test and verify generated Go code, and makes minor JS doc cleanup.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/go.yml
New GitHub Actions workflow to install toolchains, build Wasm components, generate Go bindings, run Go tests, and verify no diffs in go/.
Rust Workspace & Patches
Cargo.toml
Adds components/rust/* to workspace members; retargets argmin and argmin-math patches to sd2k/argmin@timer-feature.
Rust Component: Changepoint - Manifest & WIT
components/rust/changepoint/Cargo.toml, components/rust/changepoint/wit/changepoint.wit, components/rust/README.md
New component package manifest (cdylib), WIT world/types for detect(input)->result, and directory README describing components and generation.
Rust Component: Changepoint - Implementation
components/rust/changepoint/src/lib.rs
Implements Wasm guest for changepoint detection (Argpcp, BOCPD), public error type, parameter parsing, and bindings via wit-bindgen.
Rust Component: Outlier - Manifest & WIT
components/rust/outlier/Cargo.toml, components/rust/outlier/wit/outlier.wit
New component package manifest (cdylib) and WIT world/types for outlier detection with DBSCAN and MAD variants.
Rust Component: Outlier - Implementation
components/rust/outlier/src/lib.rs
Implements Wasm guest for outlier detection, typed error mapping, data conversions, and exported detect.
Rust Crate Enhancement
crates/augurs-changepoint/src/lib.rs
Adds builder setters to DefaultArgpcpDetectorBuilder (length_scale, noise_level, max_lag, alpha0, beta0, logistic hazard h/a/b).
Go Workspace & Docs
go/go.mod, go/README.md, go/justfile
Initializes Go module, lists dependencies, documents generation/testing, and adds Just recipes to build Wasm, generate, and test.
Go Bindings: Changepoint
go/changepoint/doc.go, go/changepoint/changepoint.go, go/changepoint/changepoint_test.go
Generated Go API, factory/instance via wazero, Detect implementation bridging WIT, package doc and generate directive, and tests for Argpcp/BOCPD outputs.
Go Bindings: Outlier
go/outlier/doc.go, go/outlier/outlier.go, go/outlier/outlier_test.go
Generated Go API for DBSCAN/MAD, factory/instance, Detect bridging WIT, package doc and generate directive, and tests validating results and cluster band.
JS Minor Cleanup
js/augurs-changepoint-js/src/lib.rs
Removes a stray doc comment line; no API change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor GoApp as Go app
  participant GoHost as Go bindings (wazero)
  participant Wasm as Wasm component (changepoint)
  participant Rust as Rust logic

  GoApp->>GoHost: NewChangepointFactory/Instantiate
  GoApp->>GoHost: Detect(Input: data + algorithm)
  GoHost->>Wasm: call detect(input)
  Wasm->>Rust: parse params, select Argpcp/BOCPD
  Rust-->>Wasm: run detection -> Output or error
  Wasm-->>GoHost: result<Output, string>
  GoHost-->>GoApp: Output | error
  note over GoHost: cabi_post_detect cleanup
Loading
sequenceDiagram
  autonumber
  actor GoApp as Go app
  participant GoHost as Go bindings (wazero)
  participant Wasm as Wasm component (outlier)
  participant Rust as Rust logic

  GoApp->>GoHost: NewOutlierFactory/Instantiate
  GoApp->>GoHost: Detect(Input: series + algorithm)
  GoHost->>Wasm: call detect(input)
  Wasm->>Rust: dispatch DBSCAN/MAD
  Rust-->>Wasm: compute OutlyingSeries, SeriesResults, Band
  Wasm-->>GoHost: result<Output, string>
  GoHost-->>GoApp: Output | error
  note over GoHost: cabi_post_detect cleanup
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Poem

A rabbit compiled some Wasm tonight,
With changepoints found in moonlit light.
Outliers dance, then quietly go,
As Go and Rust share streams that flow.
CI hums a gentle tune,
“No diffs!” it squeaks beneath the moon.
Hop, build, test—then swoon.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.74% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely describes the addition of Wasm components for outlier and changepoint detection and the generation of Go bindings, aligning directly with the primary objectives of the pull request. It avoids extraneous details while clearly conveying the key feature additions introduced by the author. The phrasing is specific and informative, making the pull request’s intent immediately understandable to reviewers.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sd2k sd2k changed the base branch from main to rand-0.9 October 7, 2025 11:09
@sd2k sd2k changed the title go components feat: add Wasm components for outlier and changepoint detection; generate Go bindings Oct 7, 2025
@sd2k sd2k force-pushed the rand-0.9 branch 2 times, most recently from 60eca04 to 0a00f22 Compare October 7, 2025 11:51
…rate Go bindings

This is a proof of concept of using [`gravity`] to generate Go bindings
for some of augurs' functionality. There are two stages:

1. Build Wasm components which wraps some of `augurs` key methods
2. Generate Go bindings from the Wasm component using gravity.

This results in a Go package with some subpackages that can be used to
call augurs code.

I'm using a fork of `gravity` for now to get a few more features, which
will hopefully be merged upstream soon.

[`gravity`]: https://github.com/sd2k/gravity
@sd2k sd2k marked this pull request as ready for review October 7, 2025 12:45
@sd2k
Copy link
Collaborator Author

sd2k commented Oct 7, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 7, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (15)
crates/augurs-changepoint/src/lib.rs (1)

167-176: Enforce positive parameters in builder methods
Each setter passes values into *Kernel::new_unchecked, which performs no bounds checks. Add validation to ensure inputs > 0.0 (e.g., panic with assert! or return a Result) in length_scale, noise_level, alpha0, and beta0.

.github/workflows/go.yml (2)

35-38: Fix Go cache configuration (wrong path).

The cache-dependency-path points into the Rust tree, so Go mod cache won’t warm. Point it at the Go module(s) or use setup-go’s built‑in cache.

Apply one of these:

Option A — built-in cache + correct paths:

       uses: actions/setup-go@0aaccfd150d50ccaeb58ebd88d36e91967a5f35b # v5.4.0
       with:
         go-version: "1.25"
-        cache-dependency-path: |
-          components/rust/outlier/go.sum
+        cache: true
+        cache-dependency-path: |
+          go/**/go.sum

Option B — simplest (let setup-go detect go.sum):

       with:
         go-version: "1.25"
-        cache-dependency-path: |
-          components/rust/outlier/go.sum
+        cache: true

52-53: Strengthen “no diffs” check (nested paths + untracked files).

Current glob may miss nested packages and untracked generated files. Use git pathspec glob and check untracked files too.

-      - name: Ensure no diffs
-        run: git diff --exit-code -- ./go/*/*.go
+      - name: Ensure no diffs
+        run: |
+          set -euo pipefail
+          # Tracked changes in any Go file under go/
+          git diff --exit-code -- ':(glob)go/**/*.go'
+          # Fail if any untracked generated Go files exist under go/
+          if git ls-files --others --exclude-standard -- ':(glob)go/**/*.go' | grep -q .; then
+            echo "Untracked generated Go files found under go/. Run 'just go/generate' and commit the results:" >&2
+            git ls-files --others --exclude-standard -- ':(glob)go/**/*.go' >&2
+            exit 1
+          fi
go/justfile (1)

1-12: LGTM. Optional: enforce strict shell for safer failures.

Consider setting a strict shell once so recipes fail fast on errors.

+set shell := ["bash", "-eu", "-o", "pipefail", "-c"]
+
 build-wasm:
     RUSTFLAGS='--cfg getrandom_backend="custom"' cargo build \
         --target wasm32-unknown-unknown \
         --release \
         -p augurs-outlier-component \
         -p augurs-changepoint-component
go/outlier/outlier_test.go (2)

34-41: Guard against out-of-bounds before indexing series results.

Add a length check before accessing SeriesResults[2] to avoid a panic and improve failure messages.

 		detected, err := ins.Detect(ctx, input)
 		require.NoError(t, err)
 
+		require.Len(t, detected.SeriesResults, len(input.Data))
 		assert.Len(t, detected.OutlyingSeries, 1)
 		assert.Contains(t, detected.OutlyingSeries, uint32(2))
 		assert.True(t, detected.SeriesResults[2].IsOutlier)

Based on learnings


71-77: Same here: check length before indexing.

Mirror the length assertion in the MAD subtest to prevent accidental panics.

 		detected, err := ins.Detect(ctx, input)
 		require.NoError(t, err)
 
+		require.Len(t, detected.SeriesResults, len(input.Data))
 		assert.Len(t, detected.OutlyingSeries, 1)
 		assert.Contains(t, detected.OutlyingSeries, uint32(3))
 		assert.True(t, detected.SeriesResults[3].IsOutlier)

Based on learnings

go/outlier/doc.go (1)

4-4: Document pre-req for go:generate.

Add a brief note that the Wasm artifact must exist (build-wasm) before running go:generate locally to avoid confusion.

For example, above the directive:

// Run `just go/build-wasm` first to produce the Wasm artifact referenced below.
components/rust/outlier/src/lib.rs (2)

32-57: Broaden error variant naming for clarity.

augurs_outlier::Error may cover more than sensitivity; the variant name InvalidSensitivity can mislead. Consider a general variant name.

 #[derive(Debug)]
 enum TypedError {
-    InvalidSensitivity(augurs_outlier::Error),
+    OutlierError(augurs_outlier::Error),
     TryFromIntError(std::num::TryFromIntError),
 }
 
 impl From<augurs_outlier::Error> for TypedError {
     fn from(value: augurs_outlier::Error) -> Self {
-        Self::InvalidSensitivity(value)
+        Self::OutlierError(value)
     }
 }
 
 impl fmt::Display for TypedError {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         match self {
-            Self::InvalidSensitivity(e) => write!(f, "invalid sensitivity: {}", e),
+            Self::OutlierError(e) => write!(f, "algorithm error: {}", e),
             Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e),
         }
     }
 }

82-111: Minor: use u32::try_from for readability and consistency.

This avoids the generic TryInto syntax and matches the end-field conversion.

     Ok(Output {
         outlying_series: output
             .outlying_series
             .into_iter()
-            .map(TryInto::<u32>::try_into)
+            .map(u32::try_from)
             .collect::<Result<_, _>>()?,
         series_results: output
             .series_results
             .into_iter()
             .map(|series| {
                 Ok(Series {
                     is_outlier: series.is_outlier,
                     outlier_intervals: series
                         .outlier_intervals
                         .intervals
                         .into_iter()
                         .map(|interval| {
                             Ok(OutlierInterval {
-                                start: interval.start.try_into()?,
-                                end: interval.end.map(TryInto::try_into).transpose()?,
+                                start: u32::try_from(interval.start)?,
+                                end: interval.end.map(u32::try_from).transpose()?,
                             })
                         })
                         .collect::<Result<_, TypedError>>()?,
                     scores: series.scores,
                 })
             })
             .collect::<Result<_, TypedError>>()?,
components/rust/changepoint/wit/changepoint.wit (2)

114-118: Prefer a named error alias for stability and cross-world consistency

Define a types.error alias and use it in the world result to avoid leaking raw string and to ease future evolution.

 interface types {
+    /// Errors that can occur during changepoint detection.
+    /// Currently represented as a string; aliased for forward-compatibility.
+    type error = string;
 }
 
 world changepoint {
     use types.{input, output};
     /// Detect changepoints in the input.
-    export detect: func(input: input) -> result<output, string>;
+    export detect: func(input: input) -> result<output, error>;
 }

41-61: Document non-zero constraint for max-lag

max-lag is validated as NonZeroUsize in Rust (InvalidMaxLag on zero). Clarify this here to align expectations.

Consider adding “Must be > 0.” to the max-lag field docs.

components/rust/outlier/wit/outlier.wit (2)

20-26: Avoid leaking Rust impl detail (“BTreeSet”) in cross-language docs

This is exposed as list in WIT. Rephrase to “Sorted unique indices of outlying series,” or similar.


139-141: Use the named error alias in the world signature

Align with the defined error type alias for consistency.

-    export detect: func(input: input) -> result<output, string>;
+    export detect: func(input: input) -> result<output, error>;
go/changepoint/changepoint.go (1)

555-562: Avoid panic on guest cleanup failure

Panicking in a library call is harsh. Prefer logging or best-effort cleanup without panic; at minimum, wrap with context in the returned error path.

go/outlier/outlier.go (1)

369-375: Avoid panic on guest cleanup failure

As above, prefer non-panicking cleanup or logging; panics from a library are undesirable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a00f22 and e270dba.

⛔ Files ignored due to path filters (3)
  • go/changepoint/changepoint.wasm is excluded by !**/*.wasm
  • go/go.sum is excluded by !**/*.sum
  • go/outlier/outlier.wasm is excluded by !**/*.wasm
📒 Files selected for processing (20)
  • .github/workflows/go.yml (1 hunks)
  • Cargo.toml (3 hunks)
  • components/rust/README.md (1 hunks)
  • components/rust/changepoint/Cargo.toml (1 hunks)
  • components/rust/changepoint/src/lib.rs (1 hunks)
  • components/rust/changepoint/wit/changepoint.wit (1 hunks)
  • components/rust/outlier/Cargo.toml (1 hunks)
  • components/rust/outlier/src/lib.rs (1 hunks)
  • components/rust/outlier/wit/outlier.wit (1 hunks)
  • crates/augurs-changepoint/src/lib.rs (1 hunks)
  • go/README.md (1 hunks)
  • go/changepoint/changepoint.go (1 hunks)
  • go/changepoint/changepoint_test.go (1 hunks)
  • go/changepoint/doc.go (1 hunks)
  • go/go.mod (1 hunks)
  • go/justfile (1 hunks)
  • go/outlier/doc.go (1 hunks)
  • go/outlier/outlier.go (1 hunks)
  • go/outlier/outlier_test.go (1 hunks)
  • js/augurs-changepoint-js/src/lib.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • js/augurs-changepoint-js/src/lib.rs
🔇 Additional comments (11)
go/go.mod (2)

5-8: LGTM! Dependency versions are current.

The direct dependencies are appropriate for this use case:

  • testify v1.11.1 for test assertions (latest stable)
  • wazero v1.9.0 for WebAssembly runtime (recent release)

Both versions align with the learnings provided and support the WASM-based Go bindings architecture.

Based on learnings.


3-3: Go version is correct Confirmed that Go 1.25.1 is the current stable release on go.dev.

components/rust/outlier/Cargo.toml (1)

1-21: LGTM! Standard Wasm component configuration.

The Cargo manifest is properly configured for a WebAssembly component:

  • cdylib crate type is correct for generating Wasm binaries
  • Workspace dependencies (augurs-outlier, wit-bindgen, wit-component) are appropriate for WIT-based component generation
  • publish = false is correct for an internal component
Cargo.toml (2)

3-3: LGTM! Workspace member pattern correctly includes Rust components.

The glob pattern "components/rust/*" appropriately includes all Rust Wasm components (changepoint, outlier) added in this PR.


101-103: LGTM! Fork usage is properly documented.

The patch to use sd2k/argmin with the timer-feature branch is clearly documented in the inline comment. This temporary fork supports the Argpcp detector functionality needed by the changepoint component.

crates/augurs-changepoint/src/lib.rs (1)

166-212: LGTM! Builder methods correctly expose Argpcp parameters.

The new builder methods properly implement the builder pattern for configuring the DefaultArgpcpDetector. Each method:

  • Sets the corresponding field
  • Returns self for method chaining
  • Has clear documentation

These methods align with the WASM interface parameters (ArgpcpParams) exposed through the Go bindings.

go/README.md (1)

1-36: LGTM! Clear and comprehensive documentation.

The README provides:

  • Clear explanation of the Go bindings architecture
  • Complete prerequisites list (Rust, Go, gravity, just)
  • Actionable build steps with commands
  • Proper context about the gravity tool

This will help developers understand and build the Go bindings.

go/changepoint/doc.go (1)

1-4: LGTM! Go generate directive correctly configured.

The package documentation and go:generate directive are properly set up:

  • Package comment clearly describes the purpose
  • Gravity command references correct WIT file and WASM binary paths
  • Paths are relative to the file location
go/changepoint/changepoint_test.go (1)

10-51: LGTM! Well-structured parallel tests with proper cleanup.

The test suite demonstrates best practices:

  • Parallel execution with t.Parallel()
  • Proper resource cleanup using t.Cleanup()
  • Appropriate use of require for errors (stop on failure) and assert for values
  • Clear test organization with subtests

Both algorithms (argpcp and bocpd) detect the same changepoint at index 6, which corresponds to the large jump from ~0.9 to 25.3 in the test data.

Based on learnings.

components/rust/README.md (1)

1-7: LGTM! Clear documentation for Rust Wasm components.

The README effectively:

  • Explains the directory's purpose
  • Describes usage options (direct Wasm or generated bindings)
  • References the gravity tool for host binding generation
  • Points to the Go package and build tooling
components/rust/changepoint/Cargo.toml (1)

1-21: Cargo manifest looks correct for a Wasm component.

cdylib target, workspace deps, and publish=false are appropriate.

Comment on lines 55 to 63
impl fmt::Display for ChangepointError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::NormalGammaError(e) => write!(f, "invalid Normal Gamma distribution: {}", e),
Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e),
Self::InvalidMaxLag(ml) => write!(f, "invalid max lag: {}", ml),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Error message mentions wrong target type

It says “overflow converting to u32” but the conversion is u32 -> usize. Fix the wording.

-            Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e),
+            Self::TryFromIntError(e) => write!(f, "overflow converting to usize: {}", e),
🤖 Prompt for AI Agents
In components/rust/changepoint/src/lib.rs around lines 55 to 63, the Display
message for TryFromIntError incorrectly says "overflow converting to u32" while
the code converts from u32 to usize; update the error string to reflect the
actual conversion target (e.g., "overflow converting u32 to usize" or "overflow
converting to usize") so the message accurately describes the conversion and
include the original error details as before.

Comment on lines 6 to 12
/// The input for the outlier detector.
///
/// Currently this is represented as a string because Gravity
/// does not yet support more complex types.
///
/// It should be a JSON object with the following fields:
record input {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stale docs: input is not a string

Comments say input is represented as a string, but the type is a record with list<list>. Update/remove the note.

🤖 Prompt for AI Agents
In components/rust/outlier/wit/outlier.wit around lines 6 to 12, the comment
incorrectly states the input is represented as a string; update or remove that
note so the documentation matches the actual type (record input with
list<list<f64>>). Edit the comment to either remove the "represented as a
string" sentence or replace it with a brief accurate description that the input
is a nested list of f64 (JSON array of arrays of numbers) and ensure any mention
of Gravity limitations is omitted or reworded to reflect current support.

Comment on lines 116 to 131
/// The algorithm to use for outlier detection.
variant algorithm {
/// The DBSCAN algorithm.
///
/// This algorithm is a density-based algorithm that uses a
/// clustering algorithm to group together points that are
/// close to each other.
dbscan(dbscan-params),

/// The MAD algorithm.
///
/// This algorithm is a density-based algorithm that uses a
/// clustering algorithm to group together points that are
/// close to each other.
mad(mad-params),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incorrect description of MAD

MAD isn’t a density/clustering algorithm. Update to describe Median Absolute Deviation thresholding and its sensitivity/threshold semantics.

🤖 Prompt for AI Agents
In components/rust/outlier/wit/outlier.wit around lines 116 to 131, the comment
for the mad variant incorrectly describes it as a density/clustering algorithm;
replace that description with one stating that mad implements Median Absolute
Deviation thresholding, explaining it computes the median, measures absolute
deviations from the median, and flags points beyond a configurable threshold
(sensitivity) as outliers; also note its robustness to outliers and that the
threshold parameter controls strictness (higher = fewer outliers, lower = more),
and update wording to match style of the dbscan comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments