feat: add Wasm components for outlier and changepoint detection; generate Go bindings#367
feat: add Wasm components for outlier and changepoint detection; generate Go bindings#367
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds Rust WebAssembly components for changepoint and outlier detection with WIT interfaces, implements their logic and error handling, generates Go bindings and tests via Gravity/wazero, updates Rust workspace and dependencies, introduces a CI workflow to build/test and verify generated Go code, and makes minor JS doc cleanup. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor GoApp as Go app
participant GoHost as Go bindings (wazero)
participant Wasm as Wasm component (changepoint)
participant Rust as Rust logic
GoApp->>GoHost: NewChangepointFactory/Instantiate
GoApp->>GoHost: Detect(Input: data + algorithm)
GoHost->>Wasm: call detect(input)
Wasm->>Rust: parse params, select Argpcp/BOCPD
Rust-->>Wasm: run detection -> Output or error
Wasm-->>GoHost: result<Output, string>
GoHost-->>GoApp: Output | error
note over GoHost: cabi_post_detect cleanup
sequenceDiagram
autonumber
actor GoApp as Go app
participant GoHost as Go bindings (wazero)
participant Wasm as Wasm component (outlier)
participant Rust as Rust logic
GoApp->>GoHost: NewOutlierFactory/Instantiate
GoApp->>GoHost: Detect(Input: series + algorithm)
GoHost->>Wasm: call detect(input)
Wasm->>Rust: dispatch DBSCAN/MAD
Rust-->>Wasm: compute OutlyingSeries, SeriesResults, Band
Wasm-->>GoHost: result<Output, string>
GoHost-->>GoApp: Output | error
note over GoHost: cabi_post_detect cleanup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
60eca04 to
0a00f22
Compare
…rate Go bindings This is a proof of concept of using [`gravity`] to generate Go bindings for some of augurs' functionality. There are two stages: 1. Build Wasm components which wraps some of `augurs` key methods 2. Generate Go bindings from the Wasm component using gravity. This results in a Go package with some subpackages that can be used to call augurs code. I'm using a fork of `gravity` for now to get a few more features, which will hopefully be merged upstream soon. [`gravity`]: https://github.com/sd2k/gravity
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (15)
crates/augurs-changepoint/src/lib.rs (1)
167-176: Enforce positive parameters in builder methods
Each setter passes values into*Kernel::new_unchecked, which performs no bounds checks. Add validation to ensure inputs > 0.0 (e.g., panic withassert!or return aResult) inlength_scale,noise_level,alpha0, andbeta0..github/workflows/go.yml (2)
35-38: Fix Go cache configuration (wrong path).The cache-dependency-path points into the Rust tree, so Go mod cache won’t warm. Point it at the Go module(s) or use setup-go’s built‑in cache.
Apply one of these:
Option A — built-in cache + correct paths:
uses: actions/setup-go@0aaccfd150d50ccaeb58ebd88d36e91967a5f35b # v5.4.0 with: go-version: "1.25" - cache-dependency-path: | - components/rust/outlier/go.sum + cache: true + cache-dependency-path: | + go/**/go.sumOption B — simplest (let setup-go detect go.sum):
with: go-version: "1.25" - cache-dependency-path: | - components/rust/outlier/go.sum + cache: true
52-53: Strengthen “no diffs” check (nested paths + untracked files).Current glob may miss nested packages and untracked generated files. Use git pathspec glob and check untracked files too.
- - name: Ensure no diffs - run: git diff --exit-code -- ./go/*/*.go + - name: Ensure no diffs + run: | + set -euo pipefail + # Tracked changes in any Go file under go/ + git diff --exit-code -- ':(glob)go/**/*.go' + # Fail if any untracked generated Go files exist under go/ + if git ls-files --others --exclude-standard -- ':(glob)go/**/*.go' | grep -q .; then + echo "Untracked generated Go files found under go/. Run 'just go/generate' and commit the results:" >&2 + git ls-files --others --exclude-standard -- ':(glob)go/**/*.go' >&2 + exit 1 + figo/justfile (1)
1-12: LGTM. Optional: enforce strict shell for safer failures.Consider setting a strict shell once so recipes fail fast on errors.
+set shell := ["bash", "-eu", "-o", "pipefail", "-c"] + build-wasm: RUSTFLAGS='--cfg getrandom_backend="custom"' cargo build \ --target wasm32-unknown-unknown \ --release \ -p augurs-outlier-component \ -p augurs-changepoint-componentgo/outlier/outlier_test.go (2)
34-41: Guard against out-of-bounds before indexing series results.Add a length check before accessing SeriesResults[2] to avoid a panic and improve failure messages.
detected, err := ins.Detect(ctx, input) require.NoError(t, err) + require.Len(t, detected.SeriesResults, len(input.Data)) assert.Len(t, detected.OutlyingSeries, 1) assert.Contains(t, detected.OutlyingSeries, uint32(2)) assert.True(t, detected.SeriesResults[2].IsOutlier)Based on learnings
71-77: Same here: check length before indexing.Mirror the length assertion in the MAD subtest to prevent accidental panics.
detected, err := ins.Detect(ctx, input) require.NoError(t, err) + require.Len(t, detected.SeriesResults, len(input.Data)) assert.Len(t, detected.OutlyingSeries, 1) assert.Contains(t, detected.OutlyingSeries, uint32(3)) assert.True(t, detected.SeriesResults[3].IsOutlier)Based on learnings
go/outlier/doc.go (1)
4-4: Document pre-req for go:generate.Add a brief note that the Wasm artifact must exist (build-wasm) before running go:generate locally to avoid confusion.
For example, above the directive:
// Run `just go/build-wasm` first to produce the Wasm artifact referenced below.components/rust/outlier/src/lib.rs (2)
32-57: Broaden error variant naming for clarity.augurs_outlier::Error may cover more than sensitivity; the variant name InvalidSensitivity can mislead. Consider a general variant name.
#[derive(Debug)] enum TypedError { - InvalidSensitivity(augurs_outlier::Error), + OutlierError(augurs_outlier::Error), TryFromIntError(std::num::TryFromIntError), } impl From<augurs_outlier::Error> for TypedError { fn from(value: augurs_outlier::Error) -> Self { - Self::InvalidSensitivity(value) + Self::OutlierError(value) } } impl fmt::Display for TypedError { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { match self { - Self::InvalidSensitivity(e) => write!(f, "invalid sensitivity: {}", e), + Self::OutlierError(e) => write!(f, "algorithm error: {}", e), Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e), } } }
82-111: Minor: use u32::try_from for readability and consistency.This avoids the generic TryInto syntax and matches the end-field conversion.
Ok(Output { outlying_series: output .outlying_series .into_iter() - .map(TryInto::<u32>::try_into) + .map(u32::try_from) .collect::<Result<_, _>>()?, series_results: output .series_results .into_iter() .map(|series| { Ok(Series { is_outlier: series.is_outlier, outlier_intervals: series .outlier_intervals .intervals .into_iter() .map(|interval| { Ok(OutlierInterval { - start: interval.start.try_into()?, - end: interval.end.map(TryInto::try_into).transpose()?, + start: u32::try_from(interval.start)?, + end: interval.end.map(u32::try_from).transpose()?, }) }) .collect::<Result<_, TypedError>>()?, scores: series.scores, }) }) .collect::<Result<_, TypedError>>()?,components/rust/changepoint/wit/changepoint.wit (2)
114-118: Prefer a named error alias for stability and cross-world consistencyDefine a types.error alias and use it in the world result to avoid leaking raw string and to ease future evolution.
interface types { + /// Errors that can occur during changepoint detection. + /// Currently represented as a string; aliased for forward-compatibility. + type error = string; } world changepoint { use types.{input, output}; /// Detect changepoints in the input. - export detect: func(input: input) -> result<output, string>; + export detect: func(input: input) -> result<output, error>; }
41-61: Document non-zero constraint for max-lagmax-lag is validated as NonZeroUsize in Rust (InvalidMaxLag on zero). Clarify this here to align expectations.
Consider adding “Must be > 0.” to the max-lag field docs.
components/rust/outlier/wit/outlier.wit (2)
20-26: Avoid leaking Rust impl detail (“BTreeSet”) in cross-language docsThis is exposed as list in WIT. Rephrase to “Sorted unique indices of outlying series,” or similar.
139-141: Use the named error alias in the world signatureAlign with the defined error type alias for consistency.
- export detect: func(input: input) -> result<output, string>; + export detect: func(input: input) -> result<output, error>;go/changepoint/changepoint.go (1)
555-562: Avoid panic on guest cleanup failurePanicking in a library call is harsh. Prefer logging or best-effort cleanup without panic; at minimum, wrap with context in the returned error path.
go/outlier/outlier.go (1)
369-375: Avoid panic on guest cleanup failureAs above, prefer non-panicking cleanup or logging; panics from a library are undesirable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
go/changepoint/changepoint.wasmis excluded by!**/*.wasmgo/go.sumis excluded by!**/*.sumgo/outlier/outlier.wasmis excluded by!**/*.wasm
📒 Files selected for processing (20)
.github/workflows/go.yml(1 hunks)Cargo.toml(3 hunks)components/rust/README.md(1 hunks)components/rust/changepoint/Cargo.toml(1 hunks)components/rust/changepoint/src/lib.rs(1 hunks)components/rust/changepoint/wit/changepoint.wit(1 hunks)components/rust/outlier/Cargo.toml(1 hunks)components/rust/outlier/src/lib.rs(1 hunks)components/rust/outlier/wit/outlier.wit(1 hunks)crates/augurs-changepoint/src/lib.rs(1 hunks)go/README.md(1 hunks)go/changepoint/changepoint.go(1 hunks)go/changepoint/changepoint_test.go(1 hunks)go/changepoint/doc.go(1 hunks)go/go.mod(1 hunks)go/justfile(1 hunks)go/outlier/doc.go(1 hunks)go/outlier/outlier.go(1 hunks)go/outlier/outlier_test.go(1 hunks)js/augurs-changepoint-js/src/lib.rs(0 hunks)
💤 Files with no reviewable changes (1)
- js/augurs-changepoint-js/src/lib.rs
🔇 Additional comments (11)
go/go.mod (2)
5-8: LGTM! Dependency versions are current.The direct dependencies are appropriate for this use case:
testify v1.11.1for test assertions (latest stable)wazero v1.9.0for WebAssembly runtime (recent release)Both versions align with the learnings provided and support the WASM-based Go bindings architecture.
Based on learnings.
3-3: Go version is correct Confirmed that Go 1.25.1 is the current stable release on go.dev.components/rust/outlier/Cargo.toml (1)
1-21: LGTM! Standard Wasm component configuration.The Cargo manifest is properly configured for a WebAssembly component:
cdylibcrate type is correct for generating Wasm binaries- Workspace dependencies (
augurs-outlier,wit-bindgen,wit-component) are appropriate for WIT-based component generationpublish = falseis correct for an internal componentCargo.toml (2)
3-3: LGTM! Workspace member pattern correctly includes Rust components.The glob pattern
"components/rust/*"appropriately includes all Rust Wasm components (changepoint, outlier) added in this PR.
101-103: LGTM! Fork usage is properly documented.The patch to use
sd2k/argminwith thetimer-featurebranch is clearly documented in the inline comment. This temporary fork supports the Argpcp detector functionality needed by the changepoint component.crates/augurs-changepoint/src/lib.rs (1)
166-212: LGTM! Builder methods correctly expose Argpcp parameters.The new builder methods properly implement the builder pattern for configuring the
DefaultArgpcpDetector. Each method:
- Sets the corresponding field
- Returns
selffor method chaining- Has clear documentation
These methods align with the WASM interface parameters (
ArgpcpParams) exposed through the Go bindings.go/README.md (1)
1-36: LGTM! Clear and comprehensive documentation.The README provides:
- Clear explanation of the Go bindings architecture
- Complete prerequisites list (Rust, Go, gravity, just)
- Actionable build steps with commands
- Proper context about the gravity tool
This will help developers understand and build the Go bindings.
go/changepoint/doc.go (1)
1-4: LGTM! Go generate directive correctly configured.The package documentation and
go:generatedirective are properly set up:
- Package comment clearly describes the purpose
- Gravity command references correct WIT file and WASM binary paths
- Paths are relative to the file location
go/changepoint/changepoint_test.go (1)
10-51: LGTM! Well-structured parallel tests with proper cleanup.The test suite demonstrates best practices:
- Parallel execution with
t.Parallel()- Proper resource cleanup using
t.Cleanup()- Appropriate use of
requirefor errors (stop on failure) andassertfor values- Clear test organization with subtests
Both algorithms (
argpcpandbocpd) detect the same changepoint at index 6, which corresponds to the large jump from ~0.9 to 25.3 in the test data.Based on learnings.
components/rust/README.md (1)
1-7: LGTM! Clear documentation for Rust Wasm components.The README effectively:
- Explains the directory's purpose
- Describes usage options (direct Wasm or generated bindings)
- References the gravity tool for host binding generation
- Points to the Go package and build tooling
components/rust/changepoint/Cargo.toml (1)
1-21: Cargo manifest looks correct for a Wasm component.cdylib target, workspace deps, and publish=false are appropriate.
| impl fmt::Display for ChangepointError { | ||
| fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
| match self { | ||
| Self::NormalGammaError(e) => write!(f, "invalid Normal Gamma distribution: {}", e), | ||
| Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e), | ||
| Self::InvalidMaxLag(ml) => write!(f, "invalid max lag: {}", ml), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Error message mentions wrong target type
It says “overflow converting to u32” but the conversion is u32 -> usize. Fix the wording.
- Self::TryFromIntError(e) => write!(f, "overflow converting to u32: {}", e),
+ Self::TryFromIntError(e) => write!(f, "overflow converting to usize: {}", e),🤖 Prompt for AI Agents
In components/rust/changepoint/src/lib.rs around lines 55 to 63, the Display
message for TryFromIntError incorrectly says "overflow converting to u32" while
the code converts from u32 to usize; update the error string to reflect the
actual conversion target (e.g., "overflow converting u32 to usize" or "overflow
converting to usize") so the message accurately describes the conversion and
include the original error details as before.
| /// The input for the outlier detector. | ||
| /// | ||
| /// Currently this is represented as a string because Gravity | ||
| /// does not yet support more complex types. | ||
| /// | ||
| /// It should be a JSON object with the following fields: | ||
| record input { |
There was a problem hiding this comment.
Stale docs: input is not a string
Comments say input is represented as a string, but the type is a record with list<list>. Update/remove the note.
🤖 Prompt for AI Agents
In components/rust/outlier/wit/outlier.wit around lines 6 to 12, the comment
incorrectly states the input is represented as a string; update or remove that
note so the documentation matches the actual type (record input with
list<list<f64>>). Edit the comment to either remove the "represented as a
string" sentence or replace it with a brief accurate description that the input
is a nested list of f64 (JSON array of arrays of numbers) and ensure any mention
of Gravity limitations is omitted or reworded to reflect current support.
| /// The algorithm to use for outlier detection. | ||
| variant algorithm { | ||
| /// The DBSCAN algorithm. | ||
| /// | ||
| /// This algorithm is a density-based algorithm that uses a | ||
| /// clustering algorithm to group together points that are | ||
| /// close to each other. | ||
| dbscan(dbscan-params), | ||
|
|
||
| /// The MAD algorithm. | ||
| /// | ||
| /// This algorithm is a density-based algorithm that uses a | ||
| /// clustering algorithm to group together points that are | ||
| /// close to each other. | ||
| mad(mad-params), | ||
| } |
There was a problem hiding this comment.
Incorrect description of MAD
MAD isn’t a density/clustering algorithm. Update to describe Median Absolute Deviation thresholding and its sensitivity/threshold semantics.
🤖 Prompt for AI Agents
In components/rust/outlier/wit/outlier.wit around lines 116 to 131, the comment
for the mad variant incorrectly describes it as a density/clustering algorithm;
replace that description with one stating that mad implements Median Absolute
Deviation thresholding, explaining it computes the median, measures absolute
deviations from the median, and flags points beyond a configurable threshold
(sensitivity) as outliers; also note its robustness to outliers and that the
threshold parameter controls strictness (higher = fewer outliers, lower = more),
and update wording to match style of the dbscan comment.
This is a proof of concept of using
gravityto generate Go bindingsfor some of augurs' functionality. There are two stages:
augurskey methodsThis results in a Go package with some subpackages that can be used to
call augurs code.
I'm using a fork of
gravityfor now to get a few more features, whichwill hopefully be merged upstream soon.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores/CI