feat(classification): add semantic pattern matching for GUID, email, base64, format strings, and user agents #131

coderabbitai · 2026-01-26T04:12:01Z

⚠️ Potential issue | 🟡 Minor

Fix MD036 by using a heading instead of bold text.
Markdownlint reports emphasis used as a heading. Convert the bold label to a heading or list item.

Proposed fix

-**New semantic tag**: Add variant to `Tag` enum in `types/mod.rs`, implement pattern in `classification/patterns/` or `classification/mod.rs`
+### New semantic tag
+Add variant to `Tag` enum in `types/mod.rs`, implement pattern in `classification/patterns/` or `classification/mod.rs`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## Adding Features

**New semantic tag**: Add variant to `Tag` enum in `types.rs`, implement pattern in `classification/semantic.rs`

**New semantic tag**: Add variant to `Tag` enum in `types/mod.rs`, implement pattern in `classification/patterns/` or `classification/mod.rs`

## Adding Features

### New semantic tag

Add variant to `Tag` enum in `types/mod.rs`, implement pattern in `classification/patterns/` or `classification/mod.rs`

🤖 Prompt for AI Agents

In `@AGENTS.md` around lines 88 - 91, Replace the bold label "**New semantic tag**:" with a proper Markdown heading (e.g., "### New semantic tag") or a list item to satisfy MD036; keep the rest of the sentence ("Add variant to `Tag` enum in `types/mod.rs`, implement pattern in `classification/patterns/` or `classification/mod.rs`") unchanged and ensure the heading level aligns with surrounding headings in AGENTS.md so references to the Tag enum (`Tag`), types/mod.rs, and classification/patterns/ or classification/mod.rs remain clear.
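
For readers unfamiliar with the workflow that AGENTS.md line describes, the two steps (add a `Tag` variant, then implement a pattern for it) might look roughly like the sketch below. Everything here is an assumption for illustration: the real `Tag` enum in `types/mod.rs` has its own variants and derives, and `classify_guid` is a hypothetical helper that uses only the standard library rather than the project's actual pattern code.

```rust
// Hypothetical sketch: `Tag`, `classify_guid`, and the GUID check are
// illustrative stand-ins, not the project's actual definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tag {
    Email,
    Guid, // step 1: the newly added variant
}

/// Step 2: a pattern for the new tag. Returns `Some(Tag::Guid)` for strings
/// shaped like `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`, with optional braces.
pub fn classify_guid(text: &str) -> Option<Tag> {
    // Strip a matching pair of braces; otherwise check the text as-is.
    let inner = text
        .strip_prefix('{')
        .and_then(|rest| rest.strip_suffix('}'))
        .unwrap_or(text);

    let groups: Vec<&str> = inner.split('-').collect();
    let expected_lengths = [8, 4, 4, 4, 12];

    // Every group must have the expected length and contain only hex digits.
    let well_formed = groups.len() == expected_lengths.len()
        && groups
            .iter()
            .zip(expected_lengths)
            .all(|(group, len)| group.len() == len && group.chars().all(|c| c.is_ascii_hexdigit()));

    well_formed.then_some(Tag::Guid)
}
```

Under these assumptions, `classify_guid("{12345678-1234-1234-1234-00000000002a}")` yields `Some(Tag::Guid)`, while an email-like string yields `None`.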

coderabbitai · 2026-01-26T04:12:01Z

⚠️ Potential issue | 🟡 Minor

Batch sample misses a format-string match.

"Error %s at line {}" uses {} which does not match the documented format-string patterns that rely on % specifiers or {digits}. This reduces the intended mix of categories in the batch set.

Proposed fix

- samples.push(format!("Error %s at line {}", index));
+ samples.push(format!("Error %s at line %d {}", index));

📝 Committable suggestion

Suggested change

fn bench_batch_classification(c: &mut Criterion) {
    let classifier = SemanticClassifier::new();
    let context = make_context();
    let mut samples = Vec::new();
    for index in 0..1000 {
        samples.push(format!("{{12345678-1234-1234-1234-{:012x}}}", index));
        samples.push(format!("user{}@example.com", index));
        samples.push(format!("Error %s at line {}", index));
    }
    c.bench_function("classification_batch", |b| {
        b.iter(|| {
            for sample in &samples {
                let _ = classifier.classify(black_box(sample.as_str()), &context);
            }
        });
    });
}

fn bench_batch_classification(c: &mut Criterion) {
    let classifier = SemanticClassifier::new();
    let context = make_context();
    let mut samples = Vec::new();
    for index in 0..1000 {
        samples.push(format!("{{12345678-1234-1234-1234-{:012x}}}", index));
        samples.push(format!("user{}@example.com", index));
        samples.push(format!("Error %s at line %d {}", index));
    }
    c.bench_function("classification_batch", |b| {
        b.iter(|| {
            for sample in &samples {
                let _ = classifier.classify(black_box(sample.as_str()), &context);
            }
        });
    });
}

🤖 Prompt for AI Agents

In `@benches/classification.rs` around lines 84 - 102, in `bench_batch_classification`, replace the format string "Error %s at line {}" so the generated sample matches the classifier's documented patterns (`%` specifiers or numeric `{digits}`). Because Rust's `format!` substitutes both `{}` and `{0}` with `index`, neither leaves a literal brace placeholder in the sample; update the `format!` call inside the loop that builds `samples` to add a literal `%` specifier instead, e.g. "Error %s at line %d {}" (still passing `index`), so the sample vector contains the intended %-style pattern and the `classifier.classify(..., &context)` calls see the correct mix of formats.
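
A small sketch may help when judging this suggestion. It shows what string the suggested `format!` call actually produces, alongside a hypothetical detector for the two placeholder styles named above. `batch_sample`, `looks_like_format_string`, and the accepted specifier set are assumptions made for illustration; the project's real classifier may use different rules, so this does not show how `SemanticClassifier` behaves.

```rust
/// Builds the benchmark sample the way the suggested change does.
/// `format!` replaces `{}` with `index`; `%s` and `%d` pass through untouched.
pub fn batch_sample(index: usize) -> String {
    format!("Error %s at line %d {}", index)
}

/// Hypothetical detector for the two placeholder styles the review mentions:
/// printf-style `%` specifiers and brace-delimited indices such as `{0}`.
pub fn looks_like_format_string(text: &str) -> bool {
    has_percent_specifier(text) || has_indexed_brace(text)
}

/// True when a `%` is directly followed by one of a few common conversion
/// characters (an assumed, deliberately small set).
fn has_percent_specifier(text: &str) -> bool {
    text.as_bytes()
        .windows(2)
        .any(|pair| pair[0] == b'%' && matches!(pair[1], b's' | b'd' | b'x' | b'f' | b'u' | b'c'))
}

/// True when the text contains `{` followed by one or more ASCII digits and `}`.
fn has_indexed_brace(text: &str) -> bool {
    text.split('{').skip(1).any(|rest| match rest.find('}') {
        Some(end) => end > 0 && rest[..end].bytes().all(|b| b.is_ascii_digit()),
        None => false,
    })
}
```

Here `batch_sample(7)` evaluates to `"Error %s at line %d 7"`, which this sketch accepts because of its `%` specifiers. An empty `{}` pair is not counted as an indexed placeholder by `has_indexed_brace`.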

@@ -0,0 +1,66 @@
+    # This is an example .goreleaser.yml file with some sensible defaults.
+    # Make sure to check the documentation at https://goreleaser.com
+    # The lines below are called `modelines`. See `:help modeline`
+    # Feel free to remove those if you don't want/need to use them.
+    # yaml-language-server: $schema=https://goreleaser.com/static/schema.json
+    # vim: set ts=2 sw=2 tw=0 fo=cnqoj
+    version: 2
+    before:
+      hooks:
+        # Ensure cargo-zigbuild is available for cross-compilation
+        # Note: rustup toolchain is pinned via rust-toolchain.toml
+        - cargo install --locked cargo-zigbuild
+        - cargo fetch --locked
+    builds:
+      # macOS targets - use regular cargo (zigbuild has issues with macOS linker flags)
+      - builder: rust
+        id: darwin
+        command: build
+        flags:
+          - --release
+        targets:
+          - x86_64-apple-darwin
+          - aarch64-apple-darwin
+      # Linux/Windows targets - use cargo-zigbuild for cross-compilation
+      - builder: rust
+        id: cross
+        command: zigbuild
+        flags:
+          - --release
+        targets:
+          - x86_64-unknown-linux-gnu
+          - aarch64-unknown-linux-gnu
+          - x86_64-pc-windows-gnu
+    archives:
+      - formats: [tar.gz]
+        # this name template makes the OS and Arch compatible with the results of `uname`.
+        name_template: >-
+          {{ .ProjectName }}_
+          {{- title .Os }}_
+          {{- if eq .Arch "amd64" }}x86_64
+          {{- else if eq .Arch "386" }}i386
+          {{- else }}{{ .Arch }}{{ end }}
+        # use zip for windows archives
+        format_overrides:
+          - goos: windows
+            formats: [zip]
+    changelog:
+      sort: asc
+      filters:
+        exclude:
+          - "^docs:"
+          - "^test:"
+    release:
+      footer: >-
+        ---
+        Released by [GoReleaser](https://github.com/goreleaser/goreleaser).

@@ -22,15 +22,15 @@ path = "src/main.rs"
     [dependencies]
     clap           = { version = "4.5.54", features = [ "derive" ] }
     cpp_demangle   = "0.5.1"
-    entropy        = "0.4.2"
+    entropy        = "0.4.3"
     goblin         = "0.10.4"
     once_cell      = "1.21.3"
     pelite         = "0.10.0"
     regex          = "1.12.2"
     rustc-demangle = "0.1.27"
     serde          = { version = "1.0.228", features = [ "derive" ] }
     serde_json     = "1.0.149"
-    thiserror      = "2.0.17"
+    thiserror      = "2.0.18"
     [dev-dependencies]
     criterion = "0.8.1"
@@ -46,6 +46,10 @@ lto = "thin"
     name    = "elf"
     harness = false
+    [[bench]]
+    name    = "classification"
+    harness = false
     [[bench]]
     name    = "pe"
     harness = false