feat: Implement enum-based output formatters (Table, JSON, YARA) #127
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Adds a human-readable table output formatter for extracted strings, with automatic TTY detection. In TTY mode, outputs an aligned table with headers and truncated columns; in non-TTY mode, outputs plain string text, one per line. Includes dynamic column width calculation, tag formatting, Unicode-safe truncation, and comprehensive integration and unit tests with snapshot verification.
Added full implementations for `format_json` and `format_yara` functions, including serialization logic, escaping, and tag/group handling. Updated documentation and examples to reflect new UTF-16 extraction API and output formatting. Added comprehensive integration and snapshot tests for both output formats to ensure correctness and stability.
- Introduced an optional `generated_at` field in `OutputMetadata` for deterministic outputs. - Updated YARA formatter to utilize the `generated_at` timestamp instead of runtime timestamps. - Adjusted integration tests and snapshots to reflect changes in generated timestamps. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
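For orientation, a hypothetical sketch of the metadata flow these commits describe. The four-argument `OutputMetadata::new` signature and the `with_generated_at` builder are taken from excerpts later in this thread; the argument types and values here are assumptions for illustration only.

```rust
use stringy::{OutputFormat, OutputMetadata};

fn example_metadata() -> OutputMetadata {
    // A fixed timestamp makes the YARA output deterministic (snapshot-friendly),
    // instead of embedding the wall clock at generation time.
    OutputMetadata::new(
        "sample.exe".to_string(), // binary name
        OutputFormat::Yara,       // requested formatter
        120,                      // total strings extracted (illustrative)
        50,                       // strings kept after filtering (illustrative)
    )
    .with_generated_at("2024-01-01T00:00:00Z".to_string())
}
```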
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Architectural Overview

This PR introduces a dedicated, extensible output-formatting subsystem and documentation/tests that separate presentation from extraction/classification and enable pluggable formatters (Table, JSONL, YARA) dispatched by a central API.

Key Architectural Changes
Public API & Type Changes
Deduplication & Internal Data Handling
Binary Format & Extraction Impact
Semantic Classification & Preservation
Testing, Documentation & Developer Experience
Notable Behavioral Constraints & Validations
Overall Impact

This PR substantially strengthens Stringy's presentation layer by adding a robust, extensible formatting subsystem that preserves extraction/classification semantics, clarifies consumer ergonomics (FoundString::new, crate-level re-exports), enables deterministic outputs via metadata, and increases confidence through comprehensive tests and updated documentation.

Walkthrough

Adds a new output formatting subsystem (JSONL, table, YARA) with metadata-driven dispatch, updates doc examples to use FoundString::new(...), refactors dedup grouping to use encoding-typed keys, and introduces examples, integration tests, tooling, and documentation updates.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Dispatcher as format_output(strings, metadata)
    participant Meta as OutputMetadata
    participant Formatter as Formatter (JSON/Table/YARA)
    participant Strings as FoundString[]
    Client->>Dispatcher: format_output(strings, metadata)
    Dispatcher->>Meta: inspect metadata.format
    alt format == Json
        Dispatcher->>Formatter: format_json(strings, metadata)
    else format == Table
        Dispatcher->>Formatter: format_table(strings, metadata)
    else format == Yara
        Dispatcher->>Formatter: format_yara(strings, metadata)
    end
    Formatter->>Strings: process encodings, tags, scores
    Formatter-->>Dispatcher: Result<String>
    Dispatcher-->>Client: formatted output
```
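Rendered as Rust, the dispatch above has roughly this shape (a sketch, not the actual src/output/mod.rs source; the function and variant names come from the diagram, and the `Result<String>` return is assumed to use the crate's error type):

```rust
// Enum-based dispatch: metadata.format selects the formatter.
fn format_output(strings: &[FoundString], metadata: &OutputMetadata) -> Result<String> {
    match metadata.format {
        OutputFormat::Json => format_json(strings, metadata),
        OutputFormat::Table => format_table(strings, metadata),
        OutputFormat::Yara => format_yara(strings, metadata),
        // OutputFormat is #[non_exhaustive], so matches outside this crate
        // would also need a catch-all arm.
    }
}
```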
```mermaid
sequenceDiagram
    actor Caller
    participant YaraFmt as format_yara()
    participant Grouper as Tag Grouper
    participant Escaper as Escaping Utils
    participant Builder as Rule Builder
    Caller->>YaraFmt: strings, metadata
    YaraFmt->>Grouper: group strings by tag/encoding/length
    Grouper-->>YaraFmt: HashMap<Tag, Vec<FoundString>>
    loop per tag group
        YaraFmt->>Escaper: escape / utf16 hex / sanitize names
        Escaper-->>YaraFmt: escaped literals
        YaraFmt->>Builder: append string var + comment + score
    end
    YaraFmt->>Builder: add meta + condition
    Builder-->>Caller: completed YARA rule
```
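The grouping step can be pictured with this minimal sketch. It keys on the first tag only and assumes `Tag` implements `Hash`/`Eq`; per the diagram, the real grouper also folds encoding and length into the key.

```rust
use std::collections::HashMap;

// Bucket strings by primary tag before emitting YARA string variables.
fn group_by_tag(strings: &[FoundString]) -> HashMap<Option<Tag>, Vec<&FoundString>> {
    let mut groups: HashMap<Option<Tag>, Vec<&FoundString>> = HashMap::new();
    for s in strings {
        groups.entry(s.tags.first().cloned()).or_default().push(s);
    }
    groups
}
```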
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 6 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
✅ Passed checks (6 passed)
Pull request overview
This pull request implements comprehensive output formatting capabilities for Stringy, adding JSON, YARA, and Table formatters with extensive test coverage. The PR also updates documentation examples across multiple modules to use the FoundString::new constructor pattern for improved consistency and clarity.
Changes:
- Implements three output formatters (JSON/JSONL, YARA rule templates, and human-readable tables) with TTY detection and format-specific features
- Adds comprehensive integration tests with snapshot testing for all three formatters covering edge cases, encoding variations, and special characters
- Updates documentation examples across classification and extraction modules to consistently use constructor patterns instead of manual struct initialization
Reviewed changes
Copilot reviewed 64 out of 64 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/output/mod.rs | Core output dispatcher with OutputFormat enum and OutputMetadata struct |
| src/output/json.rs | JSONL formatter with proper escaping and optional field handling |
| src/output/table.rs | Table formatter with TTY detection, column alignment, and truncation |
| src/output/yara.rs | YARA rule generator with string grouping, sanitization, and encoding modifiers |
| tests/output_json_integration.rs | Comprehensive JSON formatter tests (266 lines) |
| tests/output_table_integration.rs | Comprehensive table formatter tests (396 lines) |
| tests/output_yara_integration.rs | Comprehensive YARA formatter tests (197 lines) |
| tests/snapshots/* | 37 new snapshot files for output format verification |
| src/lib.rs | Re-exports output types and updates UTF-16 function name references |
| src/extraction/* | Updates documentation examples and function signatures |
| src/classification/* | Updates documentation to use FoundString::new constructor |
Codecov Report

❌ Patch coverage is
Added 'superpowers@claude-plugins-official' to the enabledPlugins list in .claude/settings.json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/extraction/pe_resources.rs (1)
1-1449: File exceeds 500-600 line limit.

At 1449 lines, this file is nearly 3x the recommended maximum. Per coding guidelines, consider splitting into focused modules:

- pe_resources/mod.rs - public API and orchestration (`extract_resources`, `extract_resource_strings`)
- pe_resources/version_info.rs - VERSIONINFO detection and extraction
- pe_resources/string_table.rs - STRINGTABLE parsing logic
- pe_resources/manifest.rs - MANIFEST detection and encoding handling
- pe_resources/tests.rs or tests/pe_resources.rs - test cases (836+ lines of tests alone)
🤖 Fix all issues with AI agents
In `@src/output/json.rs`:
- Around line 6-9: In format_json, remove the misleading underscore from the
used parameter (_strings) so the signature becomes format_json(strings:
&[FoundString], metadata: &OutputMetadata) and update all references to use
strings; for the currently unused metadata parameter decide one of two fixes: if
metadata should be used, incorporate it into the output (for example add a JSON
header or comment with metadata fields in the JSONL output) and reference it in
the function body, or if metadata is intentionally unused remove the parameter
entirely from format_json (or rename to _metadata to explicitly signal
intentional unusedness); update any callers of format_json accordingly.
In `@src/output/table.rs`:
- Around line 371-708: The file-level test module (#[cfg(test)] mod tests { ...
}) makes src/output/table.rs exceed the size limit; move this entire test module
into a separate test file (either create src/output/table/tests.rs and replace
the inline module with #[cfg(test)] mod tests; or create an integration test at
tests/output_table.rs). Copy all helper functions (make_test_string,
make_metadata) and all nested test modules (format_tags_tests,
truncate_string_tests, pad_string_tests, format_table_tests, column_width_tests)
into the new file, update imports so the tests bring the table module symbols
(e.g., format_tags, tag_to_display_string, truncate_string, pad_string,
format_table_with_mode, calculate_section_width, calculate_tags_width,
FoundString, Tag, OutputMetadata, Alignment, SECTION_COLUMN_WIDTH) into scope
via use crate::output::table::* or the appropriate paths, and ensure the
#[cfg(test)] gating remains in the new test file; finally remove the large
inline tests block from src/output/table.rs so the file falls under the size
limit.
- Around line 218-244: The docs promise limiting tags by MAX_TAGS_DISPLAY but
format_tags currently picks all tags with the highest boost then truncates by
width; change format_tags to instead sort tags by tag_boost_value (desc) and
take at most MAX_TAGS_DISPLAY items (use tag_to_display_string for conversion),
then join and only then truncate to TAGS_COLUMN_WIDTH if needed; keep references
to format_tags, tag_boost_value, tag_to_display_string, TAGS_COLUMN_WIDTH and
MAX_TAGS_DISPLAY so the selection matches the documented cap.
- Around line 36-38: The file imports RankingConfig from
crate::classification::ranking; change the import in src/output/table.rs to use
the crate re-export (use crate::RankingConfig;) and, if the re-export doesn't
exist, add a public re-export (pub use
crate::classification::ranking::RankingConfig;) in the crate root or the module
where re-exports are maintained so other modules follow the import policy;
update any references to RankingConfig accordingly.
- Around line 101-160: The TTY table rows can be broken by control chars; update
format_table_tty to sanitize embedded newlines/carriage-returns/tabs and pipe
characters before truncation and padding: add/ use a helper (e.g.,
sanitize_for_tty) and apply it to found_string.text, the tags_display returned
by format_tags, and section_display (the values used in truncate_string and
pad_string) so you replace "\n", "\r", "\t" with visible escapes (or safe
single-char placeholders) and escape '|' (e.g., "\\|") prior to calling
truncate_string, pad_string, and formatting the score text.
- Around line 310-342: truncate_string can produce output longer than max_len
when the first character is multi-byte because end_index may advance past
truncate_at; update truncate_string to ensure the final byte length (including
the "..." suffix) never exceeds max_len by using char_indices to find the
greatest byte boundary <= truncate_at (where truncate_at = max_len - 3) and if
no such boundary exists return "." repeated max_len (or handle very small
max_len as before); adjust the end_index logic to only set end_index to an idx
when idx <= truncate_at and avoid the fallback that advances to the next char
boundary beyond truncate_at so format!("{}...", &s[..end_index]) is guaranteed
to be <= max_len.
In `@src/output/yara.rs`:
- Around line 8-14: The function signature for format_yara uses unused-parameter
prefixes but both parameters are used; rename parameters from _strings and
_metadata to strings and metadata (update the signature in fn
format_yara(strings: &[FoundString], metadata: &OutputMetadata) ->
Result<String>), and update all references inside the function (e.g.,
metadata.generated_at, metadata.binary_name, and any use of strings) to remove
the underscores; keep all existing logic that computes timestamp,
base_rule_name, and rule_name unchanged aside from the parameter name updates.
In `@tests/output_json_integration.rs`:
- Around line 229-233: The test test_json_unicode_content is constructing a
Unicode fixture with an ASCII encoding; update the fixture to use Encoding::Utf8
so the non-ASCII string produced by make_string(...) is encoded correctly.
Locate the place where the test calls make_string or builds the fixture (within
test_json_unicode_content) and replace the Encoding::Ascii value with
Encoding::Utf8 (or ensure make_string/make_metadata receives Utf8) so the test
data matches the Unicode content invariants.
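For the table.rs truncation prompt above, here is a standalone sketch of boundary-safe truncation that satisfies the stated invariant (the in-tree function may differ in naming and in how it handles the documented small-max_len cases):

```rust
/// Truncate so the result, including the "..." suffix, never exceeds
/// max_len bytes, even when the cut falls inside a multi-byte character.
fn truncate_string(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    if max_len <= 3 {
        return ".".repeat(max_len);
    }
    let budget = max_len - 3;
    // Largest char boundary at or below the byte budget (0 if even the
    // first character does not fit, yielding just "...").
    let end_index = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= budget)
        .last()
        .unwrap_or(0);
    format!("{}...", &s[..end_index])
}
```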
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.claude/settings.json:
- Around line 2-5: The file .claude/settings.json (containing the enabledPlugins
object and keys like "commit@cc-marketplace" and
"superpowers@claude-plugins-official") appears to be a developer-local Claude AI
config and should not be committed; either remove it from version control by
adding the .claude/ directory to .gitignore and committing that change, or if
this is an intentional team-wide setting document its purpose and usage in
CONTRIBUTING.md and keep the file committed; update the repository accordingly
and remove the file from the commit history if necessary.
Address issues identified during PR review:

- Rename misleading underscore-prefixed parameters (_strings, _metadata) to strings, metadata since they are actively used
- Return "CLOCK_ERROR" instead of "0" on timestamp failure to make errors clearly distinguishable from valid timestamps
- Remove dead code: get_yara_modifier() function and its tests after inlining "ascii" constant for ASCII/UTF-8 encodings
- Simplify utf16be_hex_string() using iterator chains with flat_map()
- Add Tag import and simplify tag_name() function

Add comprehensive unit tests for UTF-16 encoding functions:

- escape_yara_unicode_literal: basic escapes, control chars, unicode passthrough, empty string
- utf16be_hex_string: basic ASCII, empty string, non-ASCII BMP chars, surrogate pairs
- escape_yara_string: additional control character coverage
- with_generated_at builder method validation
- Default timestamp fallback behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
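The "CLOCK_ERROR" fallback mentioned in the commit above might look like the following (a sketch; the in-tree function's actual name is not shown in this thread):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// A clock failure yields a clearly invalid marker rather than a
// valid-looking "0" timestamp.
fn unix_timestamp_or_marker() -> String {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs().to_string())
        .unwrap_or_else(|_| "CLOCK_ERROR".to_string())
}
```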
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/output/yara.rs`:
- Around line 61-69: The Utf16Le branch in the match (handling
Encoding::Utf16Le) incorrectly emits escaped bytes plus the YARA "wide" modifier
which fails for non-ASCII code points; add a new helper utf16le_hex_string that
encodes item.text as UTF-16LE bytes (use char.encode_utf16() and
to_le_bytes/collect into hex like utf16be_hex_string does) and change the
Encoding::Utf16Le arm to check item.text.is_ascii(): if ascii, keep the existing
escape_yara_unicode_literal + " wide" emission, otherwise call
utf16le_hex_string and push the hex form (strings_block.push_str with var_name
and hex) so non-ASCII UTF-16LE strings are emitted as correct hex sequences.
- Around line 16-29: The binary_name and timestamp are inserted raw into both
the comment headers and the meta field generated_at, allowing injection via
newlines; wrap both usages with escape_yara_string. Specifically, when building
the comment lines (currently format!("// Binary: {}\n", metadata.binary_name)
and format!("// Generated: {}\n\n", timestamp)) call
escape_yara_string(&metadata.binary_name) and escape_yara_string(&timestamp)
respectively, and likewise replace the generated_at meta interpolation
(format!(" generated_at = \"{}\"\n", timestamp)) with
escape_yara_string(×tamp) so both comments and meta fields are escaped
consistently (description already uses escape_yara_string).
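A self-contained sketch of the `utf16le_hex_string` helper proposed above, mirroring the described `utf16be_hex_string` but with little-endian byte order (the exact brace and spacing conventions in src/output/yara.rs are assumptions):

```rust
/// Encode text as UTF-16LE code units and render a YARA hex string,
/// e.g. "Hi" -> { 48 00 69 00 }. Surrogate pairs become four bytes.
fn utf16le_hex_string(text: &str) -> String {
    let bytes: Vec<String> = text
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .map(|b| format!("{b:02X}"))
        .collect();
    format!("{{ {} }}", bytes.join(" "))
}
```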
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.coderabbit.yml (1)
352-358: Clarify the ASCII-only exception so it is enforceable.

Line 354 introduces an exception ("unless explicitly required"), but bullets 2-4 still prohibit all non-ASCII unconditionally, making the exception impossible and risking inconsistent enforcement. Align the bullets with the exception or remove it.
Proposed alignment
```diff
-  Verify that no Unicode punctuation is introduced unless explicitly required:
-    1. No emojis in code or documentation
-    2. No em-dashes - use regular hyphens
-    3. No smart quotes - use straight quotes
-    4. No other non-ASCII characters in strings or comments
+  Verify that no Unicode punctuation is introduced unless explicitly required:
+    1. No emojis in code or documentation unless explicitly required
+    2. No em-dashes - use regular hyphens unless explicitly required
+    3. No smart quotes - use straight quotes unless explicitly required
+    4. No other non-ASCII characters in strings or comments unless explicitly required (document rationale)
```
🤖 Fix all issues with AI agents
In `@AGENTS.md`:
- Line 9: The rule text under "ASCII only" contains typos: change "explicity" to
"explicitly" and "emjois" to "emojis" so the guideline reads clearly (e.g., "No
emojis, em-dashes, smart quotes, or Unicode punctuation (except when explicitly
testing or working with Unicode strings or emojis)"). Update the quoted rule
text "ASCII only" accordingly to preserve punctuation and intent.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Immediate fixes:
- Fix failing doctests in extraction/mod.rs by wrapping in fn main()
- Fix rustdoc warning in patterns/ip.rs by escaping [::1]:8080
- Fix O(n^2) algorithms in dedup.rs using HashSet for unique detection
New features:
- Add OutputFormatter trait for extensible output formatting
- Add #[non_exhaustive] to OutputFormat enum for API stability
- Add Hash derive to Encoding and StringSource enums
Documentation:
- Create CHANGELOG.md following Keep a Changelog format
- Create ROADMAP.md documenting medium/long-term improvements
- Create examples/ directory with basic_extraction, network_indicators,
and output_formats examples
The O(n^2) fixes include:
- unique_sections detection now uses HashSet instead of Vec::contains
- unique_sources detection now uses HashSet instead of Vec::contains
- merge_tags now uses HashSet for deduplication
- Grouping key simplified from format!("{:?}", encoding) to Encoding
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
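For illustration, the Vec::contains-to-HashSet change described above follows this shape (names here are placeholders, not the actual dedup.rs internals):

```rust
use std::collections::HashSet;

// O(n) order-preserving unique detection; the previous approach called
// Vec::contains inside the loop, which is O(n^2) overall.
fn unique_values(values: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    values
        .iter()
        .filter(|v| seen.insert(v.as_str()))
        .cloned()
        .collect()
}
```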
chore(dependabot): update schedules to weekly for various ecosystems

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/classification/patterns/ip.rs (1)
117-125: Consider using an if-else expression for immutability.

The mutable `ip_text` variable is only assigned once conditionally. A more idiomatic approach would eliminate the `mut` binding:

♻️ Proposed refactor
```diff
-    let mut ip_text = text;
-
-    // Check for bracketed format
-    if text.starts_with('[') {
-        // Strip port from the full text first (e.g., [::1]:8080 -> [::1])
-        let without_port = strip_port(text);
-        // Now extract the IPv6 from brackets
-        ip_text = strip_ipv6_brackets(without_port);
-    }
+    // Check for bracketed format
+    let ip_text = if text.starts_with('[') {
+        // Strip port from the full text first (e.g., [::1]:8080 -> [::1])
+        let without_port = strip_port(text);
+        // Now extract the IPv6 from brackets
+        strip_ipv6_brackets(without_port)
+    } else {
+        text
+    };
```

src/extraction/mod.rs (1)
1-978: File exceeds 500-600 line limit.

At 1548 lines, this file significantly exceeds the coding guideline limit of 500-600 lines. Per guidelines: "Keep Rust source files <=500-600 lines; split larger files into focused modules."
Consider splitting into:
- mod.rs - trait definitions, re-exports, config (~200 lines)
- basic.rs - `BasicExtractor` implementation (~400 lines)
- helpers.rs - `extract_ascii_utf8_strings`, utility functions (~100 lines)

This would improve maintainability and align with project conventions.
src/lib.rs (2)
57-63: Remove non-ASCII status markers from crate docs

The status bullets contain emojis, which violate the ASCII-only rule for Rust and docs. As per coding guidelines.
Proposed change
```diff
-//! - [`container`]: Binary format detection and parsing (✅ Complete)
-//! - [`extraction`]: String extraction algorithms (✅ ASCII, UTF-16LE extraction and PE resources complete)
+//! - [`container`]: Binary format detection and parsing (complete)
+//! - [`extraction`]: String extraction algorithms (ASCII, UTF-16LE extraction and PE resources complete)
 //! - ASCII extraction provides foundational encoding extraction as the reference implementation
 //! - UTF-16LE extraction provides Windows PE binary string extraction with confidence scoring
-//! - [`classification`]: Semantic analysis and tagging (🚧 Types defined)
-//! - [`output`]: Result formatting (🚧 Interfaces ready)
-//! - [`types`]: Core data structures and error handling (✅ Complete)
+//! - [`classification`]: Semantic analysis and tagging (types defined)
+//! - [`output`]: Result formatting (interfaces ready)
+//! - [`types`]: Core data structures and error handling (complete)
```
1-5: Add crate-level forbid(unsafe_code) and deny(warnings) attributes, and remove emojis from documentation

The crate root is missing `#![forbid(unsafe_code)]` and `#![deny(warnings)]` attributes required by project policy. Additionally, documentation contains non-ASCII emoji characters (✅ and 🚧 on lines 57-62) which violate the ASCII-only guideline.

Proposed changes

```diff
+#![forbid(unsafe_code)]
+#![deny(warnings)]
+
 //! Stringy - A smarter alternative to the strings command
 //!
 //! Stringy leverages format-specific knowledge to distinguish meaningful strings
 //! from random garbage data in binary files.
```

Remove emoji status markers from lines 57-62:

- Replace `(✅ Complete)` with `(Complete)`
- Replace `(🚧 Types defined)` with `(In Progress - Types defined)`
- Replace `(🚧 Interfaces ready)` with `(In Progress - Interfaces ready)`

src/extraction/dedup.rs (1)
120-131: Do not drop confidence and bonuses for below-threshold groups

Below-threshold groups currently return only the max original score, which removes the confidence boost and conflicts with the documented scoring formula, making scores incomparable across thresholds. If the threshold is meant to suppress bonuses, at least retain confidence (or apply the full formula consistently). As per coding guidelines.
Proposed change
```diff
-    let combined_score = if meets_threshold {
-        calculate_combined_score(&occurrences)
-    } else {
-        // For groups below threshold, use the maximum original score without bonuses
-        occurrences
-            .iter()
-            .map(|occ| occ.original_score)
-            .max()
-            .unwrap_or(0)
-    };
+    let combined_score = if meets_threshold {
+        calculate_combined_score(&occurrences)
+    } else {
+        // Keep confidence on the same scale even below the threshold
+        let base_score = occurrences
+            .iter()
+            .map(|occ| occ.original_score)
+            .max()
+            .unwrap_or(0);
+        let max_confidence = occurrences
+            .iter()
+            .map(|occ| occ.confidence)
+            .fold(0.0f32, f32::max);
+        base_score + (max_confidence * 10.0) as i32
+    };
```
🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 27-30: Update the "Changed" section in CHANGELOG.md to explicitly
document the breaking API rename: note that extract_utf16le_strings was renamed
to extract_utf16_strings (include both names and state it's a breaking change
for callers), add a short upgrade note instructing users to replace calls to
extract_utf16le_strings with extract_utf16_strings, and ensure this entry is
listed alongside the other "Changed" items so consumers see the migration step
when upgrading.
In `@CONTRIBUTING.md`:
- Around line 58-68: In CONTRIBUTING.md under the "## Tests" section remove the
duplicate insta snapshot guidance by keeping one concise sentence (e.g., "Use
insta snapshots for output verification when appropriate or when changing output
formatters") and delete the redundant line; also update the empty/unspecified
code fence under the "Run:" block to include a language specifier (e.g.,
```bash) so the `just test` snippet is properly highlighted and the file no
longer contains an empty/unspecified code block.
In `@examples/basic_extraction.rs`:
- Around line 53-71: The current truncation uses byte slicing
(&string.text[..47]) which can panic on multi-byte UTF-8; change to count
characters with string.text.chars().count() for the length check and perform
truncation with a character-aware approach like
string.text.chars().take(47).collect::<String>() when building the displayed
text (keep the else branch using string.text.clone()); update the condition and
the truncated branch in the loop over sorted_strings to use these char-aware
operations to avoid UTF-8 boundary panics.
In `@examples/network_indicators.rs`:
- Around line 8-12: Add re-exports for the container utilities in lib.rs and
update examples to import from the crate root: add a pub use
container::{create_parser, detect_format}; to lib.rs so create_parser and
detect_format are exported alongside extraction and types, then change the
example files (examples/network_indicators.rs, basic_extraction.rs,
output_formats.rs) to import create_parser and detect_format from stringy::
(crate root) instead of stringy::container to comply with the import guideline.
In `@examples/output_formats.rs`:
- Around line 48-65: Capture the full count of extracted strings before
truncation and pass it as total_strings to OutputMetadata::new: compute a
variable (e.g., full_count = sorted_strings.len()) immediately after sorting
(before calling .into_iter().take(50)), then use full_count for the
total_strings argument and top_strings.len() for filtered_strings when
constructing OutputMetadata (referencing sorted_strings, top_strings,
OutputMetadata::new, binary_name, output_format).
- Around line 10-15: Update the top-level imports in examples/output_formats.rs
to use crate-root re-exports for BasicExtractor, ExtractionConfig,
StringExtractor, OutputFormat, OutputMetadata, and format_output instead of
importing them from deep modules; keep create_parser and detect_format imported
from stringy::container. Concretely, replace
stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor} and
stringy::output::{OutputFormat, OutputMetadata, format_output} with a single
import of those symbols from the crate root while leaving use
stringy::container::{create_parser, detect_format} unchanged.
In `@ROADMAP.md`:
- Around line 9-33: Update ROADMAP.md to use current, accurate line counts and
replace any non-ASCII punctuation: recalc the lines for src/extraction/mod.rs
(1548) and src/extraction/dedup.rs (838) and regenerate the table rows for all
listed files so the "Lines" and "Overage" columns match the repository's current
counts; scan and replace any em-dashes, smart quotes or other non-ASCII
characters with regular hyphens and straight quotes throughout the file to
ensure ASCII-only documentation (verify entries referencing
src/extraction/config.rs, src/extraction/trait.rs, src/extraction/basic.rs,
src/extraction/helpers.rs and the listed oversized files are updated).
- Around line 93-107: Update the roadmap section header line numbers from 93–107
to 143–165 and add a short safety note next to the memmap2 example documenting
the unsafe invariants: explain why Mmap::map is wrapped in unsafe, state that
the mapped file must not be truncated or modified for the lifetime of the Mmap,
that any mutable access to the underlying file must be avoided while the mapping
exists, and recommend validating file size/permissions before mapping; reference
the memmap2::Mmap type and the unsafe block (unsafe { Mmap::map(&file)? }) so
readers know exactly where the safety requirements apply.
```markdown
### Changed
- Repository renamed from StringyMcStringFace to Stringy
- Improved YARA formatter code quality and test coverage
- Clarified ASCII rule for Unicode handling in documentation
```
Document the extract_utf16le_strings to extract_utf16_strings API rename.
The UTF-16 extraction function was renamed in this PR. This is a breaking change that should be explicitly documented under "Changed" for users upgrading.
Proposed addition

```diff
 ### Changed
 - Repository renamed from StringyMcStringFace to Stringy
 - Improved YARA formatter code quality and test coverage
 - Clarified ASCII rule for Unicode handling in documentation
+- Renamed `extract_utf16le_strings` to `extract_utf16_strings` for consistency
```
🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 27 - 30, Update the "Changed" section in
CHANGELOG.md to explicitly document the breaking API rename: note that
extract_utf16le_strings was renamed to extract_utf16_strings (include both names
and state it's a breaking change for callers), add a short upgrade note
instructing users to replace calls to extract_utf16le_strings with
extract_utf16_strings, and ensure this entry is listed alongside the other
"Changed" items so consumers see the migration step when upgrading.
```markdown
## Tests

- Add or update tests for behavior changes.
- Use insta snapshots for output verification when appropriate.
- Integration tests live in tests/ and fixtures in tests/fixtures/.
- Use insta snapshots for output verification when changing output formatters.

Run:

- `just test`
```
Duplicate guidance and missing language specifier.
Lines 61 and 63 contain duplicate text about insta snapshots:
- Line 61: "Use insta snapshots for output verification when appropriate."
- Line 63: "Use insta snapshots for output verification when changing output formatters."
Also, line 59 has an empty code block - consider adding a language specifier or removing it if unneeded.
Proposed fix

```diff
 ## Tests

 - Add or update tests for behavior changes.
-- Use insta snapshots for output verification when appropriate.
+- Use insta snapshots for output verification, especially when changing output formatters.
 - Integration tests live in tests/ and fixtures in tests/fixtures/.
-- Use insta snapshots for output verification when changing output formatters.
-
-Run:
-
-- `just test`
+- Run tests with `just test`
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
59-59: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@CONTRIBUTING.md` around lines 58 - 68, In CONTRIBUTING.md under the "##
Tests" section remove the duplicate insta snapshot guidance by keeping one
concise sentence (e.g., "Use insta snapshots for output verification when
appropriate or when changing output formatters") and delete the redundant line;
also update the empty/unspecified code fence under the "Run:" block to include a
language specifier (e.g., ```bash) so the `just test` snippet is properly
highlighted and the file no longer contains an empty/unspecified code block.
| println!("Top strings by score:"); | ||
| println!("{:-<60}", ""); | ||
| for string in sorted_strings.iter().take(20) { | ||
| let tags: Vec<_> = string.tags.iter().map(|t| format!("{:?}", t)).collect(); | ||
| let tags_str = if tags.is_empty() { | ||
| String::new() | ||
| } else { | ||
| format!(" [{}]", tags.join(", ")) | ||
| }; | ||
| println!( | ||
| "{:4} | {:50}{}", | ||
| string.score, | ||
| if string.text.len() > 50 { | ||
| format!("{}...", &string.text[..47]) | ||
| } else { | ||
| string.text.clone() | ||
| }, | ||
| tags_str | ||
| ); |
Fix byte slicing to handle multi-byte UTF-8 characters

The condition `if string.text.len() > 50` checks byte length, but `&string.text[..47]` performs byte slicing. This will panic when a multi-byte UTF-8 character spans byte boundaries. Since the tool actively extracts UTF-8 strings (including multi-byte characters), use character-aware truncation instead.
Proposed change
```diff
-    println!(
-        "{:4} | {:50}{}",
-        string.score,
-        if string.text.len() > 50 {
-            format!("{}...", &string.text[..47])
-        } else {
-            string.text.clone()
-        },
-        tags_str
-    );
+    let display_text = if string.text.chars().count() > 50 {
+        let truncated: String = string.text.chars().take(47).collect();
+        format!("{}...", truncated)
+    } else {
+        string.text.clone()
+    };
+    println!("{:4} | {:50}{}", string.score, display_text, tags_str);
```
🤖 Prompt for AI Agents
In `@examples/basic_extraction.rs` around lines 53 - 71, The current truncation
uses byte slicing (&string.text[..47]) which can panic on multi-byte UTF-8;
change to count characters with string.text.chars().count() for the length check
and perform truncation with a character-aware approach like
string.text.chars().take(47).collect::<String>() when building the displayed
text (keep the else branch using string.text.clone()); update the condition and
the truncated branch in the loop over sorted_strings to use these char-aware
operations to avoid UTF-8 boundary panics.
```rust
use std::env;
use std::fs;
use stringy::container::{create_parser, detect_format};
use stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor};
use stringy::types::Tag;
```
Add re-exports to lib.rs and update example imports
The example violates the guideline: "Imports must be from stringy::extraction or stringy::types, not deeply nested paths - use re-exports from lib.rs." The create_parser and detect_format functions are not currently re-exported in lib.rs.
Update lib.rs to re-export these items from the container module alongside extraction and types:
```rust
pub use container::{create_parser, detect_format};
```

Then update this example (and basic_extraction.rs, output_formats.rs) to import from the crate root:

```diff
-use stringy::container::{create_parser, detect_format};
-use stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor};
-use stringy::types::Tag;
+use stringy::{create_parser, detect_format, BasicExtractor, ExtractionConfig, StringExtractor, Tag};
```

🤖 Prompt for AI Agents
In `@examples/network_indicators.rs` around lines 8 - 12, Add re-exports for the
container utilities in lib.rs and update examples to import from the crate root:
add a pub use container::{create_parser, detect_format}; to lib.rs so
create_parser and detect_format are exported alongside extraction and types,
then change the example files (examples/network_indicators.rs,
basic_extraction.rs, output_formats.rs) to import create_parser and
detect_format from stringy:: (crate root) instead of stringy::container to
comply with the import guideline.
```rust
use std::env;
use std::fs;
use stringy::container::{create_parser, detect_format};
use stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor};
use stringy::output::{OutputFormat, OutputMetadata, format_output};
```
Align extraction and output imports with crate-root re-exports
The BasicExtractor, ExtractionConfig, StringExtractor, OutputFormat, OutputMetadata, and format_output are already re-exported from the crate root. Use those re-exports instead of deeply nested paths. However, create_parser and detect_format are not re-exported, so keep those imports from stringy::container.
Proposed change

```diff
-use stringy::container::{create_parser, detect_format};
-use stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor};
-use stringy::output::{OutputFormat, OutputMetadata, format_output};
+use stringy::{
+    create_parser, detect_format, // Still from container module (not re-exported)
+    BasicExtractor, ExtractionConfig, StringExtractor,
+    OutputFormat, OutputMetadata, format_output,
+};
```

📝 Committable suggestion
```rust
use std::env;
use std::fs;
use stringy::{
    BasicExtractor, ExtractionConfig, StringExtractor,
    OutputFormat, OutputMetadata, format_output,
};
use stringy::container::{create_parser, detect_format};
```
🤖 Prompt for AI Agents
In `@examples/output_formats.rs` around lines 10 - 15, Update the top-level
imports in examples/output_formats.rs to use crate-root re-exports for
BasicExtractor, ExtractionConfig, StringExtractor, OutputFormat, OutputMetadata,
and format_output instead of importing them from deep modules; keep
create_parser and detect_format imported from stringy::container. Concretely,
replace stringy::extraction::{BasicExtractor, ExtractionConfig, StringExtractor}
and stringy::output::{OutputFormat, OutputMetadata, format_output} with a single
import of those symbols from the crate root while leaving use
stringy::container::{create_parser, detect_format} unchanged.
```rust
// Limit to top 50 strings for demonstration
let mut sorted_strings = strings;
sorted_strings.sort_by(|a, b| b.score.cmp(&a.score));
let top_strings: Vec<_> = sorted_strings.into_iter().take(50).collect();

// Create output metadata
let binary_name = std::path::Path::new(path)
    .file_name()
    .and_then(|n| n.to_str())
    .unwrap_or("unknown")
    .to_string();

let metadata = OutputMetadata::new(
    binary_name,
    output_format,
    top_strings.len(),
    top_strings.len(),
);
```
Capture full extracted count before truncation to accurately reflect metadata
The total_strings field should represent all extracted strings before display limits are applied. Currently both total_strings and filtered_strings are set to the truncated count, making it impossible to distinguish between total extractions and displayed results.
Capture sorted_strings.len() before the .take(50) truncation and pass that as total_strings to OutputMetadata::new().
Proposed change
```diff
 let mut sorted_strings = strings;
 sorted_strings.sort_by(|a, b| b.score.cmp(&a.score));
+let total_strings = sorted_strings.len();
 let top_strings: Vec<_> = sorted_strings.into_iter().take(50).collect();

 let metadata = OutputMetadata::new(
     binary_name,
     output_format,
-    top_strings.len(),
+    total_strings,
     top_strings.len(),
 );
```
🤖 Prompt for AI Agents
In `@examples/output_formats.rs` around lines 48 - 65, Capture the full count of
extracted strings before truncation and pass it as total_strings to
OutputMetadata::new: compute a variable (e.g., full_count =
sorted_strings.len()) immediately after sorting (before calling
.into_iter().take(50)), then use full_count for the total_strings argument and
top_strings.len() for filtered_strings when constructing OutputMetadata
(referencing sorted_strings, top_strings, OutputMetadata::new, binary_name,
output_format).
#### 12. Add memory mapping for large files

**Priority:** High
**Current state:** Entire file is loaded into memory
**Impact:** Processing 1GB+ binaries requires 1GB+ RAM
**Recommendation:** Use `memmap2` crate for memory-mapped file access

```rust
// Recommended approach
use memmap2::Mmap;
use std::fs::File;

let file = File::open(path)?;
let mmap = unsafe { Mmap::map(&file)? };
let data: &[u8] = &mmap;
```
Correct line numbers to 143–165 and add safety documentation for the unsafe memmap2 usage.
The code example uses unsafe { Mmap::map(...) } without documenting the safety invariants. Per memmap2's documentation, the mapping is unsafe because the underlying file can be modified or truncated during use, causing undefined behavior. Add a note explaining the key safety requirements:
Proposed safety documentation

```diff
 let file = File::open(path)?;
 let mmap = unsafe { Mmap::map(&file)? };
 let data: &[u8] = &mmap;
+
+Safety: `Mmap::map` is unsafe because the file can be modified or truncated while mapped,
+causing undefined behavior. Ensure the file remains unchanged during the mapping's lifetime
+(via file permissions, locks, or process-private files), avoid mapping files subject to
+concurrent modification, and validate file length if you rely on stable boundaries.
```
+concurrent modification, and validate file length if you rely on stable boundaries.🤖 Prompt for AI Agents
In `@ROADMAP.md` around lines 93 - 107, Update the roadmap section header line
numbers from 93–107 to 143–165 and add a short safety note next to the memmap2
example documenting the unsafe invariants: explain why Mmap::map is wrapped in
unsafe, state that the mapped file must not be truncated or modified for the
lifetime of the Mmap, that any mutable access to the underlying file must be
avoided while the mapping exists, and recommend validating file size/permissions
before mapping; reference the memmap2::Mmap type and the unsafe block (unsafe {
Mmap::map(&file)? }) so readers know exactly where the safety requirements
apply.
Split the oversized table.rs (708 lines) into a module directory with four focused files, all under the 500-line project limit:

- table/mod.rs (120 lines): Public API, constants, entry points
- table/tty.rs (220 lines): TTY table rendering with column alignment
- table/plain.rs (96 lines): Plain text output for piping
- table/formatting.rs (326 lines): Shared utilities (truncate, pad, format_tags)

This addresses the ticket acceptance criterion requiring all files to be under 500 lines.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/output/table/formatting.rs`:
- Around line 6-8: The import of RankingConfig uses a deep module path; update
the import to use the crate re-export (e.g., replace the deep path with the
re-exported symbol like crate::RankingConfig) or add a re-export for
RankingConfig in the crate root (lib.rs) if it’s missing; locate the import
statement in formatting.rs that references RankingConfig and either change it to
use crate::RankingConfig or add pub use
crate::classification::ranking::RankingConfig; in either case ensure other code
compiles against the new re-exported path.
- Around line 121-153: The truncate_string function can choose a character
boundary that yields an output longer than max_len when the first char is
multibyte; update the logic that computes end_index (based on truncate_at and
s.char_indices()) so that you only include an extra character if its byte index
is <= truncate_at (i.e., find the last char boundary <= truncate_at and use
that), and remove the fallback that unconditionally advances to include the
second char; if no valid boundary exists leave end_index at 0 so the result
becomes "..." (which will not exceed max_len). Reference: truncate_string,
truncate_at, end_index, and the char_indices() iteration.
In `@src/output/table/mod.rs`:
- Around line 29-31: Update the module doc comment in src/output/table/mod.rs so
the "Tags" bullet reflects actual behavior: state that tags are the top
(highest-boost) 2–3 tags (not the first tags), comma-separated, and each tag is
truncated to 20 characters; keep the rest of the bullets (String and Score)
unchanged and reference the same doc block around the "Tags" line that currently
describes tag selection.
In `@src/output/table/tty.rs`:
- Around line 20-75: format_table_tty currently uses FoundString.text directly
when building rows which allows embedded CR/LF and tabs to break the TTY table;
before calling truncate_string and pad_string inside the loop, sanitize
found_string.text by normalizing or removing control whitespace (replace '\r'
and '\n' with a single space and convert tabs to a space, optionally collapse
consecutive spaces) and use that sanitized string for truncate_string,
pad_string, and for calculating tags/section widths if needed (references:
format_table_tty, FoundString.text, truncate_string, pad_string,
calculate_section_width, calculate_tags_width, format_tags).
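The two table fixes requested above (boundary-safe truncation and control-character sanitization) are easiest to see together. The sketch below follows the behavior described in the prompts; `sanitize_for_display` is a hypothetical helper name, not necessarily the project's actual implementation:

```rust
/// Truncate on a char boundary so the result never exceeds `max_len` bytes,
/// appending "..." when truncation happens (sketch of the requested fix;
/// assumes max_len >= 3).
fn truncate_string(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    let budget = max_len.saturating_sub(3); // reserve room for "..."
    // Find the last char boundary at or below the byte budget; if even the
    // first char is too wide, `end` stays 0 and the result is just "...".
    let mut end = 0;
    for (idx, ch) in s.char_indices() {
        if idx + ch.len_utf8() > budget {
            break;
        }
        end = idx + ch.len_utf8();
    }
    format!("{}...", &s[..end])
}

/// Replace CR/LF/tab and other control characters with single spaces so
/// embedded control whitespace cannot break TTY table alignment.
fn sanitize_for_display(s: &str) -> String {
    s.chars()
        .map(|c| if c.is_control() { ' ' } else { c })
        .collect()
}
```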
src/output/table/formatting.rs
Outdated
use crate::classification::ranking::RankingConfig;
use crate::types::Tag;
🛠️ Refactor suggestion | 🟠 Major
Align RankingConfig import with crate re-export policy.
Line 6 imports RankingConfig via a deep module path. Please use the crate re-export (or add it if missing) to comply with the import policy. As per coding guidelines, align import paths via crate re-exports.
🤖 Prompt for AI Agents
In `@src/output/table/formatting.rs` around lines 6 - 8, The import of
RankingConfig uses a deep module path; update the import to use the crate
re-export (e.g., replace the deep path with the re-exported symbol like
crate::RankingConfig) or add a re-export for RankingConfig in the crate root
(lib.rs) if it’s missing; locate the import statement in formatting.rs that
references RankingConfig and either change it to use crate::RankingConfig or add
pub use crate::classification::ranking::RankingConfig; in either case ensure
other code compiles against the new re-exported path.
//! - **String**: Up to 60 characters, truncated with `...` if longer
//! - **Tags**: First 2-3 tags, comma-separated, max 20 characters
//! - **Score**: Right-aligned integer score
Docs mismatch tag selection behavior.
Line 30 states tags show the first 2-3 tags, but the implementation selects highest boost tags with truncation. Update the doc to match actual behavior.
Proposed fix
-//! - **Tags**: First 2-3 tags, comma-separated, max 20 characters
+//! - **Tags**: Highest priority tags (ties), comma-separated, max 20 characters
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
//! - **String**: Up to 60 characters, truncated with `...` if longer
//! - **Tags**: Highest priority tags (ties), comma-separated, max 20 characters
//! - **Score**: Right-aligned integer score
🤖 Prompt for AI Agents
In `@src/output/table/mod.rs` around lines 29 - 31, Update the module doc comment
in src/output/table/mod.rs so the "Tags" bullet reflects actual behavior: state
that tags are the top (highest-boost) 2–3 tags (not the first tags),
comma-separated, and each tag is truncated to 20 characters; keep the rest of
the bullets (String and Score) unchanged and reference the same doc block around
the "Tags" line that currently describes tag selection.
Fix two issues in YARA output formatter:

1. UTF-16LE non-ASCII handling: The `wide` modifier only works for ASCII characters. Non-ASCII UTF-16LE strings now use hex encoding instead of the incorrect `wide` modifier approach.
   - Add `utf16le_hex_string` helper (mirrors `utf16be_hex_string`)
   - Branch on `is_ascii()` to choose appropriate encoding
2. Injection prevention: Escape `binary_name` and `timestamp` in comment headers and meta fields to prevent newline/quote injection attacks.

Add 8 new tests covering:

- UTF-16LE hex string encoding (basic, empty, non-ASCII, surrogate pairs)
- ASCII UTF-16LE still uses wide modifier
- Non-ASCII UTF-16LE uses hex encoding
- Binary name injection escaping
- Timestamp injection escaping

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@ROADMAP.md`:
- Around line 11-12: The bold-only metadata lines (e.g., "**Priority:** High
**Current state:** 1542 lines (exceeds 500-line project limit by 1042 lines)
**Files affected:** `src/extraction/mod.rs`") are being treated as emphasis
headings by markdownlint MD036; replace these bold-only lines with either a
Markdown list or a Markdown table (for example, convert to "- Priority: High",
"- Current state: 1542 lines..." and "- Files affected: src/extraction/mod.rs"
or a two-column table) and apply the same pattern consistently to every other
metadata block in ROADMAP.md so no line consists solely of bold/emphasis markup.
In `@src/output/yara.rs`:
- Around line 266-536: The module-level #[cfg(test)] mod tests block (containing
test functions like test_sanitize_rule_name, test_escape_yara_string,
test_utf16be_hex_string_*, etc.) makes the source file exceed the 500-line
guideline; extract that entire tests block into a separate test module (e.g.,
yara_tests) so the main module only contains implementation code. Create a new
test file/module containing the same use imports and test functions, keep the
helper fns (make_metadata, make_string) inside that test module, and remove the
#[cfg(test)] mod tests block from this file; ensure the moved tests still
reference the same public functions (sanitize_rule_name, escape_yara_string,
escape_yara_unicode_literal, utf16be_hex_string, utf16le_hex_string,
format_yara, group_strings_by_tag) so they compile unchanged.
- Around line 251-264: Add a short doc comment above the group_strings_by_tag
function explaining that when a FoundString has multiple tags only the first tag
(via tag_name on the first entry) is used for grouping and that entries with no
tags are grouped under "untagged"; reference the function name
group_strings_by_tag, the FoundString type, and the tag_name helper in the
comment so future maintainers understand this deliberate simplification.
**Priority:** High **Current state:** 1542 lines (exceeds 500-line project limit by 1042 lines) **Files affected:** `src/extraction/mod.rs`
Avoid bold-only lines to satisfy MD036.
Markdownlint flags these lines as emphasis used as a heading. Prefer a list (or table) for the metadata and apply the same pattern to the other items.
Proposed fix (apply consistently)
-**Priority:** High **Current state:** 1542 lines (exceeds 500-line project limit by 1042 lines) **Files affected:** `src/extraction/mod.rs`
+- Priority: High
+- Current state: 1542 lines (exceeds 500-line project limit by 1042 lines)
+- Files affected: `src/extraction/mod.rs`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- Priority: High
- Current state: 1542 lines (exceeds 500-line project limit by 1042 lines)
- Files affected: `src/extraction/mod.rs`
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In `@ROADMAP.md` around lines 11 - 12, The bold-only metadata lines (e.g.,
"**Priority:** High **Current state:** 1542 lines (exceeds 500-line project
limit by 1042 lines) **Files affected:** `src/extraction/mod.rs`") are being
treated as emphasis headings by markdownlint MD036; replace these bold-only
lines with either a Markdown list or a Markdown table (for example, convert to
"- Priority: High", "- Current state: 1542 lines..." and "- Files affected:
src/extraction/mod.rs" or a two-column table) and apply the same pattern
consistently to every other metadata block in ROADMAP.md so no line consists
solely of bold/emphasis markup.
fn group_strings_by_tag(strings: &[FoundString]) -> BTreeMap<String, Vec<&FoundString>> {
    let mut grouped: BTreeMap<String, Vec<&FoundString>> = BTreeMap::new();

    for item in strings {
        let tag = item
            .tags
            .first()
            .map(|tag| tag_name(tag).to_string())
            .unwrap_or_else(|| "untagged".to_string());
        grouped.entry(tag).or_default().push(item);
    }

    grouped
}
🧹 Nitpick | 🔵 Trivial
Consider documenting single-tag grouping behavior.
The function uses only the first tag when a string has multiple tags. This is a reasonable simplification for YARA output, but consider adding a brief doc comment clarifying this behavior so future maintainers understand the design choice.
Suggested documentation
+/// Groups strings by their primary tag (first tag only).
+///
+/// Strings with no tags are grouped under "untagged".
+/// If a string has multiple tags, only the first tag is used for grouping.
fn group_strings_by_tag(strings: &[FoundString]) -> BTreeMap<String, Vec<&FoundString>> {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Groups strings by their primary tag (first tag only).
///
/// Strings with no tags are grouped under "untagged".
/// If a string has multiple tags, only the first tag is used for grouping.
fn group_strings_by_tag(strings: &[FoundString]) -> BTreeMap<String, Vec<&FoundString>> {
    let mut grouped: BTreeMap<String, Vec<&FoundString>> = BTreeMap::new();
    for item in strings {
        let tag = item
            .tags
            .first()
            .map(|tag| tag_name(tag).to_string())
            .unwrap_or_else(|| "untagged".to_string());
        grouped.entry(tag).or_default().push(item);
    }
    grouped
}
🤖 Prompt for AI Agents
In `@src/output/yara.rs` around lines 251 - 264, Add a short doc comment above the
group_strings_by_tag function explaining that when a FoundString has multiple
tags only the first tag (via tag_name on the first entry) is used for grouping
and that entries with no tags are grouped under "untagged"; reference the
function name group_strings_by_tag, the FoundString type, and the tag_name
helper in the comment so future maintainers understand this deliberate
simplification.
    let metadata = OutputMetadata::new("test.bin".to_string(), OutputFormat::Yara, 0, 0);
    // Note: generated_at is None
    let output = format_yara(&[], &metadata).expect("Formatting should succeed");

    // Should contain a timestamp in the generated_at field
    assert!(output.contains("generated_at = \""));
    // Timestamp should be numeric (or CLOCK_ERROR in exceptional cases)
    assert!(
        output.contains("generated_at = \"1")
            || output.contains("generated_at = \"CLOCK_ERROR"),
        "Timestamp should be numeric or CLOCK_ERROR"
    );
}

#[test]
fn test_utf16le_hex_string_basic() {
    // Basic ASCII - should be little-endian (ASCII byte followed by 00)
    assert_eq!(utf16le_hex_string("A"), "{ 41 00 }");
    assert_eq!(utf16le_hex_string("AB"), "{ 41 00 42 00 }");
}

#[test]
fn test_utf16le_hex_string_empty() {
    assert_eq!(utf16le_hex_string(""), "{ }");
}

#[test]
fn test_utf16le_hex_string_non_ascii() {
    // Non-ASCII Unicode (BMP) - Chinese character U+4E2D
    let chinese = utf16le_hex_string("\u{4E2D}");
    assert_eq!(chinese, "{ 2d 4e }");
}

#[test]
fn test_utf16le_hex_string_surrogate_pair() {
    // Character requiring surrogate pair (outside BMP) - emoji U+1F600
    let emoji = utf16le_hex_string("\u{1F600}");
    // Should produce surrogate pair: 3D D8 00 DE (little-endian)
    assert_eq!(emoji, "{ 3d d8 00 de }");
}

#[test]
fn test_utf16le_ascii_uses_wide_modifier() {
    // ASCII UTF-16LE should use "wide" modifier
    let mut string = make_string("test");
    string.encoding = Encoding::Utf16Le;
    let output = format_yara(&[string], &make_metadata()).expect("Formatting should succeed");
    assert!(
        output.contains("wide"),
        "ASCII UTF-16LE should use wide modifier"
    );
    assert!(output.contains("\"test\""));
}

#[test]
fn test_utf16le_non_ascii_uses_hex() {
    // Non-ASCII UTF-16LE should use hex string, not wide modifier
    let mut string = make_string("\u{4E2D}\u{6587}");
    string.encoding = Encoding::Utf16Le;
    let output = format_yara(&[string], &make_metadata()).expect("Formatting should succeed");
    assert!(
        !output.contains("wide"),
        "Non-ASCII UTF-16LE should not use wide modifier"
    );
    assert!(
        output.contains("{ 2d 4e 87 65 }"),
        "Non-ASCII UTF-16LE should use hex encoding"
    );
}

#[test]
fn test_binary_name_injection_escaped_in_comments() {
    // Binary name with newlines should be escaped in comments
    let mut metadata = make_metadata();
    metadata.binary_name = "evil\nname".to_string();
    let output = format_yara(&[], &metadata).expect("Formatting should succeed");
    // Should contain escaped newline, not literal
    assert!(
        output.contains("evil\\nname"),
        "Newlines in binary_name should be escaped"
    );
    assert!(
        !output.contains("evil\nname"),
        "Literal newlines should not appear"
    );
}

#[test]
fn test_timestamp_injection_escaped_in_meta() {
    // Timestamp with special characters should be escaped
    let mut metadata = make_metadata();
    metadata.generated_at = Some("2024\"\n//attack".to_string());
    let output = format_yara(&[], &metadata).expect("Formatting should succeed");
    // Should contain escaped characters
    assert!(
        output.contains("2024\\\"\\n//attack"),
        "Special chars in timestamp should be escaped"
    );
}
}
🧹 Nitpick | 🔵 Trivial
Consider extracting tests to reduce file length.
The file is 536 lines, exceeding the 500-line guideline. The tests (~270 lines) could be moved to a separate tests/ integration test file or yara_tests.rs submodule. The main module would then be ~266 lines, well within limits.
This is a minor concern since the tests provide good coverage and the overage is small.
🤖 Prompt for AI Agents
In `@src/output/yara.rs` around lines 266 - 536, The module-level #[cfg(test)] mod
tests block (containing test functions like test_sanitize_rule_name,
test_escape_yara_string, test_utf16be_hex_string_*, etc.) makes the source file
exceed the 500-line guideline; extract that entire tests block into a separate
test module (e.g., yara_tests) so the main module only contains implementation
code. Create a new test file/module containing the same use imports and test
functions, keep the helper fns (make_metadata, make_string) inside that test
module, and remove the #[cfg(test)] mod tests block from this file; ensure the
moved tests still reference the same public functions (sanitize_rule_name,
escape_yara_string, escape_yara_unicode_literal, utf16be_hex_string,
utf16le_hex_string, format_yara, group_strings_by_tag) so they compile
unchanged.
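For orientation, the encoding behavior pinned by the tests above can be implemented along these lines; this is a sketch consistent with the expected outputs, not necessarily the code in `yara/escaping.rs`:

```rust
/// Encode a string as a YARA hex-string of its UTF-16LE code units,
/// e.g. "A" -> "{ 41 00 }" and U+1F600 -> "{ 3d d8 00 de }".
fn utf16le_hex_string(s: &str) -> String {
    let mut bytes = Vec::new();
    for unit in s.encode_utf16() {
        // Little-endian: low byte first, then high byte; surrogate pairs
        // fall out naturally from encode_utf16().
        bytes.push(format!("{:02x}", unit & 0xff));
        bytes.push(format!("{:02x}", unit >> 8));
    }
    if bytes.is_empty() {
        "{ }".to_string()
    } else {
        format!("{{ {} }}", bytes.join(" "))
    }
}
```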
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@TESTING_ANALYSIS.md`:
- Around line 17-18: Replace the bolded pseudo-headings (e.g., "**Coverage by
Module**" and the other bolded titles found elsewhere) with proper Markdown ATX
headings (use ### or #### as appropriate for the document hierarchy) so they
appear in the document outline; search for the bold patterns like "**...**" used
as section titles and convert them to heading lines (for example change
"**Coverage by Module**" to "### Coverage by Module") keeping the text identical
except for removing the bold markers and adding the leading hashes.
- Around line 7-12: Update the hard-coded numeric metrics block (the lines
containing "**Total Tests**", "**Test Pass Rate**", "**Test Coverage**", "**Test
Modules**", and "**Fixtures**" in TESTING_ANALYSIS.md) to include a "Last
verified" date and the CI job or command used to produce these numbers; also
apply the same change to the repeated metrics around lines 472-477.
Specifically, append a short line like "Last verified: YYYY-MM-DD — generated
by: <CI job name or command>" immediately under the metrics so future readers
know when and how the counts were produced.
- Around line 67-79: The fenced code blocks in TESTING_ANALYSIS.md lack language
identifiers causing markdown lint and readability issues; update each
triple-backtick fence to include the correct language (e.g., ```text for
file/path listings, ```rust for Rust snippets) for all examples referencing
extraction::StringExtractor, extraction::BasicExtractor, the main function
examples (fn main / fn main() -> Result<(), Box<dyn std::error::Error>>), and
the test snippets (tests/test_deduplication_performance.rs,
tests/test_extraction_edge_cases.rs); ensure every opened ``` is replaced with
the appropriate ```rust or ```text and that the Rust examples reflect the
corrected function signature and Ok(()) where suggested.
- **Total Tests**: 535 tests (280 unit + 219 integration + 36 ignored/doctest)
- **Test Pass Rate**: 98.9% (529 passed, 6 failed/ignored)
- **Test Coverage**: 6,106 test lines vs 14,138 source lines (43% ratio)
- **Test Modules**: 24 modules with unit tests
- **Fixtures**: 5 binary fixtures (ELF, Mach-O, PE with/without resources)
Add a timestamp or source for the numeric metrics.
Hard-coded counts, pass rates, and timings will go stale. Add a "Last verified" date and the CI job or command that produced the numbers.
Proposed fix
-### Overall Test Health: STRONG with Minor Gaps
+### Overall Test Health: STRONG with Minor Gaps
+Last verified: 2026-01-18 (source: CI job "tests", command: `cargo test --all`)
- **Total Tests**: 535 tests (280 unit + 219 integration + 36 ignored/doctest)
- **Test Pass Rate**: 98.9% (529 passed, 6 failed/ignored)
- **Test Coverage**: 6,106 test lines vs 14,138 source lines (43% ratio)
- **Test Modules**: 24 modules with unit tests
- **Fixtures**: 5 binary fixtures (ELF, Mach-O, PE with/without resources)
-**Test Suite Speed**: FAST
+**Test Suite Speed**: FAST
+Last verified: 2026-01-18 (source: CI job "tests", command: `cargo test --all`)
Also applies to: 472-477
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In `@TESTING_ANALYSIS.md` around lines 7 - 12, Update the hard-coded numeric
metrics block (the lines containing "**Total Tests**", "**Test Pass Rate**",
"**Test Coverage**", "**Test Modules**", and "**Fixtures**" in
TESTING_ANALYSIS.md) to include a "Last verified" date and the CI job or command
used to produce these numbers; also apply the same change to the repeated
metrics around lines 472-477. Specifically, append a short line like "Last
verified: YYYY-MM-DD — generated by: <CI job name or command>" immediately under
the metrics so future readers know when and how the counts were produced.
**Coverage by Module**:
Replace bolded pseudo-headings with proper Markdown headings.
These lines are styled as headings using emphasis, which breaks the document outline. Use ### or #### headings instead.
Proposed fix
-**Coverage by Module**:
+#### Coverage by Module
-**Test Files**:
+#### Test Files
-**Snapshot Testing**: Using `insta` for output validation
+#### Snapshot Testing
+Using `insta` for output validation
-**Test Fixtures**: Well-organized in `tests/fixtures/`
+#### Test Fixtures
+Well-organized in `tests/fixtures/`
-**Good Examples**:
+#### Good Examples
-**Well-Tested Edge Cases**:
+#### Well-Tested Edge Cases
-**Positive Findings**:
+#### Positive Findings
-**Snapshot Testing**:
+#### Snapshot Testing
-**Strengths**:
+#### Strengths
-**Weaknesses**:
+#### Weaknesses
-**Strengths**:
+#### Strengths
-**Critical Weaknesses**:
+#### Critical Weaknesses
Also applies to: 27-28, 45-45, 179-180, 195-196, 214-215, 225-226, 289-290, 311-312, 422-423, 430-431
🤖 Prompt for AI Agents
In `@TESTING_ANALYSIS.md` around lines 17 - 18, Replace the bolded pseudo-headings
(e.g., "**Coverage by Module**" and the other bolded titles found elsewhere)
with proper Markdown ATX headings (use ### or #### as appropriate for the
document hierarchy) so they appear in the document outline; search for the bold
patterns like "**...**" used as section titles and convert them to heading lines
(for example change "**Coverage by Module**" to "### Coverage by Module")
keeping the text identical except for removing the bold markers and adding the
leading hashes.
```text
src\extraction\mod.rs - extraction::StringExtractor (line 318)
src\extraction\mod.rs - extraction::BasicExtractor (line 408)
```

**Problem**: Doctests use `?` operator without proper return type:

```rust
fn main() { // Should be: fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read("binary_file")?; // Error: can't use ? in fn returning ()
    ...
}
```
Add language identifiers to fenced code blocks.
These code fences are missing language tags, which violates the markdown rules and hurts readability. Add explicit languages like text or rust as appropriate.
Proposed fix
-```
+```text
src\extraction\mod.rs - extraction::StringExtractor (line 318)
src\extraction\mod.rs - extraction::BasicExtractor (line 408)
```

-```
+```rust
fn main() { // Should be: fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read("binary_file")?; // Error: can't use ? in fn returning ()
    ...
}
```

-```
+```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _cli = Cli::parse();
    // TODO: Implement main extraction pipeline
    println!("Stringy - Binary string extraction tool");
    println!("Implementation coming soon...");
    Ok(())
}
```

-```
+```rust
if section_offset >= data.len() {
    return Ok(Vec::new());
}
let end_offset = section_offset
    .checked_add(section_size)
    .unwrap_or(data.len())
    .min(data.len());
```

-```
+```rust
#[test]
fn test_section_beyond_file_boundary() {
    // Section offset > data.len()
}
#[test]
fn test_section_size_overflow() {
    // section_offset + section_size overflows
}
```

-```
+```rust
// In src/extraction/mod.rs (line 318 and 408)
// Change: fn main() {
// To: fn main() -> Result<(), Box<dyn std::error::Error>> {
// Add: Ok(()) at end of function
```

-```
+```rust
// tests/test_deduplication_performance.rs
#[test]
#[ignore] // Marked as ignored for normal runs
fn test_deduplication_large_input() {
    // Test with 10,000 duplicate strings
}
```

-```
+```rust
// tests/test_extraction_edge_cases.rs
#[test]
fn test_section_beyond_boundary() {
    // Section offset > data.len()
}
```
Also applies to: 116-125, 136-145, 161-171, 339-344, 348-355, 359-365
🤖 Prompt for AI Agents
In `@TESTING_ANALYSIS.md` around lines 67 - 79, The fenced code blocks in
TESTING_ANALYSIS.md lack language identifiers causing markdown lint and
readability issues; update each triple-backtick fence to include the correct
language (e.g., ```text for file/path listings, ```rust for Rust snippets) for
all examples referencing extraction::StringExtractor,
extraction::BasicExtractor, the main function examples (fn main / fn main() ->
Result<(), Box<dyn std::error::Error>>), and the test snippets
(tests/test_deduplication_performance.rs, tests/test_extraction_edge_cases.rs);
ensure every opened ``` is replaced with the appropriate ```rust or ```text and
that the Rust examples reflect the corrected function signature and Ok(()) where
suggested.
- Added 'moby' configuration to Docker feature
- Removed unused features from the devcontainer configuration
- Added mise.toml for tool dependencies

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @.devcontainer/devcontainer.json:
- Line 48: Add a trailing newline to the end of the file so it ends with a
newline character (POSIX convention). Open the .devcontainer/devcontainer.json
file and insert a single newline at EOF so the file terminates with '\n'; no
code changes beyond adding that final newline are required.
- Around line 40-41: The extensions list includes both "1YiB.rust-bundle" and
"rust-lang.rust-analyzer", which overlap; decide which to keep and remove the
duplicate entry accordingly: if you want the bundled set keep "1YiB.rust-bundle"
and delete "rust-lang.rust-analyzer", otherwise keep the official
"rust-lang.rust-analyzer" and remove "1YiB.rust-bundle"; update the "extensions"
array in devcontainer.json to contain only the chosen extension and, if
necessary, update README or comments to document the rationale.
- Around line 5-23: The devcontainer features currently set "version": "latest"
(e.g. entries for "ghcr.io/devcontainers/features/docker-outside-of-docker:1",
"ghcr.io/devcontainers/features/github-cli:1",
"ghcr.io/eitsupi/devcontainer-features/mdbook:1",
"ghcr.io/devcontainers-extra/features/claude-code:1", and
"ghcr.io/devcontainers-extra/features/mise:1") should be pinned to concrete
version strings for reproducibility; update each feature's "version" value to a
specific release/tag (or add a comment listing the expected version) and, where
available, replace "latest" with the exact version you validated locally to
ensure deterministic dev environments.
In `@justfile`:
- Around line 55-66: Replace the duplicated mise install steps in both the setup
and [unix] setup recipes by invoking the existing mise-install recipe: remove
the repeated mise trust/mise install lines and call the mise-install recipe from
setup and the [unix] setup recipe (keep any environment-specific lines like
rustup component additions or the mdformat-install/Write-Host lines as needed),
so both variants reuse the single mise-install implementation and avoid drift.
In `@mise.toml`:
- Around line 1-15: The tools table in mise.toml pins versions except for claude
which is set to "latest", causing non-deterministic installs; update the claude
entry to a specific version string (match the project's versioning scheme used
for other tools) by replacing claude = "latest" with a pinned version (e.g.,
claude = "<x.y.z>") so installs are reproducible; ensure the chosen version is
compatible with CI and document the selected version in the repo (reference: the
claude entry in the tools block).
| "ghcr.io/devcontainers/features/docker-outside-of-docker:1": { | ||
| "installDockerBuildx": true, | ||
| "version": "latest", | ||
| "dockerDashComposeVersion": "v2", | ||
| "moby": false | ||
| }, | ||
| "ghcr.io/devcontainers/features/github-cli:1": { | ||
| "installDirectlyFromGitHubRelease": true, | ||
| "version": "latest" | ||
| }, | ||
| "ghcr.io/eitsupi/devcontainer-features/mdbook:1": { | ||
| "version": "latest" | ||
| }, | ||
| "ghcr.io/devcontainers-extra/features/claude-code:1": { | ||
| "version": "latest" | ||
| }, | ||
| "ghcr.io/devcontainers-extra/features/mise:1": { | ||
| "version": "latest" | ||
| } |
🧹 Nitpick | 🔵 Trivial
Consider pinning feature versions for reproducible dev environments.
All features use version: "latest", which can cause inconsistent environments across team members or over time. For a binary analysis tool, reproducible builds and test environments reduce debugging headaches.
Pin to specific versions or at minimum document the expected versions in a comment.
Example version pinning
"ghcr.io/devcontainers/features/docker-outside-of-docker:1": {
"installDockerBuildx": true,
- "version": "latest",
+ "version": "27.4",
"dockerDashComposeVersion": "v2",
"moby": false
},
"ghcr.io/devcontainers/features/github-cli:1": {
"installDirectlyFromGitHubRelease": true,
- "version": "latest"
+ "version": "2.64"
},
🤖 Prompt for AI Agents
In @.devcontainer/devcontainer.json around lines 5 - 23, The devcontainer
features currently set "version": "latest" (e.g. entries for
"ghcr.io/devcontainers/features/docker-outside-of-docker:1",
"ghcr.io/devcontainers/features/github-cli:1",
"ghcr.io/eitsupi/devcontainer-features/mdbook:1",
"ghcr.io/devcontainers-extra/features/claude-code:1", and
"ghcr.io/devcontainers-extra/features/mise:1") should be pinned to concrete
version strings for reproducibility; update each feature's "version" value to a
specific release/tag (or add a comment listing the expected version) and, where
available, replace "latest" with the exact version you validated locally to
ensure deterministic dev environments.
      ]
    }
  }
}
No newline at end of file
🧹 Nitpick | 🔵 Trivial
Add trailing newline.
File ends without a newline. POSIX convention and most linters expect files to end with a newline character.
🤖 Prompt for AI Agents
In @.devcontainer/devcontainer.json at line 48, Add a trailing newline to the
end of the file so it ends with a newline character (POSIX convention). Open the
.devcontainer/devcontainer.json file and insert a single newline at EOF so the
file terminates with '\n'; no code changes beyond adding that final newline are
required.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Extract YARA string escaping utilities to separate escaping.rs module:

- escape_yara_string, escape_yara_unicode_literal
- utf16be_hex_string, utf16le_hex_string
- All associated tests

The original yara.rs was 536 lines, now split into:

- yara/mod.rs (357 lines) - main formatting logic
- yara/escaping.rs (204 lines) - escaping utilities

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Fix formatting bug where rules 3 and 4 were merged on one line, correct typos, add Rust version requirements, expand development commands, clarify module structure, and document key dependencies.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.vscode/settings.json:
- Around line 1-12: The .vscode/settings.json contains unused Ruff and local
tool paths ("ruff.path", "ruff.interpreter", "python.defaultInterpreterPath",
"debug.javascript.defaultRuntimeExecutable") that are unclear or dead; either
remove the Ruff entries if unused or wire Ruff into the repository toolchain by
adding it to .pre-commit-config.yaml and CI, and if Python/Node under
.vscode/mise-tools are required for docs/builds/tests, document the purpose and
the steps to populate .vscode/mise-tools in README or CONTRIBUTING (refer to
these specific keys when editing) so developers know when to install those local
interpreters and runtimes.
In `@AGENTS.md`:
- Line 17: The line "**Rust**: Edition 2024, MSRV 1.91" is using emphasis as a
heading; replace it with a real Markdown heading or list item (e.g., change to
"### Rust — Edition 2024, MSRV 1.91" or use "- Rust: Edition 2024, MSRV 1.91")
so markdownlint no longer flags it; locate the exact text "**Rust**: Edition
2024, MSRV 1.91" and update it accordingly.
- Fix test metadata count mismatches in output_table_integration.rs
- Remove underscore prefix from used parameters in json.rs
- Use crate re-export for RankingConfig import
- Fix truncate_string edge case for wide multibyte characters
- Update table docs to reflect actual tag selection behavior
- Add control character sanitization for TTY output
- Use UTF-8 encoding for Unicode test content
- Add Claude Code AI assistance section to CONTRIBUTING.md
- Update snapshots for encoding and sanitization changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate rust-bundle extension, keep official rust-analyzer
- Refactor setup recipes to reuse mise-install recipe
- Fix markdown lint: convert emphasis-as-heading to list items

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add mise_exec variable and prefix all tool commands (cargo, prettier, pre-commit, actionlint, cspell, markdownlint, lychee, dist, mdbook, goreleaser) with mise exec to ensure correct tool versions are used regardless of shell activation state.

Also adds dotenv-load and ignore-comments settings for consistency with other projects.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/output_table_integration.rs`:
- Around line 302-308: The test test_plain_unicode_strings constructs
Unicode-containing cells using make_string but currently uses Encoding::Ascii
(via make_string), which is incorrect; update the calls to make_string to use an
appropriate Unicode encoding (e.g., Encoding::Utf8) and ensure the associated
metadata from make_metadata reflects UTF-8 encoding so encoding handling is
exercised correctly when calling format_table_with_mode; locate the make_string
calls inside test_plain_unicode_strings and change the encoding argument and/or
the metadata returned by make_metadata to UTF-8/Unicode.
♻️ Duplicate comments (5)
CONTRIBUTING.md (1)

60-63: Consolidate duplicate snapshot guidance.

Lines 61 and 63 repeat the same instruction. Keep one concise bullet to avoid drift.

Proposed fix

- - Use insta snapshots for output verification when appropriate.
- - Integration tests live in tests/ and fixtures in tests/fixtures/.
- - Use insta snapshots for output verification when changing output formatters.
+ - Use insta snapshots for output verification when appropriate, especially when changing output formatters.
+ - Integration tests live in tests/ and fixtures in tests/fixtures/.

TESTING_ANALYSIS.md (2)

7-12: Add last-verified metadata for numeric test metrics.

The counts and pass rates will go stale. Add a date and source near this block and repeat for the later metrics block around Lines 472-477.

Proposed fix

 - **Fixtures**: 5 binary fixtures (ELF, Mach-O, PE with/without resources)
+Last verified: 2026-01-18 - source: CI job "tests", command: `cargo test --all`

17-28: Replace bold pseudo-headings with proper Markdown headings.

Bold text is being used as headings, which breaks the document outline. Convert these to ATX headings and apply the change consistently.

Proposed fix

-**Coverage by Module**:
+#### Coverage by Module
@@
-**Test Files**:
+#### Test Files

src/output/table/formatting.rs (1)

6-7: Use crate re-export for RankingConfig import.

The deep path import violates the re-export policy. Switch to the crate re-export, or add one in lib.rs if missing.

Proposed fix

-use crate::classification::RankingConfig;
+use crate::RankingConfig;

As per coding guidelines, use crate re-exports for imports.

src/output/table/tty.rs (1)

78-94: Section column needs sanitization and truncation.

Raw section names can include control characters or exceed the capped width, which breaks table alignment. Sanitize and truncate to section_width before padding.

Proposed fix

-    let section_display = found_string.section.as_deref().unwrap_or("");
+    let section_display = found_string.section.as_deref().unwrap_or("");
+    let sanitized_section = sanitize_for_display(section_display);
+    let truncated_section = truncate_string(&sanitized_section, section_width);
@@
-    pad_string(section_display, section_width, Alignment::Left),
+    pad_string(&truncated_section, section_width, Alignment::Left),
fn test_plain_unicode_strings() {
    let strings = vec![
        make_string("\u{4E2D}\u{6587}\u{5B57}\u{7B26}\u{4E32}"), // Chinese characters
        make_string("\u{0420}\u{0443}\u{0441}\u{0441}\u{043A}\u{0438}\u{0439}"), // Russian
        make_string("\u{1F600}\u{1F601}\u{1F602}"), // Emojis
    ];
    let result = format_table_with_mode(&strings, &make_metadata(3), false).unwrap();
Fix encoding metadata for Unicode test strings.
make_string uses Encoding::Ascii, which mismatches the Unicode content in this test and can mask encoding handling issues.
Proposed fix
- let strings = vec![
- make_string("\u{4E2D}\u{6587}\u{5B57}\u{7B26}\u{4E32}"), // Chinese characters
- make_string("\u{0420}\u{0443}\u{0441}\u{0441}\u{043A}\u{0438}\u{0439}"), // Russian
- make_string("\u{1F600}\u{1F601}\u{1F602}"), // Emojis
- ];
+ let chinese = "\u{4E2D}\u{6587}\u{5B57}\u{7B26}\u{4E32}";
+ let russian = "\u{0420}\u{0443}\u{0441}\u{0441}\u{043A}\u{0438}\u{0439}";
+ let emoji = "\u{1F600}\u{1F601}\u{1F602}";
+ let strings = vec![
+ FoundString::new(
+ chinese.to_string(),
+ Encoding::Utf8,
+ 0x1000,
+ chinese.len() as u32,
+ StringSource::SectionData,
+ ),
+ FoundString::new(
+ russian.to_string(),
+ Encoding::Utf8,
+ 0x1001,
+ russian.len() as u32,
+ StringSource::SectionData,
+ ),
+ FoundString::new(
+ emoji.to_string(),
+ Encoding::Utf8,
+ 0x1002,
+ emoji.len() as u32,
+ StringSource::SectionData,
+ ),
+ ];🤖 Prompt for AI Agents
In `@tests/output_table_integration.rs` around lines 302 - 308, The test
test_plain_unicode_strings constructs Unicode-containing cells using make_string
but currently uses Encoding::Ascii (via make_string), which is incorrect; update
the calls to make_string to use an appropriate Unicode encoding (e.g.,
Encoding::Utf8) and ensure the associated metadata from make_metadata reflects
UTF-8 encoding so encoding handling is exercised correctly when calling
format_table_with_mode; locate the make_string calls inside
test_plain_unicode_strings and change the encoding argument and/or the metadata
returned by make_metadata to UTF-8/Unicode.
This pull request primarily updates documentation and adds new output formatting capabilities. The most significant change is the introduction of a new JSON output module, along with the re-export of output formatting types. Additionally, several documentation examples are updated for clarity and consistency, particularly regarding string extraction and construction.
Output formatting improvements:
- New `format_json` function in `src/output/json.rs` for formatting extracted strings as JSONL, including comprehensive tests for serialization, escaping, and error handling.
- Re-exported output formatting types (`OutputFormat`, `OutputMetadata`, `format_json`, `format_output`, `format_table_with_mode`, `format_yara`) in `src/lib.rs` for easier access by library users.

Documentation updates and consistency improvements:

- Updated documentation examples in `src/classification/mod.rs`, `src/classification/semantic.rs`, and `src/classification/symbols.rs` to use the `FoundString::new` constructor instead of manual struct initialization, improving clarity and reducing boilerplate. [1] [2] [3] [4]
- Updated documentation for `extract_utf16_strings` (formerly `extract_utf16le_strings`) and improved example code for section-aware extraction and PE resource extraction, including proper function signatures and error handling. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

These changes improve the usability of the library for both developers and end users by providing better output options and clearer documentation.
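A minimal sketch of how a downstream consumer might drive the re-exported formatting API; the `OutputMetadata::new` arguments mirror those used in the tests above, while the crate name and `format_output`'s exact return type are assumptions:

```rust
// Crate name assumed from the repository name; adjust to the published name.
use stringy::{format_output, FoundString, OutputFormat, OutputMetadata};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Normally produced by the extraction pipeline; empty here for brevity.
    let strings: Vec<FoundString> = Vec::new();
    // Dispatch on OutputFormat::Json to get JSONL output.
    let metadata = OutputMetadata::new("sample.bin".to_string(), OutputFormat::Json, 0, 0);
    let output = format_output(&strings, &metadata)?;
    println!("{output}");
    Ok(())
}
```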