Merged
Commits
25 commits
71e4c1c
feat(output): add JSON, table, and YARA formatters
unclesp1d3r Jan 18, 2026
4c4c955
Implement table output formatter with TTY and plain modes
unclesp1d3r Jan 18, 2026
b8bcc8e
Implement JSON and YARA output formatters
unclesp1d3r Jan 18, 2026
d2710e8
feat(output): add generated_at timestamp to output metadata
unclesp1d3r Jan 18, 2026
14c3d82
Enable superpowers plugin in Claude settings
unclesp1d3r Jan 18, 2026
de2e8d5
refactor(output): improve YARA formatter code quality and test coverage
unclesp1d3r Jan 18, 2026
6c1b531
fix(docs): clarify ASCII rule for Unicode handling
unclesp1d3r Jan 18, 2026
a122d32
fix(reviews): clarify ASCII rule for Unicode punctuation
unclesp1d3r Jan 18, 2026
3b9c618
chore(settings): remove enabled plugins from configuration
unclesp1d3r Jan 18, 2026
b6689ce
chore(contributing): add contributing guidelines document
unclesp1d3r Jan 18, 2026
bec8192
refactor: address code review findings and add project documentation
unclesp1d3r Jan 18, 2026
0c2744e
chore(devcontainer): add Rust devcontainer configuration
unclesp1d3r Jan 18, 2026
6510b90
refactor(output): split table.rs into module directory
unclesp1d3r Jan 18, 2026
5c53d91
fix(yara): correct UTF-16LE encoding and prevent injection attacks
unclesp1d3r Jan 18, 2026
f4388be
chore(tests): add comprehensive testing strategy analysis
unclesp1d3r Jan 18, 2026
7306f48
chore(devcontainer): update Docker features and remove unused ones
unclesp1d3r Jan 19, 2026
d52047a
chore(setup): update setup commands and add mise installation
unclesp1d3r Jan 19, 2026
1cb3744
chore(cleanup): remove megalinter configurations and references
unclesp1d3r Jan 19, 2026
3b821e5
chore(mise): add node version to tools configuration
unclesp1d3r Jan 19, 2026
704e7c5
refactor(yara): split module to stay under 500-line limit
unclesp1d3r Jan 19, 2026
c4ec73b
chore(agents): fix formatting in critical rules section
unclesp1d3r Jan 20, 2026
74c71bc
docs(AGENTS): improve AI agent guidelines with fixes and additions
unclesp1d3r Jan 24, 2026
9663556
fix: address PR review comments for output formatters
unclesp1d3r Jan 24, 2026
5ccbff1
chore: cleanup devcontainer, justfile, and AGENTS.md
unclesp1d3r Jan 24, 2026
e4d1e15
refactor(justfile): use mise exec for all tool commands
unclesp1d3r Jan 24, 2026
4 changes: 1 addition & 3 deletions .claude/settings.json
@@ -1,5 +1,3 @@
 {
-  "enabledPlugins": {
-    "commit@cc-marketplace": true
-  }
+  "enabledPlugins": {}
 }
2 changes: 1 addition & 1 deletion .coderabbit.yml
@@ -351,7 +351,7 @@ reviews:
 - mode: "warning"
 name: "ASCII Only"
 instructions: |
-Verify that no Unicode punctuation is introduced:
+Verify that no Unicode punctuation is introduced unless explicitly required:
 1. No emojis in code or documentation
 2. No em-dashes - use regular hyphens
 3. No smart quotes - use straight quotes
47 changes: 47 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,47 @@
{
"name": "Rust",
"image": "mcr.microsoft.com/devcontainers/rust:2-1-trixie",
"features": {
"ghcr.io/devcontainers/features/docker-outside-of-docker:1": {
"installDockerBuildx": true,
"version": "latest",
"dockerDashComposeVersion": "v2",
"moby": false
},
"ghcr.io/devcontainers/features/github-cli:1": {
"installDirectlyFromGitHubRelease": true,
"version": "latest"
},
"ghcr.io/eitsupi/devcontainer-features/mdbook:1": {
"version": "latest"
},
"ghcr.io/devcontainers-extra/features/claude-code:1": {
"version": "latest"
},
"ghcr.io/devcontainers-extra/features/mise:1": {
"version": "latest"
}
Comment on lines +5 to +23

🧹 Nitpick | 🔵 Trivial

Consider pinning feature versions for reproducible dev environments.

All features use version: "latest", which can cause inconsistent environments across team members or over time. For a binary analysis tool, reproducible builds and test environments reduce debugging headaches.

Pin to specific versions or at minimum document the expected versions in a comment.

Example version pinning
 		"ghcr.io/devcontainers/features/docker-outside-of-docker:1": {
 			"installDockerBuildx": true,
-			"version": "latest",
+			"version": "27.4",
 			"dockerDashComposeVersion": "v2",
 			"moby": false
 		},
 		"ghcr.io/devcontainers/features/github-cli:1": {
 			"installDirectlyFromGitHubRelease": true,
-			"version": "latest"
+			"version": "2.64"
 		},
🤖 Prompt for AI Agents
In @.devcontainer/devcontainer.json around lines 5 - 23, The devcontainer
features currently set "version": "latest" (e.g. entries for
"ghcr.io/devcontainers/features/docker-outside-of-docker:1",
"ghcr.io/devcontainers/features/github-cli:1",
"ghcr.io/eitsupi/devcontainer-features/mdbook:1",
"ghcr.io/devcontainers-extra/features/claude-code:1", and
"ghcr.io/devcontainers-extra/features/mise:1") should be pinned to concrete
version strings for reproducibility; update each feature's "version" value to a
specific release/tag (or add a comment listing the expected version) and, where
available, replace "latest" with the exact version you validated locally to
ensure deterministic dev environments.
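
To keep pinning from regressing, a small check could flag any feature still set to `"latest"`. The sketch below is only an illustration: `unpinned_features` is a hypothetical helper, not part of Stringy, and it assumes the one-key-per-line layout of the file under review rather than parsing JSON properly.

```rust
/// Lists devcontainer feature IDs whose `"version"` is `"latest"`.
///
/// Hypothetical helper (not part of Stringy): a naive line scan that assumes
/// one key per line, as in the file under review. It is not a JSON parser.
fn unpinned_features(devcontainer_json: &str) -> Vec<String> {
    let mut current: Option<String> = None;
    let mut unpinned = Vec::new();
    for line in devcontainer_json.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with("\"ghcr.io/") && trimmed.ends_with('{') {
            // Feature header such as `"ghcr.io/...:1": {`; keep the quoted ID.
            current = trimmed.split('"').nth(1).map(str::to_owned);
        } else if trimmed.starts_with("\"version\"") {
            // Consume the pending feature so a stale ID is never reused.
            if let Some(id) = current.take() {
                if trimmed.contains("\"latest\"") {
                    unpinned.push(id);
                }
            }
        }
    }
    unpinned
}
```

A CI step or test could read `.devcontainer/devcontainer.json` with `std::fs::read_to_string` and fail when the returned list is non-empty.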

},
"customizations": {
"vscode": {
"extensions": [
"mikestead.dotenv",
"EditorConfig.EditorConfig",
"tamasfe.even-better-toml",
"github.vscode-github-actions",
"GitHub.vscode-pull-request-github",
"skellock.just",
"yzhang.markdown-all-in-one",
"bierner.markdown-checkbox",
"bierner.markdown-footnotes",
"bierner.markdown-mermaid",
"bierner.markdown-yaml-preamble",
"DavidAnson.vscode-markdownlint",
"rust-lang.rust-analyzer",
"foxundermoon.shell-format",
"redhat.vscode-yaml",
"ms-vscode-remote.remote-containers"
]
}
}
}

🧹 Nitpick | 🔵 Trivial

Add trailing newline.

File ends without a newline. POSIX convention and most linters expect files to end with a newline character.

🤖 Prompt for AI Agents
In @.devcontainer/devcontainer.json at line 48, Add a trailing newline to the
end of the file so it ends with a newline character (POSIX convention). Open the
.devcontainer/devcontainer.json file and insert a single newline at EOF so the
file terminates with '\n'; no code changes beyond adding that final newline are
required.

8 changes: 6 additions & 2 deletions .github/dependabot.yml
@@ -13,8 +13,12 @@ updates:
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
   - package-ecosystem: "rust-toolchain"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
+  - package-ecosystem: "devcontainers"
+    directory: "/"
+    schedule:
+      interval: "weekly"
8 changes: 6 additions & 2 deletions .gitignore
@@ -121,9 +121,13 @@ docs/book/
 .envrc
 .direnv/

-megalinter-reports/
-
 # Override global gitignore
 !bin/
 # Added by goreleaser init:
 .intentionally-empty-file.o
+
+
+megalinter-reports/*
+target/*
+stringy-output/*
+tests/fixtures/*
4 changes: 1 addition & 3 deletions .mdformat.toml
@@ -7,7 +7,6 @@ exclude = [
 "**/*.tpl.md",
 "**/CHANGELOG.md",
 "target/**",
-"megalinter-reports/**",
 ]
 validate = true
 number = true
@@ -26,5 +25,4 @@ extensions = [

 [plugin.mkdocs]
 align_semantic_breaks_in_lists = true
-ignore_missing_references = true
-
+ignore_missing_references = true
48 changes: 0 additions & 48 deletions .mega-linter.yml

This file was deleted.

4 changes: 4 additions & 0 deletions .repomixignore
@@ -0,0 +1,4 @@
megalinter-reports/*
target/*
stringy-output/*
tests/fixtures/*
12 changes: 12 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,12 @@
{
"ruff.path": [
"${workspaceFolder}/.vscode/mise-tools/ruff"
],
"ruff.interpreter": [
"${workspaceFolder}/.vscode/mise-tools/python"
],
"python.defaultInterpreterPath": "${workspaceFolder}/.vscode/mise-tools/python",
"debug.javascript.defaultRuntimeExecutable": {
"pwa-node": "${workspaceFolder}/.vscode/mise-tools/node"
}
}
21 changes: 17 additions & 4 deletions AGENTS.md
@@ -6,24 +6,25 @@

 1. **No `unsafe` code** - `#![forbid(unsafe_code)]` enforced
 2. **Zero warnings** - `cargo clippy -- -D warnings` must pass
-3. **ASCII only** - No emojis, em-dashes, smart quotes, or Unicode punctuation
+3. **ASCII only** - No emojis, em-dashes, smart quotes, or Unicode punctuation (except when explicitly testing or working with Unicode strings or emojis)
 4. **File size limit** - Keep files under 500 lines; split larger files
 5. **No blanket `#[allow]`** - Any `allow` requires inline justification

 ## Project Summary

 Stringy extracts meaningful strings from ELF, PE, and Mach-O binaries using format-specific knowledge and semantic classification. Unlike standard `strings`, it is section-aware and semantically intelligent.

-**Data flow**: Binary -> Format Detection -> Container Parsing -> String Extraction -> Deduplication -> Classification -> Ranking -> Output
+- **Rust**: Edition 2024, MSRV 1.91
+- **Data flow**: Binary -> Format Detection -> Container Parsing -> String Extraction -> Deduplication -> Classification -> Ranking -> Output

 ## Module Structure

 | Module | Purpose |
 | ----------------- | ---------------------------------------------------------------- |
 | `container/` | Format detection, section analysis, imports/exports via `goblin` |
 | `extraction/` | ASCII/UTF-8/UTF-16 extraction, deduplication, PE resources |
-| `classification/` | Semantic tagging (URLs, IPs, domains, paths, GUIDs) |
-| `output/` | Formatters (JSON, human-readable, YARA-friendly) |
+| `classification/` | Semantic tagging (URLs, IPs, domains, paths, GUIDs), ranking |
+| `output/` | Formatters: `json/`, `table/` (tty/plain), `yara/` |
 | `types/` | Core data structures, error handling with `thiserror` |

 ## Key Patterns
@@ -48,6 +49,10 @@ just test # Run tests with nextest
 just lint # Full lint suite
 just fix # Auto-fix clippy warnings
 just ci-check # Full CI suite locally
+just build # Debug build
+just run <args> # Run stringy with arguments
+just bench # Run benchmarks
+just format # Format all (Rust, JSON, YAML, Markdown, Justfile)
 ```

 ## Testing
@@ -60,6 +65,14 @@ just ci-check # Full CI suite locally

 Import from `stringy::extraction` or `stringy::types`, not deeply nested paths. Re-exports are in `lib.rs`.

+## Key Dependencies
+
+- `goblin` - Binary format parsing (ELF, PE, Mach-O)
+- `pelite` - PE resource extraction
+- `thiserror` - Error type definitions
+- `insta` - Snapshot testing (dev)
+- `criterion` - Benchmarking (dev)
+
 ## Adding Features

 **New semantic tag**: Add variant to `Tag` enum in `types.rs`, implement pattern in `classification/semantic.rs`
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,53 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Output formatters: JSON (JSONL), table (TTY-friendly), and YARA rule templates
- `generated_at` timestamp support in output metadata for deterministic outputs
- Ranking system for prioritizing extracted strings by relevance
- Symbol demangling support for Rust mangled names
- File path classification for POSIX, Windows, and registry paths
- Semantic classification for URLs, domains, and IP addresses (IPv4/IPv6)
- String deduplication with full occurrence metadata preservation
- `CanonicalString` type for deduplicated strings with occurrence tracking
- UTF-16 string extraction with confidence scoring
- Noise filtering framework with entropy, linguistic, and repetition filters
- Mach-O load command extraction with section weight normalization
- Comprehensive PE support: section classification, import/export parsing, resource extraction
- ELF symbol extraction with type support and visibility filtering
- `#[non_exhaustive]` and builder pattern for `FoundString` public API
- Contributing guidelines document

### Changed
- Repository renamed from StringyMcStringFace to Stringy
- Improved YARA formatter code quality and test coverage
- Clarified ASCII rule for Unicode handling in documentation
Comment on lines +27 to +30

⚠️ Potential issue | 🟡 Minor

Document the extract_utf16le_strings to extract_utf16_strings API rename.

The UTF-16 extraction function was renamed in this PR. This is a breaking change that should be explicitly documented under "Changed" for users upgrading.

Proposed addition
 ### Changed
 - Repository renamed from StringyMcStringFace to Stringy
 - Improved YARA formatter code quality and test coverage
 - Clarified ASCII rule for Unicode handling in documentation
+- Renamed `extract_utf16le_strings` to `extract_utf16_strings` for consistency
🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 27 - 30, Update the "Changed" section in
CHANGELOG.md to explicitly document the breaking API rename: note that
extract_utf16le_strings was renamed to extract_utf16_strings (include both names
and state it's a breaking change for callers), add a short upgrade note
instructing users to replace calls to extract_utf16le_strings with
extract_utf16_strings, and ensure this entry is listed alongside the other
"Changed" items so consumers see the migration step when upgrading.
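
Beyond the changelog entry, the break could be softened for one release by keeping the old name as a deprecated alias. The sketch below only illustrates that pattern: the `(data, min_len)` signature and the toy extraction body are assumptions made for the example, not Stringy's actual API or implementation.

```rust
/// Hypothetical stand-in for the renamed function. The real signature in
/// Stringy may differ; this toy version collects runs of printable ASCII
/// characters encoded as UTF-16LE code units.
pub fn extract_utf16_strings(data: &[u8], min_len: usize) -> Vec<String> {
    let mut found = Vec::new();
    let mut run = String::new();
    for pair in data.chunks_exact(2) {
        let unit = u16::from_le_bytes([pair[0], pair[1]]);
        match char::from_u32(u32::from(unit)) {
            Some(c) if c.is_ascii_graphic() || c == ' ' => run.push(c),
            // Any other code unit (NUL, control, surrogate, non-ASCII) ends the run.
            _ => flush(&mut run, min_len, &mut found),
        }
    }
    flush(&mut run, min_len, &mut found);
    found
}

/// Moves `run` into `out` if it is long enough; otherwise discards it.
fn flush(run: &mut String, min_len: usize, out: &mut Vec<String>) {
    if !run.is_empty() && run.len() >= min_len {
        out.push(std::mem::take(run));
    } else {
        run.clear();
    }
}

/// Old name kept as a thin alias so existing callers get a compiler
/// warning that points at the new name instead of a hard error.
#[deprecated(note = "renamed to `extract_utf16_strings`")]
pub fn extract_utf16le_strings(data: &[u8], min_len: usize) -> Vec<String> {
    extract_utf16_strings(data, min_len)
}
```

Callers of the old name keep compiling but see a deprecation warning naming the replacement, and the alias can be dropped in a later release. Whether that is worth it for a pre-1.0 crate is a judgment call; the changelog entry is needed either way.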


### Fixed
- Rustdoc warning for IPv6 address example in documentation

### Dependencies
- Updated criterion to 0.8.1
- Updated actions/checkout to v6
- Updated actions/download-artifact to v7
- Updated actions/attest-build-provenance to v3
- Updated actions/upload-artifact to v5
- Updated github/codeql-action to v4
- Updated EmbarkStudios/cargo-deny-action to v2

## [0.1.0] - TBD

Initial release with core functionality:

### Added
- ELF, PE, and Mach-O binary format detection and parsing
- ASCII and UTF-8 string extraction from binary sections
- Section-aware extraction with weight-based prioritization
- Basic semantic tagging infrastructure
- Command-line interface (in development)
90 changes: 90 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,90 @@
# Contributing to Stringy

Thanks for your interest in Stringy. This guide explains how to propose changes and what we expect for code quality.

## Quick start

1. Search existing issues and pull requests before filing a new one.
2. For bugs, open an issue with a clear reproduction and expected vs actual behavior.
3. For new features or larger changes, open an issue first to discuss scope.

## Development setup

Stringy uses Rust 2024 (MSRV 1.85+, see `rust-toolchain.toml`). We also use just for common tasks.

Recommended workflow:

- `just setup` (to install tools)
- `just build` (compiles a debug build)
- `just test` (runs tests)
- `just lint` (runs linters)

If you do not use just, the critical requirement is that:

- `cargo clippy -- -D warnings` passes
- `cargo fmt` produces no changes

## Coding standards

These rules are enforced by CI:

- No unsafe code
- Zero warnings (`clippy -D warnings`)
- ASCII only in code and documentation, unless explicitly working with Unicode handling
- Keep files under 500-600 lines; split when needed
- No blanket `#[allow]` on modules or files
- No async; this is a synchronous CLI tool

Use thiserror for structured errors and include context (offsets, section names, file paths) when relevant.

## Project-specific guidance

Module layout:

- `container/` handles format detection and section analysis
- `extraction/` handles string extraction, filtering, and deduplication
- `classification/` handles semantic tagging and ranking
- `output/` handles output formatters
- `types.rs` contains core data structures and error types

Key patterns:

- Section weights: add new section weights in `container/*.rs` using existing match patterns. Higher weight means more likely to contain useful strings.
- Semantic tags: add new Tag variants in `types.rs`, implement detection in `classification/semantic.rs`, and update any tag merging logic if needed.
- Deduplication: preserve all occurrences and merge tags across occurrences in `extraction/dedup.rs`.
- Public structs: keep public API structs non_exhaustive and provide explicit constructors.
- Imports: prefer `stringy::extraction` or `stringy::types`. Do not import locally-defined types inside `extraction/mod.rs`.

## Tests

- Add or update tests for behavior changes.
- Use insta snapshots for output verification when appropriate.
- Integration tests live in tests/ and fixtures in tests/fixtures/.
- Use insta snapshots for output verification when changing output formatters.

Run:

- `just test`

Comment on lines +58 to +68

⚠️ Potential issue | 🟡 Minor

Duplicate guidance and missing language specifier.

Lines 61 and 63 contain duplicate text about insta snapshots:

  • Line 61: "Use insta snapshots for output verification when appropriate."
  • Line 63: "Use insta snapshots for output verification when changing output formatters."

Also, line 59 has an empty code block - consider adding a language specifier or removing it if unneeded.

Proposed fix
 ## Tests

 - Add or update tests for behavior changes.
-- Use insta snapshots for output verification when appropriate.
+- Use insta snapshots for output verification, especially when changing output formatters.
 - Integration tests live in tests/ and fixtures in tests/fixtures/.
-- Use insta snapshots for output verification when changing output formatters.
-
-Run:
-
-- `just test`
+- Run tests with `just test`
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@CONTRIBUTING.md` around lines 58 - 68, In CONTRIBUTING.md under the "##
Tests" section remove the duplicate insta snapshot guidance by keeping one
concise sentence (e.g., "Use insta snapshots for output verification when
appropriate or when changing output formatters") and delete the redundant line;
also update the empty/unspecified code fence under the "Run:" block to include a
language specifier (e.g., ```bash) so the `just test` snippet is properly
highlighted and the file no longer contains an empty/unspecified code block.

## Pull requests

- Keep PRs focused and small when possible.
- Include a clear description of the problem and the solution.
- Link related issues in the PR description.
- Update documentation when behavior changes.

## Documentation

Docs live under docs/ and project planning artifacts are in project_plan/. Update them when you change user-facing behavior.

## Security

If you believe you found a security issue, please do not open a public issue. Use GitHub Security Advisories if available, or contact the maintainers privately.

## AI-assisted development

This project includes Claude Code configuration in `.claude/settings.json`. These settings enable plugins that help maintain code quality and follow project conventions. If you use Claude Code, the configuration will be applied automatically.

## Questions

If you are unsure where to start, open an issue with your question and we will point you in the right direction.