diff --git a/.claude/commands/review-architecture.md b/.claude/commands/review-architecture.md
index 3869320..a56fe52 100644
--- a/.claude/commands/review-architecture.md
+++ b/.claude/commands/review-architecture.md
@@ -1,8 +1,8 @@
-Analyze this codebase's architecture:
+# Analyze this codebase's architecture
 
 1. Evaluate the overall structure and patterns
 2. Identify potential architectural issues
 3. Suggest improvements for scalability
 4. Note areas that follow best practices
 
-Focus on maintainability and modularity.
+Focus on maintainability and modularity. Consider the simplest approach that will work for the long term.
diff --git a/.claude/commands/review-dependencies.md b/.claude/commands/review-dependencies.md
index 35b6edf..6983c10 100644
--- a/.claude/commands/review-dependencies.md
+++ b/.claude/commands/review-dependencies.md
@@ -1,8 +1,10 @@
-Analyze the project dependencies:
+# Analyze the project dependencies
 
 1. Identify outdated packages
 2. Check for security vulnerabilities
 3. Suggest alternative packages
 4. Review dependency usage patterns
+5. Make the refactoring changes
+6. Summarize what you reviewed and why
 
-Include specific upgrade recommendations.
+Always finish by running `just ci-check` and ensuring that all checks and tests remain green.
diff --git a/.claude/commands/review-performance.md b/.claude/commands/review-performance.md
index a5193ef..90ca690 100644
--- a/.claude/commands/review-performance.md
+++ b/.claude/commands/review-performance.md
@@ -1,8 +1,10 @@
-Review the codebase for performance:
+# Review the codebase for performance
 
 1. Identify performance bottlenecks
 2. Check resource utilization
 3. Review algorithmic efficiency
 4. Assess caching strategies
+5. Make the refactoring changes
+6. Summarize what you reviewed and why
 
-Include specific optimization recommendations.
+Always finish by running `just ci-check` and ensuring that all checks and tests remain green.
diff --git a/.claude/commands/review-simplicity.md b/.claude/commands/review-simplicity.md
index ef31b83..242f1d7 100644
--- a/.claude/commands/review-simplicity.md
+++ b/.claude/commands/review-simplicity.md
@@ -1,36 +1,36 @@
-CODE SIMPLIFICATION REVIEW
+# Code Simplification Review
 
 Start by examining the uncommitted changes (or the changes in the current branch compared with main branch if there are no uncommitted changes) in the current codebase.
 
-ANALYSIS STEPS:
+## ANALYSIS STEPS
 
 1. Identify what files have been modified or added
 2. Review the actual code changes
 3. Apply simplification principles below
 4. Refactor directly, then show what you changed
 
-SIMPLIFICATION PRINCIPLES:
+## SIMPLIFICATION PRINCIPLES
 
-Complexity Reduction:
+### Complexity Reduction
 
 - Remove abstraction layers that don't provide clear value
 - Replace complex patterns with straightforward implementations
 - Use language idioms over custom abstractions
 - If a simple function/lambda works, use it—don't create classes
 
-Test Proportionality:
+### Test Proportionality
 
 - Keep only tests for critical functionality and real edge cases
 - Delete tests for trivial operations, framework behavior, or hypothetical scenarios
 - For small projects: aim for \<10 meaningful tests per feature
 - Test code should be shorter than implementation
 
-Idiomatic Code:
+### Idiomatic Code
 
 - Use conventional patterns for the language
 - Prioritize readability and maintainability
 - Apply the principle of least surprise
 
-Ask yourself: "What's the simplest version that actually works reliably?"
+## Ask yourself: "What's the simplest version that actually works reliably?"
 
 Make the refactoring changes, then summarize what you simplified and why.
 
 Always finish by running `just ci-check` and ensuring that all checks and tests remain green.
diff --git a/.claude/commands/review-tests.md b/.claude/commands/review-tests.md
index e69de29..b8b862f 100644
--- a/.claude/commands/review-tests.md
+++ b/.claude/commands/review-tests.md
@@ -0,0 +1,9 @@
+# Review the test coverage
+
+1. Identify untested components
+2. Suggest additional test cases
+3. Review test quality
+4. Recommend testing strategies
+5. Make the refactoring changes
+6. Summarize what you reviewed and why
+7. Always finish by running `just ci-check` and ensuring that all checks and tests remain green.
diff --git a/.coderabbitai.yaml b/.coderabbitai.yaml
index c0c8dbb..6b37930 100644
--- a/.coderabbitai.yaml
+++ b/.coderabbitai.yaml
@@ -399,3 +399,36 @@ code_generation:
       "instructions": "Generate React/TypeScript unit tests using Jest and React Testing Library. Test component rendering, user interactions, state management, and API integration. Focus on accessibility and responsive design testing.",
     },
   ]
+issue_enrichment:
+  auto_enrich:
+    enabled: true
+
+  planning:
+    enabled: true
+    auto_planning:
+      enabled: false
+      labels:
+        - "enhancement"
+        - "bug"
+
+  labeling:
+    auto_apply_labels: true
+    labeling_instructions:
+      - label: "bug"
+        instructions: "Apply when the issue reports something that isn't working correctly. Look for error messages, unexpected behavior, crashes, or regressions in existing functionality."
+      - label: "enhancement"
+        instructions: "Apply when the issue requests new features or improvements. This includes new CLI options, new LLM providers, new output formats, performance improvements, or usability enhancements."
+      - label: "documentation"
+        instructions: "Apply when the issue is about missing, incorrect, or unclear documentation. This includes README updates, API documentation, examples, or inline code comments."
+      - label: "good first issue"
+        instructions: "Apply when the issue is well-scoped, has clear requirements, and doesn't require deep knowledge of the codebase. Good for newcomers to contribute."
+      - label: "help wanted"
+        instructions: "Apply when the issue needs community input, additional expertise, or the maintainers explicitly request assistance."
+      - label: "question"
+        instructions: "Apply when the issue is asking for clarification, guidance, or discussion rather than reporting a bug or requesting a feature."
+      - label: "duplicate"
+        instructions: "Apply when this issue duplicates an existing open or recently closed issue. Reference the original issue."
+      - label: "invalid"
+        instructions: "Apply when the issue doesn't provide enough information, is not related to this project, or cannot be reproduced."
+      - label: "wontfix"
+        instructions: "Apply when the issue describes behavior that is working as intended, is out of scope for the project, or conflicts with project goals."
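For context on the files above: Claude Code exposes each Markdown file under `.claude/commands/` as a custom slash command named after the file, so these edits change the prompts behind commands invoked roughly like this (invocation names inferred from the filenames; a sketch, not part of the patch):

```text
/review-architecture   # runs .claude/commands/review-architecture.md
/review-simplicity     # refactors, then must finish with `just ci-check`
```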
diff --git a/.github/workflows/audit.yml b/.github/workflows/audit.yml
index cd02da9..52de326 100644
--- a/.github/workflows/audit.yml
+++ b/.github/workflows/audit.yml
@@ -22,6 +22,6 @@ jobs:
       contents: read
       issues: write
     steps:
-      - uses: actions/checkout@v5
+      - uses: actions/checkout@v6
       - uses: actions-rust-lang/audit@v1
         name: Audit Rust Dependencies
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index d41cdc5..f0c20d2 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -28,7 +28,7 @@ jobs:
       rust: ${{ steps.filter.outputs.rust }}
       docs: ${{ steps.filter.outputs.docs }}
     steps:
-      - uses: actions/checkout@v5
+      - uses: actions/checkout@v6
       - uses: dorny/paths-filter@v3
         id: filter
         with:
@@ -51,49 +51,34 @@
   # Code quality checks - always run
   quality:
     runs-on: ubuntu-latest
+    needs: changes
+    if: needs.changes.outputs.rust == 'true'
     steps:
-      - uses: actions/checkout@v5
-      - uses: dtolnay/rust-toolchain@1.90
+      - uses: actions/checkout@v6
+      - uses: dtolnay/rust-toolchain@1.91.0
         with:
-          components: rustfmt, clippy
-
-      - name: Install just
-        uses: extractions/setup-just@v3
-
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
+          components: llvm-tools, cargo, rustfmt, clippy
+      - uses: jdx/mise-action@v3
         with:
-          repo-token: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Cache Rust dependencies
-        uses: Swatinem/rust-cache@v2
-
-      - name: Rustfmt Check
-        uses: actions-rust-lang/rustfmt@v1
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
-      - name: Run clippy (all features)
-        run: cargo clippy --all-targets --all-features -- -D warnings
+      - name: Check formatting
+        run: just lint-rust
 
   test:
     runs-on: ubuntu-latest
-    needs: [changes, quality]
+    needs: changes
     if: needs.changes.outputs.rust == 'true'
     steps:
-      - uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          components: rustfmt, clippy
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
-      - name: Install cargo-nextest
-        uses: taiki-e/install-action@v2
-        with:
-          tool: cargo-nextest
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
-        with:
-          repo-token: ${{ secrets.GITHUB_TOKEN }}
       - name: Run tests (all features)
        run: cargo nextest run --all-features
@@ -110,8 +95,9 @@
           # Primary Support - Linux
           # - os: ubuntu-24.04
           #   platform: "Linux"
-          - os: ubuntu-24.04-arm
-            platform: "Linux"
+          # Disabled: ARM runners are flaky on GitHub Actions
+          # - os: ubuntu-24.04-arm
+          #   platform: "Linux"
           # Primary Support - macOS (using available runners)
           - os: macos-15
             platform: "macOS"
@@ -134,20 +120,12 @@
     needs: [changes, quality]
     if: needs.changes.outputs.rust == 'true'
     steps:
-      - uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
-
-      - name: Install cargo-nextest
-        uses: taiki-e/install-action@v2
-        with:
-          tool: cargo-nextest
-
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       # Run tests and build the release binary
       - run: cargo nextest run --all-features
@@ -159,22 +137,12 @@
     needs: [changes, test, test-cross-platform, quality]
     if: needs.changes.outputs.rust == 'true'
     steps:
-      - uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
-        with:
-          components: llvm-tools
-
-      - name: Install cargo-llvm-cov
-        uses: taiki-e/install-action@v2
-        with:
-          tool: cargo-llvm-cov
-
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Generate coverage
         run: cargo llvm-cov --all-features --no-report
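The `needs: changes` / `if: needs.changes.outputs.rust == 'true'` gating above relies on the `changes` job publishing outputs from `dorny/paths-filter@v3`. The filter definitions themselves are unchanged context not shown in this diff, so the globs below are illustrative assumptions; only the overall shape of the pattern is taken from the workflow:

```yaml
# Sketch of the change-detection pattern used by ci.yml (filter globs assumed).
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      rust: ${{ steps.filter.outputs.rust }}
      docs: ${{ steps.filter.outputs.docs }}
    steps:
      - uses: actions/checkout@v6
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            rust:
              - '**/*.rs'
              - '**/Cargo.toml'
              - 'Cargo.lock'
            docs:
              - 'docs/**'

  quality:
    runs-on: ubuntu-latest
    needs: changes
    # Skips the whole job on docs-only changes instead of burning runner time.
    if: needs.changes.outputs.rust == 'true'
    steps: []  # elided in this sketch
```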
diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
index 5693fb1..b03e07f 100644
--- a/.github/workflows/codeql.yml
+++ b/.github/workflows/codeql.yml
@@ -19,10 +19,12 @@ jobs:
   analyze:
     name: CodeQL Analyze
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
+        with:
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       - uses: github/codeql-action/init@v3
         with:
diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml
index 39f76d4..84d613c 100644
--- a/.github/workflows/copilot-setup-steps.yml
+++ b/.github/workflows/copilot-setup-steps.yml
@@ -28,37 +28,12 @@ jobs:
     # You can define any steps you want, and they will run before the agent starts.
     # If you do not check out your code, Copilot will do this for you.
     steps:
-      - name: Checkout code
-        uses: actions/checkout@v5
-
-      - uses: dtolnay/rust-toolchain@1.90
-
-      - name: Install just task runner
-        uses: taiki-e/install-action@v2
-        with:
-          tool: just
-
-      - name: Set up Python for pre-commit
-        uses: actions/setup-python@v6
-        with:
-          python-version: "3.13"
-
-      - name: Install cargo tools
-        uses: taiki-e/install-action@v2
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          tool: cargo-nextest,cargo-llvm-cov,cargo-audit,cargo-deny,cargo-dist,mdbook
-
-      - name: Install mdbook plugins
-        uses: taiki-e/install-action@v2
-        with:
-          tool: mdbook-admonish,mdbook-mermaid,mdbook-linkcheck,mdbook-toc,mdbook-open-on-gh,mdbook-tabs,mdbook-i18n-helpers
-
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
-
-      - name: Run just install
-        run: |
-          just install-tools
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Setup summary
         run: |
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index a3a7dab..acb0fc8 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -24,18 +24,12 @@ jobs:
   build:
     runs-on: ubuntu-latest
     steps:
-      - name: Checkout
-        uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
-        with:
-          components: rustfmt, clippy
-
-      - name: Install protobuf
-        uses: arduino/setup-protoc@v3
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Setup mdBook
         uses: jontze/action-mdbook@v4
diff --git a/.github/workflows/security.yml b/.github/workflows/security.yml
index 419951b..6794838 100644
--- a/.github/workflows/security.yml
+++ b/.github/workflows/security.yml
@@ -24,14 +24,12 @@ jobs:
   audit:
     runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v5
-
-      - name: Setup Rust
-        uses: dtolnay/rust-toolchain@1.90
-
-      - uses: taiki-e/install-action@v2
+      - uses: actions/checkout@v6
+      - uses: jdx/mise-action@v3
         with:
-          tool: cargo-deny,cargo-outdated,cargo-dist
+          install: true
+          cache: true
+          github_token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Run cargo deny check
         run: cargo deny check --config deny.ci.toml
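Every workflow above now delegates tool installation to `jdx/mise-action@v3` with `install: true` and `cache: true`, which installs whatever the repository's mise configuration declares. That configuration file is not part of this diff, so the following is a hypothetical sketch; the tool names are taken from the install steps deleted above, and the `cargo:` backend syntax is mise's way of installing crates:

```toml
# Hypothetical mise.toml (not shown in this diff) that the workflows would
# rely on after switching to jdx/mise-action.
[tools]
rust = "1.91.0"                      # matches the dtolnay/rust-toolchain pin
just = "latest"
protoc = "latest"                    # replaces arduino/setup-protoc steps
"cargo:cargo-nextest" = "latest"
"cargo:cargo-llvm-cov" = "latest"
"cargo:cargo-deny" = "latest"
mdbook = "latest"
```

Centralizing versions this way means CI, the Copilot agent environment, and local developers all resolve the same toolchain from one file instead of each workflow pinning its own copies.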
diff --git a/.serena/project.yml b/.serena/project.yml
index bc24e56..9303080 100644
--- a/.serena/project.yml
+++ b/.serena/project.yml
@@ -1,15 +1,18 @@
 # list of languages for which language servers are started; choose from:
 #  al bash clojure cpp csharp csharp_omnisharp
-#  dart elixir elm erlang fortran go
-#  haskell java julia kotlin lua markdown
-#  nix perl php python python_jedi r
-#  rego ruby ruby_solargraph rust scala swift
-#  terraform typescript typescript_vts yaml zig
+#  dart elixir elm erlang fortran fsharp
+#  go groovy haskell java julia kotlin
+#  lua markdown nix pascal perl php
+#  powershell python python_jedi r rego ruby
+#  ruby_solargraph rust scala swift terraform toml
+#  typescript typescript_vts yaml zig
 # Note:
 #  - For C, use cpp
 #  - For JavaScript, use typescript
+#  - For Free Pascal / Lazarus, use pascal
 # Special requirements:
 #  - csharp: Requires the presence of a .sln file in the project folder.
+#  - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
 # When using multiple languages, the first language server that supports a given file will be used for that file.
 # The first language is the default language and the respective language server will be used as a fallback.
 # Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
@@ -80,5 +83,5 @@ excluded_tools: []
 # (contrary to the memories, which are loaded on demand).
 initial_prompt: ""
 
-project_name: "DaemonEye"
+project_name: "daemoneye"
 included_optional_tools: []
diff --git a/.vscode/settings.json b/.vscode/settings.json
index a475288..7c2b7ff 100644
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -21,5 +21,12 @@
   "git.rebaseWhenSync": true,
   "git.replaceTagsWhenPull": true,
   "githubPullRequests.codingAgent.uiIntegration": true,
-  "kiroAgent.configureMCP": "Enabled"
+  "kiroAgent.configureMCP": "Enabled",
+  "ruff.path": [
+    "${workspaceFolder}/.vscode/mise-tools/ruff"
+  ],
+  "ruff.interpreter": [
+    "${workspaceFolder}/.vscode/mise-tools/python"
+  ],
+  "python.defaultInterpreterPath": "${workspaceFolder}/.vscode/mise-tools/python"
 }
\ No newline at end of file
diff --git a/Cargo.lock b/Cargo.lock
index 121ac7b..efd697f 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -11,6 +11,15 @@ dependencies = [
  "memchr",
 ]
 
+[[package]]
+name = "alloca"
+version = "0.4.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e5a7d05ea6aea7e9e64d25b9156ba2fee3fdd659e34e41063cd2fc7cd020d7f4"
+dependencies = [
+ "cc",
+]
+
 [[package]]
 name = "android_system_properties"
 version = "0.1.5"
@@ -96,9 +105,9 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
 
 [[package]]
 name = "assert_cmd"
-version = "2.1.1"
+version = "2.1.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "bcbb6924530aa9e0432442af08bbcafdad182db80d2e560da42a6d442535bf85"
+checksum = "9c5bcfa8749ac45dd12cb11055aeeb6b27a3895560d60d71e3c23bf979e60514"
 dependencies = [
  "anstyle",
  "bstr",
@@ -109,28 +118,6 @@ dependencies = [
  "wait-timeout",
 ]
 
-[[package]]
-name = "async-stream"
-version = "0.3.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476"
-dependencies = [
- "async-stream-impl",
- "futures-core",
- "pin-project-lite",
-]
-
-[[package]]
-name = "async-stream-impl"
-version = "0.3.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
-]
-
 [[package]]
 name = "async-trait"
 version = "0.1.89"
@@ -152,36 +139,19 @@ dependencies = [
 ]
 
 [[package]]
-name = "autocfg"
-version = "1.5.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
-
-[[package]]
-name = "base64"
-version = "0.22.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
-
-[[package]]
-name = "bincode"
-version = "2.0.1"
+name = "atomic-polyfill"
+version = "1.0.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "36eaf5d7b090263e8150820482d5d93cd964a81e4019913c972f4edcc6edb740"
+checksum = "8cf2bce30dfe09ef0bfaef228b9d414faaf7e563035494d7fe092dba54b300f4"
 dependencies = [
- "bincode_derive",
- "serde",
- "unty",
+ "critical-section",
 ]
 
 [[package]]
-name = "bincode_derive"
-version = "2.0.1"
+name = "autocfg"
+version = "1.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "bf95709a440f45e986983918d0e8a1f30a9b1df04918fc828670606804ac3c09"
-dependencies = [
- "virtue",
-]
+checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
 
 [[package]]
 name = "bit-set"
@@ -209,15 +179,16 @@ dependencies = [
 
 [[package]]
 name = "blake3"
-version = "1.8.2"
+version = "1.8.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3888aaa89e4b2a40fca9848e400f6a658a5a3978de7be858e209cafa8be9a4a0"
+checksum = "2468ef7d57b3fb7e16b576e8377cdbde2320c60e1491e961d11da40fc4f02a2d"
 dependencies = [
  "arrayref",
  "arrayvec",
  "cc",
  "cfg-if",
  "constant_time_eq",
+ "cpufeatures",
 ]
 
 [[package]]
@@ -252,17 +223,23 @@ version = "1.24.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1fbdf580320f38b612e485521afda1ee26d10cc9884efaaa750d383e13e3c5f4"
 
+[[package]]
+name = "byteorder"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b"
+
 [[package]]
 name = "bytes"
-version = "1.10.1"
+version = "1.11.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a"
+checksum = "b35204fbdc0b3f4446b89fc1ac2cf84a8a68971995d0bf2e925ec7cd960f9cb3"
 
 [[package]]
 name = "camino"
-version = "1.2.1"
+version = "1.2.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "276a59bf2b2c967788139340c9f0c5b12d7fd6630315c15c217e559de85d2609"
+checksum = "e629a66d692cb9ff1a1c664e41771b3dcaf961985a9774c0eb0bd1b51cf60a48"
 
 [[package]]
 name = "cast"
@@ -294,16 +271,16 @@ checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
 
 [[package]]
 name = "chrono"
-version = "0.4.42"
+version = "0.4.43"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "145052bdd345b87320e369255277e3fb5152762ad123a901ef5c262dd38fe8d2"
+checksum = "fac4744fb15ae8337dc853fee7fb3f4e48c0fbaa23d0afe49c447b4fab126118"
 dependencies = [
  "iana-time-zone",
  "js-sys",
  "num-traits",
  "serde",
  "wasm-bindgen",
- "windows-link 0.2.1",
+ "windows-link",
 ]
 
 [[package]]
@@ -335,9 +312,9 @@ dependencies = [
 
 [[package]]
 name = "clap"
-version = "4.5.51"
+version = "4.5.55"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "4c26d721170e0295f191a69bd9a1f93efcdb0aff38684b61ab5750468972e5f5"
+checksum = "3e34525d5bbbd55da2bb745d34b36121baac88d07619a9a09cfcf4a6c0832785"
 dependencies = [
  "clap_builder",
  "clap_derive",
@@ -345,9 +322,9 @@ dependencies = [
 
 [[package]]
 name = "clap_builder"
-version = "4.5.51"
+version = "4.5.55"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "75835f0c7bf681bfd05abe44e965760fea999a5286c6eb2d59883634fd02011a"
+checksum = "59a20016a20a3da95bef50ec7238dbd09baeef4311dcdd38ec15aba69812fb61"
 dependencies = [
  "anstream",
  "anstyle",
@@ -357,9 +334,9 @@ dependencies = [
 
 [[package]]
 name = "clap_derive"
-version = "4.5.49"
+version = "4.5.55"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2a0b5487afeab2deb2ff4e03a807ad1a03ac532ff5a2cee5d86884440c7f7671"
+checksum = "a92793da1a46a5f2a02a6f4c46c6496b28c43638adea8306fcb0caa1634f24e5"
 dependencies = [
  "heck",
  "proc-macro2",
@@ -373,37 +350,40 @@ version = "0.7.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
 
+[[package]]
+name = "cobs"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0fa961b519f0b462e3a3b4a34b64d119eeaca1d59af726fe450bbba07a9fc0a1"
+dependencies = [
+ "thiserror",
+]
+
 [[package]]
 name = "collector-core"
 version = "0.1.0"
 dependencies = [
  "anyhow",
  "async-trait",
- "base64",
- "bincode",
  "bitflags",
- "chrono",
  "criterion",
  "crossbeam",
  "daemoneye-eventbus",
  "daemoneye-lib",
  "futures",
  "hostname-validator",
- "humantime-serde",
- "md5",
  "parking_lot",
+ "postcard",
  "proptest",
  "prost",
  "rand",
- "regex",
  "serde",
  "serde_json",
  "sqlparser",
  "tempfile",
- "thiserror 2.0.17",
+ "thiserror",
  "tokio",
  "tokio-test",
- "toml 0.9.8",
  "tracing",
  "tracing-subscriber",
  "uuid",
@@ -429,9 +409,9 @@ dependencies = [
 
 [[package]]
 name = "constant_time_eq"
-version = "0.3.1"
+version = "0.4.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7c74b8349d32d297c9134b8c88677813a227df8f779daa29bfc29c183fe3dca6"
+checksum = "3d52eff69cd5e647efe296129160853a42795992097e8af39800e1060caeea9b"
 
 [[package]]
 name = "core-foundation"
@@ -469,10 +449,11 @@ dependencies = [
 
 [[package]]
 name = "criterion"
-version = "0.7.0"
+version = "0.8.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "e1c047a62b0cc3e145fa84415a3191f628e980b194c2755aa12300a4e6cbd928"
+checksum = "4d883447757bb0ee46f233e9dc22eb84d93a9508c9b868687b274fc431d886bf"
 dependencies = [
+ "alloca",
  "anes",
  "cast",
  "ciborium",
@@ -481,6 +462,7 @@ dependencies = [
  "itertools 0.13.0",
  "num-traits",
  "oorandom",
+ "page_size",
  "plotters",
  "rayon",
  "regex",
@@ -492,14 +474,20 @@ dependencies = [
 
 [[package]]
 name = "criterion-plot"
-version = "0.6.0"
+version = "0.8.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9b1bcc0dc7dfae599d84ad0b1a55f80cde8af3725da8313b528da95ef783e338"
+checksum = "ed943f81ea2faa8dcecbbfa50164acf95d555afec96a27871663b300e387b2e4"
 dependencies = [
  "cast",
  "itertools 0.13.0",
 ]
 
+[[package]]
+name = "critical-section"
+version = "1.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "790eea4361631c5e7d22598ecd5723ff611904e3344ce8720784c93e3d83d40b"
+
 [[package]]
 name = "crossbeam"
 version = "0.8.4"
@@ -579,25 +567,17 @@ dependencies = [
  "anyhow",
  "assert_cmd",
  "async-trait",
- "bytes",
- "chrono",
  "clap",
  "collector-core",
  "daemoneye-eventbus",
  "daemoneye-lib",
- "futures",
  "insta",
- "interprocess",
  "predicates",
- "prost",
  "prost-build",
- "prost-types",
- "redb",
  "serde",
  "serde_json",
- "sqlparser",
  "tempfile",
- "thiserror 2.0.17",
+ "thiserror",
  "tokio",
  "tracing",
  "tracing-subscriber",
@@ -608,20 +588,13 @@ dependencies = [
 name = "daemoneye-cli"
 version = "0.1.0"
 dependencies = [
- "anyhow",
  "assert_cmd",
- "chrono",
  "clap",
  "daemoneye-lib",
  "insta",
  "predicates",
- "redb",
- "serde",
  "serde_json",
  "tempfile",
- "thiserror 2.0.17",
- "tokio",
- "tracing",
  "tracing-subscriber",
 ]
 
@@ -631,19 +604,18 @@ version = "0.1.0"
 dependencies = [
  "anyhow",
  "async-trait",
- "bincode",
  "blake3",
- "chrono",
  "criterion",
  "insta",
  "interprocess",
  "nix",
+ "postcard",
  "rand",
  "regex",
  "serde",
  "serde_json",
  "tempfile",
- "thiserror 2.0.17",
+ "thiserror",
  "tokio",
  "tokio-test",
  "toml 0.9.8",
@@ -683,7 +655,7 @@ dependencies = [
  "sqlparser",
  "sysinfo",
  "tempfile",
- "thiserror 2.0.17",
+ "thiserror",
  "tokio",
  "toml 0.9.8",
  "tracing",
@@ -709,11 +681,11 @@ dependencies = [
 
 [[package]]
 name = "directories"
-version = "5.0.1"
+version = "6.0.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9a49173b84e034382284f27f1af4dcbbd231ffa358c0fe316541a7337f376a35"
+checksum = "16f5094c54661b38d03bd7e50df373292118db60b585c08a411c6d840017fe7d"
 dependencies = [
- "dirs-sys 0.4.1",
+ "dirs-sys",
 ]
 
 [[package]]
@@ -722,19 +694,7 @@ version = "6.0.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c3e8aa94d75141228480295a7d0e7feb620b1a5ad9f12bc40be62411e38cce4e"
 dependencies = [
- "dirs-sys 0.5.0",
-]
-
-[[package]]
-name = "dirs-sys"
-version = "0.4.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "520f05a5cbd335fae5a99ff7a6ab8627577660ee5cfd6a94a6a929b52ff0321c"
-dependencies = [
- "libc",
- "option-ext",
- "redox_users 0.4.6",
- "windows-sys 0.48.0",
+ "dirs-sys",
 ]
 
 [[package]]
@@ -745,7 +705,7 @@ checksum = "e01a3366d27ee9890022452ee61b2b63a67e6f13f58900b651ff5665f0bb1fab"
 dependencies = [
  "libc",
  "option-ext",
- "redox_users 0.5.2",
+ "redox_users",
  "windows-sys 0.61.2",
 ]
 
@@ -761,6 +721,18 @@ version = "1.15.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
 
+[[package]]
+name = "embedded-io"
+version = "0.4.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ef1a6892d9eef45c8fa6b9e0086428a2cca8491aca8f787c534a3d6d0bcb3ced"
+
+[[package]]
+name = "embedded-io"
+version = "0.6.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "edd0f118536f44f5ccd48bcb8b111bdc3de888b58c74639dfb034a357d0f206d"
+
 [[package]]
 name = "encode_unicode"
 version = "1.0.0"
@@ -831,6 +803,12 @@ version = "1.0.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
 
+[[package]]
+name = "foldhash"
+version = "0.1.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"
+
 [[package]]
 name = "futures"
 version = "0.3.31"
@@ -938,7 +916,7 @@ checksum = "335ff9f135e4384c8150d6f27c6daed433577f86b4750418338c01a1a2528592"
 dependencies = [
  "cfg-if",
  "libc",
- "wasi",
+ "wasi 0.11.1+wasi-snapshot-preview1",
 ]
 
 [[package]]
@@ -964,12 +942,44 @@ dependencies = [
  "zerocopy",
 ]
 
+[[package]]
+name = "hash32"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b0c35f58762feb77d74ebe43bdbc3210f09be9fe6742234d573bacc26ed92b67"
+dependencies = [
+ "byteorder",
+]
+
+[[package]]
+name = "hashbrown"
+version = "0.15.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1"
+dependencies = [
+ "foldhash",
+]
+
 [[package]]
 name = "hashbrown"
 version = "0.16.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "5419bdc4f6a9207fbeba6d11b604d481addf78ecd10c11ad51e76c2f6482748d"
 
+[[package]]
+name = "heapless"
+version = "0.7.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cdc6457c0eb62c71aac4bc17216026d8410337c4126773b9c5daba343f17964f"
+dependencies = [
+ "atomic-polyfill",
+ "hash32",
+ "rustc_version",
+ "serde",
+ "spin",
+ "stable_deref_trait",
+]
+
 [[package]]
 name = "heck"
 version = "0.5.0"
@@ -982,22 +992,6 @@ version = "1.1.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "f558a64ac9af88b5ba400d99b579451af0d39c6d360980045b91aac966d705e2"
 
-[[package]]
-name = "humantime"
-version = "2.3.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "135b12329e5e3ce057a9f972339ea52bc954fe1e9358ef27f95e89716fbc5424"
-
-[[package]]
-name = "humantime-serde"
-version = "1.1.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "57a3db5ea5923d99402c94e9feb261dc5ee9b4efa158b0315f788cf549cc200c"
-dependencies = [
- "humantime",
- "serde",
-]
-
 [[package]]
 name = "iana-time-zone"
 version = "0.1.64"
@@ -1010,7 +1004,7 @@ dependencies = [
  "js-sys",
  "log",
  "wasm-bindgen",
- "windows-core 0.62.2",
+ "windows-core",
 ]
 
 [[package]]
@@ -1029,7 +1023,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "4b0f83760fb341a774ed326568e19f5a863af4a952def8c39f9ab92fd95b88e5"
 dependencies = [
  "equivalent",
- "hashbrown",
+ "hashbrown 0.16.0",
 ]
 
 [[package]]
@@ -1040,15 +1034,16 @@ checksum = "c8fae54786f62fb2918dcfae3d568594e50eb9b5c25bf04371af6fe7516452fb"
 
 [[package]]
 name = "insta"
-version = "1.43.2"
+version = "1.46.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "46fdb647ebde000f43b5b53f773c30cf9b0cb4300453208713fa38b2c70935a0"
+checksum = "248b42847813a1550dafd15296fd9748c651d0c32194559dbc05d804d54b21e8"
 dependencies = [
  "console",
  "once_cell",
  "regex",
  "serde",
  "similar",
+ "tempfile",
 ]
 
 [[package]]
@@ -1120,13 +1115,13 @@ checksum = "2874a2af47a2325c2001a6e6fad9b16a53b802102b528163885171cf92b15976"
 
 [[package]]
 name = "libredox"
-version = "0.1.10"
+version = "0.1.12"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "416f7e718bdb06000964960ffa43b4335ad4012ae8b99060261aa4a8088d5ccb"
+checksum = "3d0b95e02c851351f877147b7deea7b1afb1df71b63aa5f8270716e0c5720616"
 dependencies = [
  "bitflags",
  "libc",
- "redox_syscall",
+ "redox_syscall 0.7.0",
 ]
 
 [[package]]
@@ -1159,12 +1154,6 @@ dependencies = [
  "regex-automata",
 ]
 
-[[package]]
-name = "md5"
-version = "0.8.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ae960838283323069879657ca3de837e9f7bbb4c7bf6ea7f1b290d5e9476d2e0"
-
 [[package]]
 name = "memchr"
 version = "2.7.6"
@@ -1187,7 +1176,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "78bed444cc8a2160f01cbcf811ef18cac863ad68ae8ca62092e8db51d51c761c"
 dependencies = [
  "libc",
- "wasi",
+ "wasi 0.11.1+wasi-snapshot-preview1",
  "windows-sys 0.59.0",
 ]
 
@@ -1286,6 +1275,16 @@ version = "0.2.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "04744f49eae99ab78e0d5c0b603ab218f515ea8cfe5a456d7629ad883a3b6e7d"
 
+[[package]]
+name = "page_size"
+version = "0.6.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "30d5b2194ed13191c1999ae0704b7839fb18384fa22e49b57eeaa97d79ce40da"
+dependencies = [
+ "libc",
+ "winapi",
+]
+
 [[package]]
 name = "parking_lot"
 version = "0.12.5"
@@ -1304,9 +1303,9 @@ checksum = "2621685985a2ebf1c516881c026032ac7deafcda1a2c9b7850dc81e3dfcb64c1"
 dependencies = [
  "cfg-if",
  "libc",
- "redox_syscall",
+ "redox_syscall 0.5.18",
  "smallvec",
- "windows-link 0.2.1",
+ "windows-link",
 ]
 
 [[package]]
@@ -1334,11 +1333,12 @@ dependencies = [
 
 [[package]]
 name = "petgraph"
-version = "0.7.1"
+version = "0.8.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3672b37090dbd86368a4145bc067582552b29c27377cad4e0a306c97f9bd7772"
+checksum = "8701b58ea97060d5e5b155d383a69952a60943f0e6dfe30b04c287beb0b27455"
 dependencies = [
  "fixedbitset",
+ "hashbrown 0.15.5",
  "indexmap",
 ]
 
@@ -1382,6 +1382,19 @@ dependencies = [
  "plotters-backend",
 ]
 
+[[package]]
+name = "postcard"
+version = "1.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6764c3b5dd454e283a30e6dfe78e9b31096d9e32036b5d1eaac7a6119ccb9a24"
+dependencies = [
+ "cobs",
+ "embedded-io 0.4.0",
+ "embedded-io 0.6.1",
+ "heapless",
+ "serde",
+]
+
 [[package]]
 name = "ppv-lite86"
 version = "0.2.21"
@@ -1460,36 +1473,31 @@ dependencies = [
  "anyhow",
  "assert_cmd",
  "async-trait",
- "bincode",
- "bytes",
  "chrono",
  "clap",
  "collector-core",
  "crc32c",
  "criterion",
+ "daemoneye-eventbus",
  "daemoneye-lib",
  "insta",
- "interprocess",
+ "postcard",
  "predicates",
  "proptest",
- "prost",
  "prost-build",
- "prost-types",
- "redb",
  "security-framework",
  "serde",
  "serde_json",
- "sha2",
  "sysinfo",
  "tempfile",
- "thiserror 2.0.17",
+ "thiserror",
  "tokio",
  "tracing",
  "tracing-subscriber",
  "tracing-test",
  "uuid",
  "uzers",
- "whoami",
+ "whoami 2.1.0",
  "windows",
  "windows-service",
  "winreg",
@@ -1516,9 +1524,9 @@ dependencies = [
 
 [[package]]
 name = "prost"
-version = "0.14.1"
+version = "0.14.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7231bd9b3d3d33c86b58adbac74b5ec0ad9f496b19d22801d773636feaa95f3d"
+checksum = "d2ea70524a2f82d518bce41317d0fae74151505651af45faf1ffbd6fd33f0568"
 dependencies = [
  "bytes",
  "prost-derive",
@@ -1526,15 +1534,14 @@ dependencies = [
 
 [[package]]
 name = "prost-build"
-version = "0.14.1"
+version = "0.14.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ac6c3320f9abac597dcbc668774ef006702672474aad53c6d596b62e487b40b1"
+checksum = "343d3bd7056eda839b03204e68deff7d1b13aba7af2b2fd16890697274262ee7"
 dependencies = [
  "heck",
  "itertools 0.14.0",
  "log",
  "multimap",
- "once_cell",
  "petgraph",
  "prettyplease",
  "prost",
@@ -1546,9 +1553,9 @@ dependencies = [
 
 [[package]]
 name = "prost-derive"
-version = "0.14.1"
+version = "0.14.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9120690fafc389a67ba3803df527d0ec9cbbc9cc45e4cc20b332996dfb672425"
+checksum = "27c6023962132f4b30eb4c172c91ce92d933da334c59c23cddee82358ddafb0b"
 dependencies = [
  "anyhow",
  "itertools 0.14.0",
@@ -1559,9 +1566,9 @@ dependencies = [
 
 [[package]]
 name = "prost-types"
-version = "0.14.1"
+version = "0.14.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b9b4db3d6da204ed77bb26ba83b6122a73aeb2e87e25fbf7ad2e84c4ccbf8f72"
+checksum = "8991c4cbdb8bc5b11f0b074ffe286c30e523de90fee5ba8132f1399f23cb3dd7"
 dependencies = [
  "prost",
 ]
@@ -1699,14 +1706,12 @@ dependencies = [
 ]
 
 [[package]]
-name = "redox_users"
-version = "0.4.6"
+name = "redox_syscall"
+version = "0.7.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ba009ff324d1fc1b900bd1fdb31564febe58a8ccc8a6fdbb93b543d33b13ca43"
+checksum = "49f3fe0889e69e2ae9e41f4d6c4c0181701d00e4697b356fb1f74173a5e0ee27"
 dependencies = [
- "getrandom 0.2.16",
- "libredox",
- "thiserror 1.0.69",
+ "bitflags",
 ]
 
 [[package]]
@@ -1717,7 +1722,7 @@ checksum = "a4e608c6638b9c18977b00b475ac1f28d14e84b27d8d42f70e0bf1e3dec127ac"
 dependencies = [
  "getrandom 0.2.16",
  "libredox",
- "thiserror 2.0.17",
+ "thiserror",
 ]
 
 [[package]]
@@ -1769,9 +1774,9 @@ dependencies = [
 
 [[package]]
 name = "rustix"
-version = "1.1.2"
+version = "1.1.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "cd15f8a2c5551a84d56efdc1cd049089e409ac19a3072d5037a17fd70719ff3e"
+checksum = "146c9e247ccc180c1f61615433868c99f3de3ae256a30a43b49f67c2d9171f34"
 dependencies = [
  "bitflags",
  "errno",
@@ -1798,12 +1803,6 @@ dependencies = [
  "wait-timeout",
 ]
 
-[[package]]
-name = "ryu"
-version = "1.0.20"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "28d3b2b1366ec20994f1fd18c3c594f05c5dd4bc44d8bb0c1c632c8d6829481f"
-
 [[package]]
 name = "same-file"
 version = "1.0.6"
@@ -1880,15 +1879,15 @@ dependencies = [
 
 [[package]]
 name = "serde_json"
-version = "1.0.145"
+version = "1.0.149"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "402a6f66d8c709116cf22f558eab210f5a50187f702eb4d7e5ef38d9a7f1c79c"
+checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86"
 dependencies = [
  "itoa",
  "memchr",
- "ryu",
  "serde",
  "serde_core",
+ "zmij",
 ]
 
 [[package]]
@@ -1972,16 +1971,31 @@ dependencies = [
  "windows-sys 0.60.2",
 ]
 
+[[package]]
+name = "spin"
+version = "0.9.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
+dependencies = [
+ "lock_api",
+]
+
 [[package]]
 name = "sqlparser"
-version = "0.59.0"
+version = "0.60.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "4591acadbcf52f0af60eafbb2c003232b2b4cd8de5f0e9437cb8b1b59046cc0f"
+checksum = "505aa16b045c4c1375bf5f125cce3813d0176325bfe9ffc4a903f423de7774ff"
 dependencies = [
  "log",
  "recursive",
 ]
 
+[[package]]
+name = "stable_deref_trait"
+version = "1.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596"
+
 [[package]]
 name = "stacker"
 version = "0.1.22"
@@ -2014,9 +2028,9 @@ dependencies = [
 
 [[package]]
 name = "sysinfo"
-version = "0.37.2"
+version = "0.38.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "16607d5caffd1c07ce073528f9ed972d88db15dd44023fa57142963be3feb11f"
+checksum = "fe840c5b1afe259a5657392a4dbb74473a14c8db999c3ec2f4ae812e028a94da"
 dependencies = [
  "libc",
  "memchr",
@@ -2028,9 +2042,9 @@ dependencies = [
 
 [[package]]
 name = "tempfile"
-version = "3.23.0"
+version = "3.24.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2d31c77bdf42a745371d260a26ca7163f1e0924b64afa0b688e61b5a9fa02f16"
+checksum = "655da9c7eb6305c55742045d5a8d2037996d61d8de95806335c7c86ce0f82e9c"
 dependencies = [
  "fastrand",
  "getrandom 0.3.4",
@@ -2047,38 +2061,18 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
 
 [[package]]
 name = "thiserror"
-version = "1.0.69"
+version = "2.0.18"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52"
+checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4"
 dependencies = [
- "thiserror-impl 1.0.69",
-]
-
-[[package]]
-name = "thiserror"
-version = "2.0.17"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "f63587ca0f12b72a0600bcba1d40081f830876000bb46dd2337a3051618f4fc8"
-dependencies = [
- "thiserror-impl 2.0.17",
+ "thiserror-impl",
 ]
 
 [[package]]
 name = "thiserror-impl"
-version = "1.0.69"
+version = "2.0.18"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "4fee6c4efc90059e10f81e6d42c60a18f76588c3d74cb83a0b242a2b6c7504c1"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
-]
-
-[[package]]
-name = "thiserror-impl"
-version = "2.0.17"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3ff15c8ecd7de3849db632e14d18d2571fa09dfc5ed93479bc4485c7a517c913"
+checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5"
 dependencies = [
  "proc-macro2",
  "quote",
@@ -2106,9 +2100,9 @@ dependencies = [
 
 [[package]]
 name = "tokio"
-version = "1.48.0"
+version = "1.49.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ff360e02eab121e0bc37a2d3b4d4dc622e6eda3a8e5253d5435ecf5bd4c68408"
+checksum = "72a2903cd7736441aac9df9d7688bd0ce48edccaadf181c3b90be801e81d3d86"
 dependencies = [
  "bytes",
  "libc",
@@ -2144,12 +2138,10 @@ dependencies = [
 
 [[package]]
 name = "tokio-test"
-version = "0.4.4"
+version = "0.4.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2468baabc3311435b55dd935f702f42cd1b8abb7e754fb7dfb16bd36aa88f9f7"
+checksum = "3f6d24790a10a7af737693a3e8f1d03faef7e6ca0cc99aae5066f533766de545"
 dependencies = [
- "async-stream",
- "bytes",
  "futures-core",
  "tokio",
  "tokio-stream",
@@ -2237,9 +2229,9 @@ checksum = "df8b2b54733674ad286d16267dcfc7a71ed5c776e4ac7aa3c3e2561f7c637bf2"
 
 [[package]]
 name = "tracing"
-version = "0.1.41"
+version = "0.1.44"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "784e0ac535deb450455cbfa28a6f0df145ea1bb7ae51b821cf5e7927fdcfbdd0"
+checksum = "63e71662fa4b2a2c3a26f570f037eb95bb1f85397f3cd8076caed2f026a6d100"
 dependencies = [
  "pin-project-lite",
  "tracing-attributes",
@@ -2248,9 +2240,9 @@ dependencies = [
 
 [[package]]
 name = "tracing-attributes"
-version = "0.1.30"
+version = "0.1.31"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "81383ab64e72a7a8b8e13130c49e3dab29def6d0c7d76a03087b3cf71c5c6903"
+checksum = "7490cfa5ec963746568740651ac6781f701c9c5ea257c58e057f3ba8cf69e8da"
 dependencies = [
  "proc-macro2",
  "quote",
@@ -2259,9 +2251,9 @@ dependencies = [
 
 [[package]]
 name = "tracing-core"
-version = "0.1.34"
+version = "0.1.36"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b9d12581f227e93f094d3af2ae690a574abb8a2b9b7a96e7cfe9647b2b617678"
+checksum = "db97caf9d906fbde555dd62fa95ddba9eecfd14cb388e4f491a66d74cd5fb79a"
 dependencies = [
  "once_cell",
  "valuable",
@@ -2280,9 +2272,9 @@ dependencies = [
 
 [[package]]
 name = "tracing-subscriber"
-version = "0.3.20"
+version = "0.3.22"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2054a14f5307d601f88daf0553e1cbf472acc4f2c51afab632431cdcd72124d5"
+checksum = "2f30143827ddab0d256fd843b7a66d164e9f271cfa0dde49142c5ca0ca291f1e"
 dependencies = [
  "matchers",
  "nu-ansi-term",
@@ -2346,21 +2338,15 @@ checksum = "f63a545481291138910575129486daeaf8ac54aee4387fe7906919f7830c7d9d"
 
 [[package]]
 name = "unidirs"
-version = "0.1.1"
+version = "0.1.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "8541ae17d5f59ea0fc67df37ca8fed88b06a73ba5c84beff6660d8f5e2065b48"
+checksum = "a93c94ee9b12aeb67d6455e3c991df1da11b7037ac9814d7cb4efe671a803f0c"
 dependencies = [
  "camino",
  "directories",
- "whoami",
+ "whoami 1.6.1",
 ]
 
-[[package]]
-name = "unty"
-version = "0.0.4"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "6d49784317cd0d1ee7ec5c716dd598ec5b4483ea832a2dced265471cc0f690ae"
-
 [[package]]
 name = "utf8parse"
 version = "0.2.2"
@@ -2369,21 +2355,21 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"
 
 [[package]]
 name = "uuid"
-version = "1.18.1"
+version = "1.20.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2f87b8aa10b915a06587d0dec516c282ff295b475d94abf425d62b57710070a2"
+checksum = "ee48d38b119b0cd71fe4141b30f5ba9c7c5d9f4e7a3a8b4a674e4b6ef789976f"
 dependencies = [
  "getrandom 0.3.4",
  "js-sys",
- "serde",
+ "serde_core",
  "wasm-bindgen",
 ]
 
 [[package]]
 name = "uzers"
-version = "0.12.1"
+version = "0.12.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "4df81ff504e7d82ad53e95ed1ad5b72103c11253f39238bcc0235b90768a97dd"
+checksum = "0b8275fb1afee25b4111d2dc8b5c505dbbc4afd0b990cb96deb2d88bff8be18d"
 dependencies = [
  "libc",
  "log",
@@ -2401,12 +2387,6 @@ version = "0.9.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
 
-[[package]]
-name = "virtue"
-version = "0.0.18"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "051eb1abcf10076295e815102942cc58f9d5e3b4560e46e53c21e8ff6f3af7b1"
-
 [[package]]
 name = "wait-timeout"
 version = "0.2.1"
@@ -2432,6 +2412,15 @@ version = "0.11.1+wasi-snapshot-preview1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "ccf3ec651a847eb01de73ccad15eb7d99f80485de043efb2f370cd654f4ea44b"
 
+[[package]]
+name = "wasi"
+version = "0.14.7+wasi-0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "883478de20367e224c0090af9cf5f9fa85bed63a95c1abf3afc5c083ebc06e8c"
+dependencies = [
+ "wasip2",
+]
+
 [[package]]
 name = "wasip2"
 version = "1.0.1+wasi-0.2.4"
@@ -2447,6 +2436,15 @@ version = "0.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b8dad83b4f25e74f184f64c43b150b91efe7647395b42289f38e50566d82855b"
 
+[[package]]
+name = "wasite"
+version = "1.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "66fe902b4a6b8028a753d5424909b764ccf79b7a209eac9bf97e59cda9f71a42"
+dependencies = [
+ "wasi 0.14.7+wasi-0.2.4",
+]
+
 [[package]]
 name = "wasm-bindgen"
 version = "0.2.104"
@@ -2523,7 +2521,18 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "5d4a4db5077702ca3015d3d02d74974948aba2ad9e12ab7df718ee64ccd7e97d"
 dependencies = [
  "libredox",
- "wasite",
+ "wasite 0.1.0",
  "web-sys",
 ]
 
+[[package]]
+name = "whoami"
+version = "2.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8fae98cf96deed1b7572272dfc777713c249ae40aa1cf8862e091e8b745f5361"
+dependencies = [
+ "libredox",
+ "wasite 1.0.2",
+ "web-sys",
+]
+
 [[package]]
@@ -2566,37 +2575,23 @@ checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
 [[package]]
 name = "windows"
-version = "0.61.3"
+version = "0.62.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9babd3a767a4c1aef6900409f85f5d53ce2544ccdfaa86dad48c91782c6d6893"
+checksum = "527fadee13e0c05939a6a05d5bd6eec6cd2e3dbd648b9f8e447c6518133d8580"
 dependencies = [
  "windows-collections",
- "windows-core 0.61.2",
+ "windows-core",
  "windows-future",
- "windows-link 0.1.3",
  "windows-numerics",
 ]
 
 [[package]]
 name = "windows-collections"
-version = "0.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3beeceb5e5cfd9eb1d76b381630e82c4241ccd0d27f1a39ed41b2760b255c5e8"
-dependencies = [
- "windows-core 0.61.2",
-]
-
-[[package]]
-name = "windows-core"
-version = "0.61.2"
+version = "0.3.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c0fdd3ddb90610c7638aa2b3a3ab2904fb9e5cdbecc643ddb3647212781c4ae3"
+checksum = "23b2d95af1a8a14a3c7367e1ed4fc9c20e0a26e79551b1454d72583c97cc6610"
 dependencies = [
- "windows-implement",
- "windows-interface",
- "windows-link 0.1.3",
- "windows-result 0.3.4",
- "windows-strings 0.4.2",
+ "windows-core",
 ]
 
 [[package]]
@@ -2607,19 +2602,19 @@ checksum = "b8e83a14d34d0623b51dce9581199302a221863196a1dde71a7663a4c2be9deb"
 dependencies = [
  "windows-implement",
  "windows-interface",
- "windows-link 0.2.1",
- "windows-result 0.4.1",
- "windows-strings 0.5.1",
+ "windows-link",
+ "windows-result",
+ "windows-strings",
 ]
 
 [[package]]
 name = "windows-future"
-version = "0.2.1"
+version = "0.3.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fc6a41e98427b19fe4b73c550f060b59fa592d7d686537eebf9385621bfbad8e"
+checksum = "e1d6f90251fe18a279739e78025bd6ddc52a7e22f921070ccdc67dde84c605cb"
 dependencies = [
- "windows-core 0.61.2",
- "windows-link 0.1.3",
+ "windows-core",
+ "windows-link",
  "windows-threading",
 ]
 
@@ -2645,12 +2640,6 @@ dependencies = [
  "syn",
 ]
 
-[[package]]
-name = "windows-link"
-version = "0.1.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5e6ad25900d524eaabdbbb96d20b4311e1e7ae1699af4fb28c17ae66c80d798a"
-
 [[package]]
 name = "windows-link"
 version = "0.2.1"
@@ -2659,21 +2648,12 @@ checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
 
 [[package]]
 name = "windows-numerics"
-version = "0.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9150af68066c4c5c07ddc0ce30421554771e528bde427614c61038bc2c92c2b1"
-dependencies = [
- "windows-core 0.61.2",
- "windows-link 0.1.3",
-]
-
-[[package]]
-name = "windows-result"
-version = "0.3.4"
+version = "0.3.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "56f42bd332cc6c8eac5af113fc0c1fd6a8fd2aa08a0119358686e5160d0586c6"
+checksum = "6e2e40844ac143cdb44aead537bbf727de9b044e107a0f1220392177d15b0f26"
 dependencies = [
- "windows-link 0.1.3",
+ "windows-core",
+ "windows-link",
 ]
 
 [[package]]
@@ -2682,7 +2662,7 @@ version = "0.4.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "7781fa89eaf60850ac3d2da7af8e5242a5ea78d1a11c49bf2910bb5a73853eb5"
 dependencies = [
- "windows-link 0.2.1",
+ "windows-link",
 ]
 
 [[package]]
@@ -2696,31 +2676,13 @@ dependencies = [
  "windows-sys 0.59.0",
 ]
 
-[[package]]
-name = "windows-strings"
-version = "0.4.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "56e6c93f3a0c3b36176cb1327a4958a0353d5d166c2a35cb268ace15e91d3b57"
-dependencies = [
- "windows-link 0.1.3",
-]
-
 [[package]]
 name = "windows-strings"
 version = "0.5.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "7837d08f69c77cf6b07689544538e017c1bfcf57e34b4c0ff58e6c2cd3b37091"
 dependencies = [
- "windows-link 0.2.1",
-]
-
-[[package]]
-name = "windows-sys"
-version = "0.48.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "677d2418bec65e3338edb076e806bc1ec15693c5d0104683f2efe857f61056a9"
-dependencies = [
- "windows-targets 0.48.5",
+ "windows-link",
 ]
 
 [[package]]
@@ -2756,22 +2718,7 @@ version = "0.61.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc"
 dependencies = [
- "windows-link 0.2.1",
-]
-
-[[package]]
-name = "windows-targets"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9a2fa6e2155d7247be68c096456083145c183cbbbc2764150dda45a87197940c"
-dependencies = [
- "windows_aarch64_gnullvm 0.48.5",
- "windows_aarch64_msvc 0.48.5",
- "windows_i686_gnu 0.48.5",
- "windows_i686_msvc 0.48.5",
- "windows_x86_64_gnu 0.48.5",
- "windows_x86_64_gnullvm 0.48.5",
- "windows_x86_64_msvc 0.48.5",
+ "windows-link",
 ]
 
 [[package]]
@@ -2796,7 +2743,7 @@ version = "0.53.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "4945f9f551b88e0d65f3db0bc25c33b8acea4d9e41163edf90dcd0b19f9069f3"
 dependencies = [
- "windows-link 0.2.1",
+ "windows-link",
  "windows_aarch64_gnullvm 0.53.1",
  "windows_aarch64_msvc 0.53.1",
  "windows_i686_gnu 0.53.1",
@@ -2809,19 +2756,13 @@ dependencies = [
 
 [[package]]
 name = "windows-threading"
-version = "0.1.0"
+version = "0.2.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b66463ad2e0ea3bbf808b7f1d371311c80e115c0b71d60efc142cafbcfb057a6"
+checksum = "3949bd5b99cafdf1c7ca86b43ca564028dfe27d66958f2470940f73d86d75b37"
 dependencies = [
- "windows-link 0.1.3",
+ "windows-link",
 ]
 
-[[package]]
-name = "windows_aarch64_gnullvm"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2b38e32f0abccf9987a4e3079dfb67dcd799fb61361e53e2882c3cbaf0d905d8"
-
 [[package]]
 name = "windows_aarch64_gnullvm"
 version = "0.52.6"
@@ -2834,12 +2775,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "a9d8416fa8b42f5c947f8482c43e7d89e73a173cead56d044f6a56104a6d1b53"
 
-[[package]]
-name = "windows_aarch64_msvc"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "dc35310971f3b2dbbf3f0690a219f40e2d9afcf64f9ab7cc1be722937c26b4bc"
-
 [[package]]
 name = "windows_aarch64_msvc"
 version = "0.52.6"
@@ -2852,12 +2787,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b9d782e804c2f632e395708e99a94275910eb9100b2114651e04744e9b125006"
 
-[[package]]
-name = "windows_i686_gnu"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a75915e7def60c94dcef72200b9a8e58e5091744960da64ec734a6c6e9b3743e"
-
 [[package]]
 name = "windows_i686_gnu"
 version = "0.52.6"
@@ -2882,12 +2811,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "fa7359d10048f68ab8b09fa71c3daccfb0e9b559aed648a8f95469c27057180c"
 
-[[package]]
-name = "windows_i686_msvc"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "8f55c233f70c4b27f66c523580f78f1004e8b5a8b659e05a4eb49d4166cca406"
-
 [[package]]
 name = "windows_i686_msvc"
 version = "0.52.6"
@@ -2900,12 +2823,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1e7ac75179f18232fe9c285163565a57ef8d3c89254a30685b57d83a38d326c2"
 
-[[package]]
-name = "windows_x86_64_gnu"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "53d40abd2583d23e4718fddf1ebec84dbff8381c07cae67ff7768bbf19c6718e"
-
 [[package]]
 name = "windows_x86_64_gnu"
 version = "0.52.6"
@@ -2918,12 +2835,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "9c3842cdd74a865a8066ab39c8a7a473c0778a3f29370b5fd6b4b9aa7df4a499"
 
-[[package]]
-name = "windows_x86_64_gnullvm"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0b7b52767868a23d5bab768e390dc5f5c55825b6d30b86c844ff2dc7414044cc"
-
 [[package]]
 name = "windows_x86_64_gnullvm"
 version = "0.52.6"
@@ -2936,12 +2847,6 @@ version = "0.53.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0ffa179e2d07eee8ad8f57493436566c7cc30ac536a3379fdf008f47f6bb7ae1"
 
-[[package]]
-name = "windows_x86_64_msvc"
-version = "0.48.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ed94fce61571a4006852b7389a063ab983c02eb1bb37b47f8272ce92d06d9538"
-
 [[package]]
 name = "windows_x86_64_msvc"
 version = "0.52.6"
@@ -3004,3 +2909,9 @@ dependencies = [
  "quote",
  "syn",
 ]
+
+[[package]]
+name = "zmij"
+version = "1.0.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2fc5a66a20078bf1251bde995aa2fdcc4b800c70b5d92dd2c62abc5c60f679f8"
diff --git a/Cargo.toml b/Cargo.toml
index 19702a5..8317626 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,32 +44,30 @@ exclude = [
 anyhow = "1.0.100"
 
 # Testing and development
-assert_cmd = "2.1.1"
+assert_cmd = "2.1.2"
 
 # Async runtime and traits
 async-trait = "0.1.89"
-
-# Additional dependencies found in member crates
-base64 = "0.22.1"
-
-# Serialization and data formats
-bincode = "2.0.1"
 bitflags = { version = "2.10.0", features = ["serde"] }
 
 # Cryptographic operations
-blake3 = "1.8.2"
-bytes = "1.10.1"
+blake3 = "1.8.3"
+
+# Buffer and byte utilities
+bytes = "1.11.0"
 
 # Date/time and utilities
-chrono = { version = "0.4.42", features = ["serde"] }
+chrono = { version = "0.4.43", features = ["serde"] }
 
 # CLI and configuration
-clap = { version = "4.5.51", features = ["derive"] }
+clap = { version = "4.5.55", features = ["derive"] }
 
 # Internal libraries
 collector-core = { path = "collector-core" }
+
+# CRC checksums
 crc32c = "0.6.8"
-criterion = "0.7.0"
+criterion = "0.8.1"
 
 # High-performance concurrent data structures
 crossbeam = "0.8.4"
@@ -82,37 +80,40 @@ futures-util = "0.3.31"
 
 # System information and IPC
 hostname-validator = "1.1.1"
-humantime-serde = "1.1.1"
-insta = { version = "1.43.2", features = ["filters"] }
+insta = { version = "1.46.1", features = ["filters"] }
 interprocess = { version = "2.2.3", features = ["tokio"] }
-md5 = "0.8.0"
 parking_lot = "0.12.5"
+
+# Serialization
+postcard = { version = "1.1", features = ["alloc"] }
+
+# Testing utilities
 predicates = "3.1.3"
 proptest = "1.9.0"
 
 # Protocol Buffers
-prost = "0.14.1"
-prost-build = "0.14.1"
-prost-types = "0.14.1"
+prost = "0.14.3"
+prost-build = "0.14.3"
+prost-types = "0.14.3"
 rand = "0.9.2"
 
 # Database and storage
-redb = "3.0.2"
+redb = "3.1.0"
 regex = "1.12.2"
 rs_merkle = "1.5.0"
 
 # Platform-specific dependencies
 # macOS
-security-framework = "3.5.0"
+security-framework = "3.5.1"
 serde = { version = "1.0.228", features = ["derive"] }
-serde_json = "1.0.145"
+serde_json = "1.0.149"
 sha2 = "0.10.9"
-sqlparser = "0.59.0"
+sqlparser = "0.60.0"
 
-sysinfo = "0.37.2"
-tempfile = "3.23.0"
-thiserror = "2.0.17"
-tokio = { version = "1.47.1", features = [
+sysinfo = "0.38.0"
+tempfile = "3.24.0"
+thiserror = "2.0.18"
+tokio = { version = "1.49.0", features = [
   "rt",
   "rt-multi-thread",
   "net",
@@ -124,20 +125,20 @@ tokio = { version = "1.47.1", features = [
   "fs",
   "signal",
 ] }
-tokio-test = "0.4.4"
+tokio-test = "0.4.5"
 toml = "0.9.8"
 
 # Logging and observability
-tracing = "0.1.41"
-tracing-subscriber = { version = "0.3.20", features = ["env-filter"] }
+tracing = "0.1.44"
+tracing-subscriber = { version = "0.3.22", features = ["env-filter"] }
 tracing-test = "0.2.5"
-unidirs = "0.1.1"
-uuid = { version = "1.18.1", features = ["v4", "serde"] }
-uzers = "0.12.1"
-whoami = "1.4"
+unidirs = "0.1.2"
+uuid = { version = "1.20.0", features = ["v4", "serde"] }
+uzers = "0.12.2"
+whoami = "2.1.0"
 
 # Windows
-windows = { version = "0.61.3", features = [
+windows = { version = "0.62.2", features = [
   "Win32_Foundation",
   "Win32_System_ProcessStatus",
   "Win32_System_Threading",
diff --git a/collector-core/Cargo.toml b/collector-core/Cargo.toml
index 1036b26..7028e10 100644
--- a/collector-core/Cargo.toml
+++ b/collector-core/Cargo.toml
@@ -24,15 +24,11 @@ eventbus-integration = []
 anyhow = { workspace = true }
 async-trait = { workspace = true }
 
-# Base64 encoding for message payloads
-base64 = { workspace = true }
-
 # Binary serialization
-bincode = { workspace = true }
+postcard = { workspace = true }
 
 # Bitflags for capabilities
 bitflags = { workspace = true }
-chrono = { workspace = true, features = ["serde"] }
 
 # High-performance concurrent data structures
 crossbeam = { workspace = true }
@@ -46,15 +42,9 @@ hostname-validator = { workspace = true }
 # IPC integration
 daemoneye-lib = { workspace = true }
 futures = { workspace = true }
-humantime-serde = { workspace = true }
 
-# MD5 hashing for configuration checksums
-md5 = { workspace = true }
 parking_lot = { workspace = true }
 
-# Regular expressions for topic pattern matching
-regex = { workspace = true }
-
 # Serialization
 serde = { workspace = true }
 serde_json = { workspace = true }
@@ -76,11 +66,10 @@ rand = { workspace = true }
 
 # UUID generation
 uuid = { workspace = true }
-toml = { workspace = true }
 
 [dev-dependencies]
 anyhow = { workspace = true }
-bincode = { workspace = true }
+postcard = { workspace = true }
 criterion = { workspace = true, features = ["html_reports"] }
 daemoneye-eventbus = { workspace = true }
 futures = { workspace = true }
diff --git a/collector-core/src/rpc_services.rs b/collector-core/src/rpc_services.rs
index df30083..757c38e 100644
--- a/collector-core/src/rpc_services.rs
+++ b/collector-core/src/rpc_services.rs
@@ -260,11 +260,9 @@ impl CollectorRpcServiceManager {
         }
 
         // Deserialize RPC request
-        let request: RpcRequest = match bincode::serde::decode_from_slice::<RpcRequest, _>(
-            &message.payload,
-            bincode::config::standard(),
-        ) {
-            Ok((req, _)) => {
+        let request: RpcRequest = match postcard::from_bytes::<RpcRequest>(&message.payload)
+        {
+            Ok(req) => {
                 eprintln!(
                     "RPC_SERVICE_LOOP: Deserialized request: operation={:?}, client_id={}, request_id={}",
                     req.operation, req.client_id, req.request_id
@@ -478,10 +476,7 @@ impl CollectorRpcServiceManager {
         );
 
         // Serialize and publish response
-        let payload = match bincode::serde::encode_to_vec(
-            &response,
-            bincode::config::standard(),
-        ) {
+        let payload = match postcard::to_allocvec(&response) {
             Ok(data) => data,
             Err(serialization_error) => {
                 // Build minimal error response containing original request info and serialization error
@@ -519,10 +514,7 @@ impl CollectorRpcServiceManager {
         };
 
         // Attempt to serialize the error response
-        match bincode::serde::encode_to_vec(
-            &error_response,
-            bincode::config::standard(),
-        ) {
+        match postcard::to_allocvec(&error_response) {
             Ok(error_payload) => error_payload,
             Err(error_serialization_error) => {
                 // Even error response serialization failed - log and continue
diff --git a/collector-core/src/task_distributor.rs b/collector-core/src/task_distributor.rs
index 9d59d98..9580e45 100644
--- a/collector-core/src/task_distributor.rs
+++ b/collector-core/src/task_distributor.rs
@@ -410,8 +410,7 @@ impl TaskDistributor {
         };
 
         // Serialize event as payload
-        let payload = bincode::serde::encode_to_vec(event, bincode::config::standard())
-            .context("Failed to serialize event")?;
+        let payload = postcard::to_allocvec(event).context("Failed to serialize event")?;
 
         Ok(DistributionTask {
             task_id,
This test suite validates that the migration from crossbeam to daemoneye-eventbus diff --git a/collector-core/tests/rpc_server_integration.rs b/collector-core/tests/rpc_server_integration.rs index b9bc248..fc3abe7 100644 --- a/collector-core/tests/rpc_server_integration.rs +++ b/collector-core/tests/rpc_server_integration.rs @@ -81,11 +81,8 @@ async fn spawn_registration_handler( eprintln!("REG_HANDLER: Received message, payload size: {}", message.payload.len()); // Deserialize RPC request - let request: RpcRequest = match bincode::serde::decode_from_slice( - &message.payload, - bincode::config::standard(), - ) { - Ok((req, _)) => req, + let request: RpcRequest = match postcard::from_bytes(&message.payload) { + Ok(req) => req, Err(e) => { eprintln!("REG_HANDLER: Failed to deserialize request: {:?}", e); continue; @@ -100,7 +97,7 @@ async fn spawn_registration_handler( // Serialize and publish response to the client's response topic let response_topic = format!("control.rpc.response.{}", request.client_id); eprintln!("REG_HANDLER: Publishing response to topic: {}", response_topic); - if let Ok(payload) = bincode::serde::encode_to_vec(&response, bincode::config::standard()) { + if let Ok(payload) = postcard::to_allocvec(&response) { let result = broker_clone.publish(&response_topic, &response.request_id, payload).await; eprintln!("REG_HANDLER: Publish result: {:?}", result.is_ok()); } diff --git a/collector-core/tests/simple_daemoneye_test.rs b/collector-core/tests/simple_daemoneye_test.rs index 49042ba..c0ade98 100644 --- a/collector-core/tests/simple_daemoneye_test.rs +++ b/collector-core/tests/simple_daemoneye_test.rs @@ -1,4 +1,5 @@ #![cfg(feature = "eventbus-integration")] +#![allow(dead_code, unused_imports)] //! Simple test to verify DaemoneyeEventBus basic functionality. use collector_core::{ diff --git a/daemoneye-agent/Cargo.toml b/daemoneye-agent/Cargo.toml index eedf2f5..7282cc3 100644 --- a/daemoneye-agent/Cargo.toml +++ b/daemoneye-agent/Cargo.toml @@ -24,30 +24,16 @@ eula = false # Core dependencies anyhow = { workspace = true } async-trait = { workspace = true } -bytes = { workspace = true } -chrono = { workspace = true } clap = { workspace = true } # Internal library daemoneye-lib = { workspace = true } daemoneye-eventbus = { workspace = true } collector-core = { workspace = true } -futures = { workspace = true } -# IPC communication -interprocess = { workspace = true } - -# Protocol Buffers -prost = { workspace = true } -prost-types = { workspace = true } - -# Database -redb = { workspace = true } serde = { workspace = true } serde_json = { workspace = true } -# SQL parsing -sqlparser = { workspace = true } thiserror = { workspace = true } tokio = { workspace = true } tracing = { workspace = true } @@ -63,6 +49,5 @@ insta = { workspace = true } predicates = { workspace = true } tempfile = { workspace = true } -[lints.rust] -unsafe_code = "forbid" -warnings = "deny" +[lints] +workspace = true diff --git a/daemoneye-agent/examples/dual_protocol_demo.rs b/daemoneye-agent/examples/dual_protocol_demo.rs index d5f05c5..8e22862 100644 --- a/daemoneye-agent/examples/dual_protocol_demo.rs +++ b/daemoneye-agent/examples/dual_protocol_demo.rs @@ -6,6 +6,15 @@ //! //! Both services coexist without interference. 
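The hunks above and below swap bincode's two-step encode/decode API for postcard throughout the workspace. A minimal sketch of the shape change, with `Heartbeat` as an invented stand-in payload rather than one of the project's types:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Heartbeat {
    collector_id: String,
    sequence: u64,
}

fn main() -> Result<(), postcard::Error> {
    let event = Heartbeat {
        collector_id: "procmond".to_owned(),
        sequence: 42,
    };

    // bincode 2.x shape: encode_to_vec(&event, bincode::config::standard())
    // and decode_from_slice(...) returning (value, bytes_read).

    // postcard shape: no config value, no trailing byte count.
    let bytes = postcard::to_allocvec(&event)?;
    let decoded: Heartbeat = postcard::from_bytes(&bytes)?;
    assert_eq!(event, decoded);
    Ok(())
}
```

This is why every `Ok((req, _))` pattern in the diff collapses to `Ok(req)`: postcard has no byte count to discard.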
+#![allow( + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::expect_used, + clippy::unwrap_used, + clippy::doc_markdown, + clippy::str_to_string +)] + use daemoneye_agent::{BrokerManager, IpcServerManager, create_cli_ipc_config}; use daemoneye_lib::config::BrokerConfig; use std::time::Duration; diff --git a/daemoneye-agent/src/broker_manager.rs b/daemoneye-agent/src/broker_manager.rs index 39d8eb5..c47b62b 100644 --- a/daemoneye-agent/src/broker_manager.rs +++ b/daemoneye-agent/src/broker_manager.rs @@ -1,4 +1,4 @@ -//! Embedded EventBus broker management for daemoneye-agent +//! Embedded `EventBus` broker management for daemoneye-agent //! //! This module provides the `BrokerManager` which embeds a `DaemoneyeBroker` instance //! within the daemoneye-agent process architecture. The broker operates independently @@ -6,6 +6,7 @@ //! for collector-core component coordination. use crate::collector_registry::{CollectorRegistry, RegistryError}; +use crate::health::{self, HealthState}; use anyhow::{Context, Result}; use daemoneye_eventbus::ConfigManager; use daemoneye_eventbus::rpc::{ @@ -26,6 +27,7 @@ use tracing::{debug, error, info, warn}; /// Health status of the embedded broker #[derive(Debug, Clone, PartialEq, Eq)] +#[non_exhaustive] pub enum BrokerHealth { /// Broker is healthy and operational Healthy, @@ -39,14 +41,39 @@ pub enum BrokerHealth { Stopped, } -/// Embedded broker manager that coordinates the DaemoneyeBroker lifecycle +impl HealthState for BrokerHealth { + fn is_healthy(&self) -> bool { + matches!(self, Self::Healthy) + } + + fn is_starting(&self) -> bool { + matches!(self, Self::Starting) + } + + fn unhealthy_message(&self) -> Option<&str> { + match *self { + Self::Unhealthy(ref msg) => Some(msg), + Self::Healthy | Self::Starting | Self::ShuttingDown | Self::Stopped => None, + } + } + + fn is_stopped_or_shutting_down(&self) -> bool { + matches!(self, Self::ShuttingDown | Self::Stopped) + } + + fn service_name() -> &'static str { + "Broker" + } +} + +/// Embedded broker manager that coordinates the `DaemoneyeBroker` lifecycle /// within the daemoneye-agent process architecture. 
pub struct BrokerManager { /// Configuration for the broker config: BrokerConfig, /// The embedded broker instance broker: Arc<RwLock<Option<Arc<DaemoneyeBroker>>>>, - /// EventBus client for agent-side operations + /// `EventBus` client for agent-side operations event_bus: Arc<Mutex<Option<DaemoneyeEventBus>>>, /// Current health status health_status: Arc<RwLock<BrokerHealth>>, @@ -107,10 +134,7 @@ impl BrokerManager { } // Update health status to starting - { - let mut health = self.health_status.write().await; - *health = BrokerHealth::Starting; - } + *self.health_status.write().await = BrokerHealth::Starting; info!( socket_path = %self.config.socket_path, @@ -152,28 +176,14 @@ impl BrokerManager { let broker_arc = Arc::clone(event_bus.broker()); // Store the broker and event bus - { - let mut broker_guard = self.broker.write().await; - *broker_guard = Some(Arc::clone(&broker_arc)); - } - - { - let mut event_bus_guard = self.event_bus.lock().await; - *event_bus_guard = Some(event_bus); - } + *self.broker.write().await = Some(Arc::clone(&broker_arc)); + *self.event_bus.lock().await = Some(event_bus); // Initialize collector registry - { - let registry = Arc::new(CollectorRegistry::default()); - let mut registry_guard = self.collector_registry.write().await; - *registry_guard = Some(registry); - } + *self.collector_registry.write().await = Some(Arc::new(CollectorRegistry::default())); // Update health status to healthy - { - let mut health = self.health_status.write().await; - *health = BrokerHealth::Healthy; - } + *self.health_status.write().await = BrokerHealth::Healthy; info!("Embedded DaemonEye EventBus broker started successfully"); Ok(()) @@ -184,10 +194,7 @@ impl BrokerManager { info!("Initiating graceful shutdown of embedded broker"); // Update health status to shutting down - { - let mut health = self.health_status.write().await; - *health = BrokerHealth::ShuttingDown; - } + *self.health_status.write().await = BrokerHealth::ShuttingDown; // Send graceful shutdown RPC to all collectors first info!("Sending graceful shutdown RPC to all collectors"); @@ -281,22 +288,13 @@ impl BrokerManager { } // Clear the broker reference - { - let mut broker_guard = self.broker.write().await; - *broker_guard = None; - } + *self.broker.write().await = None; // Clear collector registry - { - let mut registry_guard = self.collector_registry.write().await; - registry_guard.take(); - } + self.collector_registry.write().await.take(); // Update health status to stopped - { - let mut health = self.health_status.write().await; - *health = BrokerHealth::Stopped; - } + *self.health_status.write().await = BrokerHealth::Stopped; info!("Embedded broker shutdown complete"); Ok(()) @@ -321,7 +319,7 @@ impl BrokerManager { async fn registry(&self) -> std::result::Result<Arc<CollectorRegistry>, RegistrationError> { let guard = self.collector_registry.read().await; guard.as_ref().cloned().ok_or_else(|| { - RegistrationError::Internal("collector registry not initialized".to_string()) + RegistrationError::Internal("collector registry not initialized".to_owned()) }) } @@ -333,7 +331,7 @@ impl BrokerManager { } } - /// Get a reference to the EventBus client for agent operations + /// Get a reference to the `EventBus` client for agent operations #[allow(dead_code)] pub fn event_bus(&self) -> Arc<Mutex<Option<DaemoneyeEventBus>>> { Arc::clone(&self.event_bus) @@ -353,7 +351,7 @@ impl BrokerManager { /// Get a reference to the process manager #[allow(dead_code)] // Public accessor for future use - pub fn process_manager(&self) -> &Arc<ProcessManager> { + pub const fn process_manager(&self) -> &Arc<ProcessManager> { &self.process_manager } @@ -400,7 +398,7 @@ impl BrokerManager { if
any_unhealthy { let unhealthy_status = BrokerHealth::Unhealthy( - "One or more collectors are unhealthy".to_string(), + "One or more collectors are unhealthy".to_owned(), ); let mut health = self.health_status.write().await; *health = unhealthy_status.clone(); @@ -408,7 +406,7 @@ } else if any_degraded { // Represent degraded collector state as Unhealthy with reason let degraded_status = BrokerHealth::Unhealthy( - "One or more collectors are degraded".to_string(), + "One or more collectors are degraded".to_owned(), ); let mut health = self.health_status.write().await; *health = degraded_status.clone(); @@ -419,65 +417,48 @@ } else { warn!("Broker health check failed - unable to get statistics"); let unhealthy_status = - BrokerHealth::Unhealthy("Unable to get statistics".to_string()); + BrokerHealth::Unhealthy("Unable to get statistics".to_owned()); let mut health = self.health_status.write().await; *health = unhealthy_status.clone(); unhealthy_status } } - other => other, + BrokerHealth::Starting + | BrokerHealth::ShuttingDown + | BrokerHealth::Unhealthy(_) + | BrokerHealth::Stopped => current_health, } } /// Wait for the broker to become healthy with a timeout pub async fn wait_for_healthy(&self, timeout: Duration) -> Result<()> { - let start = std::time::Instant::now(); - - while start.elapsed() < timeout { - let health = self.health_status().await; - match health { - BrokerHealth::Healthy => { - debug!("Broker is healthy"); - return Ok(()); - } - BrokerHealth::Starting => { - debug!("Broker is still starting, waiting..."); - tokio::time::sleep(Duration::from_millis(100)).await; - } - BrokerHealth::Unhealthy(ref error) => { - return Err(anyhow::anyhow!("Broker is unhealthy: {}", error)); - } - BrokerHealth::ShuttingDown | BrokerHealth::Stopped => { - return Err(anyhow::anyhow!("Broker is not running")); - } - } - } - - Err(anyhow::anyhow!( - "Timeout waiting for broker to become healthy after {:?}", - timeout - )) + let health_status = Arc::clone(&self.health_status); + health::wait_for_healthy(timeout, || async { health_status.read().await.clone() }).await } /// Create an RPC client for a collector pub async fn create_rpc_client(&self, collector_id: &str) -> Result<Arc<CollectorRpcClient>> { - let broker_guard = self.broker.read().await; - let broker = broker_guard - .as_ref() - .ok_or_else(|| anyhow::anyhow!("Broker not available"))?; + let broker = { + let broker_guard = self.broker.read().await; + Arc::clone( + broker_guard + .as_ref() + .ok_or_else(|| anyhow::anyhow!("Broker not available"))?, + ) + }; - let target_topic = format!("control.collector.{}", collector_id); + let target_topic = format!("control.collector.{collector_id}"); let client = Arc::new( - CollectorRpcClient::new(&target_topic, Arc::clone(broker)) + CollectorRpcClient::new(&target_topic, broker) .await .context("Failed to create RPC client")?, ); // Store the client - { - let mut clients = self.rpc_clients.write().await; - clients.insert(collector_id.to_string(), Arc::clone(&client)); - } + self.rpc_clients + .write() + .await + .insert(collector_id.to_owned(), Arc::clone(&client)); info!( collector_id = %collector_id, @@ -516,11 +497,11 @@ impl BrokerManager { if graceful { let shutdown_request = ShutdownRequest { - collector_id: collector_id.to_string(), + collector_id: collector_id.to_owned(), shutdown_type: ShutdownType::Graceful, graceful_timeout_ms: 5000, force_after_timeout: true, - reason: Some("Agent-initiated graceful shutdown".to_string()), + reason: Some("Agent-initiated graceful
shutdown".to_owned()), }; let request = RpcRequest::shutdown( client.client_id.clone(), @@ -606,6 +587,8 @@ impl BrokerManager { let task_json = serde_json::to_value(&task).context("Failed to serialize detection task")?; + #[allow(clippy::arithmetic_side_effects)] // Safe: SystemTime + Duration is well-defined + let deadline = std::time::SystemTime::now() + Duration::from_secs(30); let request = RpcRequest { request_id: uuid::Uuid::new_v4().to_string(), client_id: client.client_id.clone(), @@ -613,7 +596,7 @@ impl BrokerManager { operation: CollectorOperation::ExecuteTask, payload: RpcPayload::Task(task_json), timestamp: std::time::SystemTime::now(), - deadline: std::time::SystemTime::now() + Duration::from_secs(30), + deadline, correlation_metadata: daemoneye_eventbus::rpc::RpcCorrelationMetadata::new( uuid::Uuid::new_v4().to_string(), ), @@ -638,11 +621,8 @@ impl BrokerManager { /// Get or create an RPC client for a collector pub async fn get_rpc_client(&self, collector_id: &str) -> Result> { // Check if client already exists - { - let clients = self.rpc_clients.read().await; - if let Some(client) = clients.get(collector_id) { - return Ok(Arc::clone(client)); - } + if let Some(client) = self.rpc_clients.read().await.get(collector_id) { + return Ok(Arc::clone(client)); } // Create new client @@ -678,9 +658,9 @@ impl HealthProvider for BrokerManager { // Build component details let mut components = std::collections::HashMap::new(); components.insert( - "process".to_string(), + "process".to_owned(), ComponentHealth { - name: "process".to_string(), + name: "process".to_owned(), status: match health { daemoneye_eventbus::process_manager::HealthStatus::Healthy => { HealthStatus::Healthy @@ -715,9 +695,9 @@ impl HealthProvider for BrokerManager { .as_secs(); components.insert( - "heartbeat".to_string(), + "heartbeat".to_owned(), ComponentHealth { - name: "heartbeat".to_string(), + name: "heartbeat".to_owned(), status: hb_status, message: Some(format!( "Last heartbeat: {}s ago, Missed: {}", @@ -730,11 +710,11 @@ impl HealthProvider for BrokerManager { // Event sources health is not yet implemented components.insert( - "event_sources".to_string(), + "event_sources".to_owned(), ComponentHealth { - name: "event_sources".to_string(), + name: "event_sources".to_owned(), status: HealthStatus::Unknown, - message: Some("Event sources health monitoring not yet implemented".to_string()), + message: Some("Event sources health monitoring not yet implemented".to_owned()), last_check: std::time::SystemTime::now(), check_interval_seconds: 60, }, @@ -743,22 +723,25 @@ impl HealthProvider for BrokerManager { // Compute overall health using worst-of aggregation let overall = aggregate_worst_of(components.values().map(|c| c.status)); + #[allow(clippy::as_conversions)] + // Safe: uptime_seconds and heartbeat_age are small u64 values + let uptime_seconds_f64 = status.uptime.as_secs() as f64; + #[allow(clippy::as_conversions)] + let heartbeat_age_f64 = heartbeat_age as f64; + let mut metrics = std::collections::HashMap::new(); - metrics.insert("pid".to_string(), status.pid as f64); - metrics.insert("restart_count".to_string(), status.restart_count as f64); - metrics.insert("uptime_seconds".to_string(), status.uptime.as_secs() as f64); - metrics.insert( - "missed_heartbeats".to_string(), - status.missed_heartbeats as f64, - ); + metrics.insert("pid".to_owned(), f64::from(status.pid)); + metrics.insert("restart_count".to_owned(), f64::from(status.restart_count)); + metrics.insert("uptime_seconds".to_owned(), 
uptime_seconds_f64); metrics.insert( - "last_heartbeat_age_seconds".to_string(), - heartbeat_age as f64, + "missed_heartbeats".to_owned(), + f64::from(status.missed_heartbeats), ); - metrics.insert("error_count".to_string(), 0.0); + metrics.insert("last_heartbeat_age_seconds".to_owned(), heartbeat_age_f64); + metrics.insert("error_count".to_owned(), 0.0); Ok(HealthCheckData { - collector_id: collector_id.to_string(), + collector_id: collector_id.to_owned(), status: overall, components, metrics, @@ -820,17 +803,16 @@ impl ConfigProvider for BrokerManager { .await .map_err(|e| { daemoneye_eventbus::ConfigManagerError::PersistenceFailed(format!( - "Failed to restart collector after config update: {}", - e + "Failed to restart collector after config update: {e}" )) })?; } else { // Publish hot-reload notification if broker is present let broker_guard = self.broker.read().await; if let Some(broker) = broker_guard.as_ref() { - let topic = format!("control.collector.config.{}", collector_id); + let topic = format!("control.collector.config.{collector_id}"); let notification = daemoneye_eventbus::rpc::ConfigChangeNotification { - collector_id: collector_id.to_string(), + collector_id: collector_id.to_owned(), changed_fields: changed_fields.clone(), version: snapshot.version, timestamp: std::time::SystemTime::now() @@ -841,7 +823,7 @@ match serde_json::to_vec(&notification) { Ok(payload) => { if let Err(e) = broker - .publish(&topic, &format!("config-change-{}", collector_id), payload) + .publish(&topic, &format!("config-change-{collector_id}"), payload) .await { tracing::warn!( @@ -894,7 +876,7 @@ impl RegistrationProvider for BrokerManager { let response = registry .register(request.clone()) .await - .map_err(BrokerManager::map_registry_error)?; + .map_err(Self::map_registry_error)?; // Create RPC client after successful registration if response.accepted @@ -918,11 +900,11 @@ registry .deregister(request.clone()) .await - .map_err(BrokerManager::map_registry_error)?; + .map_err(Self::map_registry_error)?; // Remove RPC client and shut it down - let mut clients = self.rpc_clients.write().await; - if let Some(client) = clients.remove(&request.collector_id) + let removed_client = self.rpc_clients.write().await.remove(&request.collector_id); + if let Some(client) = removed_client && let Err(e) = client.shutdown().await { warn!( @@ -943,7 +925,7 @@ registry .update_heartbeat(collector_id) .await - .map_err(BrokerManager::map_registry_error) + .map_err(Self::map_registry_error) } } @@ -982,18 +964,26 @@ fn aggregate_worst_of<I: Iterator<Item = HealthStatus>>(iter: I) -> HealthStatus } #[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::semicolon_outside_block, + clippy::semicolon_inside_block, + clippy::semicolon_if_nothing_returned +)] mod tests { use super::*; use daemoneye_lib::config::BrokerConfig; fn sample_registration_request() -> RegistrationRequest { RegistrationRequest { - collector_id: "test-collector".to_string(), - collector_type: "test-collector".to_string(), - hostname: "localhost".to_string(), - version: Some("1.0.0".to_string()), + collector_id: "test-collector".to_owned(), + collector_type: "test-collector".to_owned(), + hostname: "localhost".to_owned(), + version: Some("1.0.0".to_owned()), pid: Some(1234), - capabilities: vec!["process".to_string()], + capabilities: vec!["process".to_owned()], attributes: std::collections::HashMap::new(),
heartbeat_interval_ms: Some(5_000), } @@ -1027,7 +1017,7 @@ mod tests { #[tokio::test] async fn test_broker_manager_socket_path() { let config = BrokerConfig { - socket_path: "/tmp/test-broker.sock".to_string(), + socket_path: "/tmp/test-broker.sock".to_owned(), ..Default::default() }; @@ -1050,8 +1040,8 @@ mod tests { { let mut guard = manager.collector_registry.write().await; - *guard = Some(Arc::new(CollectorRegistry::default())); - } + *guard = Some(Arc::new(CollectorRegistry::default())) + }; let request = sample_registration_request(); let response = manager @@ -1069,7 +1059,7 @@ mod tests { manager .deregister_collector(DeregistrationRequest { collector_id: request.collector_id, - reason: Some("test".to_string()), + reason: Some("test".to_owned()), force: false, }) .await diff --git a/daemoneye-agent/src/collector_registry.rs b/daemoneye-agent/src/collector_registry.rs index fe83a71..9790188 100644 --- a/daemoneye-agent/src/collector_registry.rs +++ b/daemoneye-agent/src/collector_registry.rs @@ -46,8 +46,7 @@ impl CollectorRegistry { let now = SystemTime::now(); let heartbeat_interval = request .heartbeat_interval_ms - .map(Duration::from_millis) - .unwrap_or(self.default_heartbeat); + .map_or(self.default_heartbeat, Duration::from_millis); let record = CollectorRecord { registration: request, @@ -56,6 +55,7 @@ impl CollectorRegistry { heartbeat_interval, }; records.insert(collector_id.clone(), record); + drop(records); let assigned_topics = vec![ format!("control.collector.{}", collector_id), @@ -91,7 +91,7 @@ impl CollectorRegistry { record.last_heartbeat = SystemTime::now(); Ok(()) } - None => Err(RegistryError::NotFound(collector_id.to_string())), + None => Err(RegistryError::NotFound(collector_id.to_owned())), } } @@ -139,6 +139,7 @@ pub struct CollectorRecord { /// Errors that can occur when interacting with the collector registry. #[derive(Debug, Error)] +#[non_exhaustive] pub enum RegistryError { /// Collector is already registered. #[error("collector `{0}` is already registered")] @@ -154,19 +155,19 @@ pub enum RegistryError { fn validate_registration(request: &RegistrationRequest) -> Result<(), RegistryError> { if request.collector_id.trim().is_empty() { return Err(RegistryError::Validation( - "collector_id cannot be empty".to_string(), + "collector_id cannot be empty".to_owned(), )); } if request.collector_type.trim().is_empty() { return Err(RegistryError::Validation( - "collector_type cannot be empty".to_string(), + "collector_type cannot be empty".to_owned(), )); } if request.hostname.trim().is_empty() { return Err(RegistryError::Validation( - "hostname cannot be empty".to_string(), + "hostname cannot be empty".to_owned(), )); } @@ -174,6 +175,12 @@ fn validate_registration(request: &RegistrationRequest) -> Result<(), RegistryEr } #[cfg(test)] +#[allow( + clippy::str_to_string, + clippy::expect_used, + clippy::unwrap_used, + clippy::indexing_slicing +)] mod tests { use super::*; use std::collections::HashMap; diff --git a/daemoneye-agent/src/health.rs b/daemoneye-agent/src/health.rs new file mode 100644 index 0000000..778913e --- /dev/null +++ b/daemoneye-agent/src/health.rs @@ -0,0 +1,83 @@ +//! Health state utilities for service managers. +//! +//! This module provides common health state abstractions and utilities +//! for waiting on service health transitions. + +use anyhow::Result; +use std::time::Duration; +use tracing::debug; + +/// Trait for health state enums that support the wait-for-healthy pattern. 
+/// +/// Implement this trait for health state enums to enable use of the +/// [`wait_for_healthy`] helper function. +pub trait HealthState: Clone + Send + Sync { + /// Returns `true` if the state represents a healthy/operational service. + fn is_healthy(&self) -> bool; + + /// Returns `true` if the state represents a starting/initializing service. + fn is_starting(&self) -> bool; + + /// Returns an error message if the state represents an unhealthy service. + /// + /// Returns `None` for healthy, starting, or stopped states. + fn unhealthy_message(&self) -> Option<&str>; + + /// Returns `true` if the state represents a stopped or shutting down service. + fn is_stopped_or_shutting_down(&self) -> bool; + + /// Returns a descriptive name for the service type (e.g., "Broker", "IPC server"). + fn service_name() -> &'static str; +} + +/// Wait for a service to become healthy with a timeout. +/// +/// This is a generic implementation of the wait-for-healthy pattern that +/// polls the provided health status function until the service becomes +/// healthy, an error state is reached, or the timeout expires. +/// +/// # Arguments +/// +/// * `timeout` - Maximum time to wait for healthy state +/// * `get_health` - Async function that returns the current health state +/// +/// # Returns +/// +/// - `Ok(())` if service becomes healthy +/// - `Err` if service is unhealthy, stopped, or timeout expires +pub async fn wait_for_healthy<H, F, Fut>(timeout: Duration, get_health: F) -> Result<()> +where + H: HealthState, + F: Fn() -> Fut, + Fut: std::future::Future<Output = H>, +{ + let start = std::time::Instant::now(); + let service_name = H::service_name(); + + while start.elapsed() < timeout { + let health = get_health().await; + + if health.is_healthy() { + debug!("{} is healthy", service_name); + return Ok(()); + } + + if health.is_starting() { + debug!("{} is still starting, waiting...", service_name); + tokio::time::sleep(Duration::from_millis(100)).await; + continue; + } + + if let Some(error) = health.unhealthy_message() { + return Err(anyhow::anyhow!("{service_name} is unhealthy: {error}")); + } + + if health.is_stopped_or_shutting_down() { + return Err(anyhow::anyhow!("{service_name} is not running")); + } + } + + Err(anyhow::anyhow!( + "Timeout waiting for {service_name} to become healthy after {timeout:?}" + )) +} diff --git a/daemoneye-agent/src/ipc_server.rs b/daemoneye-agent/src/ipc_server.rs index 1729ea7..6e8518f 100644 --- a/daemoneye-agent/src/ipc_server.rs +++ b/daemoneye-agent/src/ipc_server.rs @@ -2,7 +2,7 @@ //! //! This module provides the `IpcServerManager` which manages an IPC server //! for communication with daemoneye-cli using protobuf + CRC32 framing. -//! This operates alongside the embedded EventBus broker for collector-core +//! This operates alongside the embedded `EventBus` broker for collector-core //! component communication, implementing the dual-protocol architecture.
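The new `health` module above deduplicates the polling loops that `BrokerManager` and `IpcServerManager` previously each carried. A sketch of how a further manager could reuse it; `CacheHealth` and the five-second timeout are invented for illustration:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

use daemoneye_agent::health::{self, HealthState};

// Hypothetical health enum for some new service manager.
#[derive(Debug, Clone)]
enum CacheHealth {
    Healthy,
    Starting,
    Unhealthy(String),
    Stopped,
}

impl HealthState for CacheHealth {
    fn is_healthy(&self) -> bool {
        matches!(self, Self::Healthy)
    }

    fn is_starting(&self) -> bool {
        matches!(self, Self::Starting)
    }

    fn unhealthy_message(&self) -> Option<&str> {
        match *self {
            Self::Unhealthy(ref msg) => Some(msg),
            Self::Healthy | Self::Starting | Self::Stopped => None,
        }
    }

    fn is_stopped_or_shutting_down(&self) -> bool {
        matches!(self, Self::Stopped)
    }

    fn service_name() -> &'static str {
        "Cache"
    }
}

// Same closure shape the two managers use: clone the Arc per poll, read the state.
async fn wait_for_cache(status: Arc<RwLock<CacheHealth>>) -> anyhow::Result<()> {
    health::wait_for_healthy(Duration::from_secs(5), || {
        let status = Arc::clone(&status);
        async move { status.read().await.clone() }
    })
    .await
}
```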
use anyhow::{Context, Result}; @@ -15,6 +15,7 @@ use tracing::{debug, error, info, warn}; /// Health status of the IPC server #[derive(Debug, Clone, PartialEq, Eq)] +#[non_exhaustive] pub enum IpcServerHealth { /// Server is healthy and operational Healthy, @@ -28,7 +29,32 @@ pub enum IpcServerHealth { Stopped, } -/// IPC server manager that coordinates the InterprocessServer lifecycle +impl super::health::HealthState for IpcServerHealth { + fn is_healthy(&self) -> bool { + matches!(self, Self::Healthy) + } + + fn is_starting(&self) -> bool { + matches!(self, Self::Starting) + } + + fn unhealthy_message(&self) -> Option<&str> { + match *self { + Self::Unhealthy(ref msg) => Some(msg), + Self::Healthy | Self::Starting | Self::ShuttingDown | Self::Stopped => None, + } + } + + fn is_stopped_or_shutting_down(&self) -> bool { + matches!(self, Self::ShuttingDown | Self::Stopped) + } + + fn service_name() -> &'static str { + "IPC server" + } +} + +/// IPC server manager that coordinates the `InterprocessServer` lifecycle /// within the daemoneye-agent process architecture for CLI communication. pub struct IpcServerManager { /// Configuration for the IPC server @@ -55,10 +81,7 @@ impl IpcServerManager { /// Initialize and start the IPC server pub async fn start(&self) -> Result<()> { // Update health status to starting - { - let mut health = self.health_status.write().await; - *health = IpcServerHealth::Starting; - } + *self.health_status.write().await = IpcServerHealth::Starting; info!( endpoint_path = %self.config.endpoint_path, @@ -95,16 +118,10 @@ impl IpcServerManager { server.start().await.context("Failed to start IPC server")?; // Store the server instance - { - let mut server_guard = self.server.write().await; - *server_guard = Some(server); - } + *self.server.write().await = Some(server); // Update health status to healthy - { - let mut health = self.health_status.write().await; - *health = IpcServerHealth::Healthy; - } + *self.health_status.write().await = IpcServerHealth::Healthy; info!("IPC server started successfully for CLI communication"); Ok(()) @@ -115,10 +132,7 @@ impl IpcServerManager { info!("Initiating graceful shutdown of IPC server"); // Update health status to shutting down - { - let mut health = self.health_status.write().await; - *health = IpcServerHealth::ShuttingDown; - } + *self.health_status.write().await = IpcServerHealth::ShuttingDown; // Send shutdown signal if available { @@ -164,10 +178,7 @@ impl IpcServerManager { } // Update health status to stopped - { - let mut health = self.health_status.write().await; - *health = IpcServerHealth::Stopped; - } + *self.health_status.write().await = IpcServerHealth::Stopped; info!("IPC server shutdown complete"); Ok(()) @@ -204,43 +215,24 @@ impl IpcServerManager { IpcServerHealth::Healthy } else { warn!("IPC server health check failed - server instance not found"); - let mut health = self.health_status.write().await; - *health = IpcServerHealth::Unhealthy("Server instance not found".to_string()); - IpcServerHealth::Unhealthy("Server instance not found".to_string()) + let unhealthy_status = + IpcServerHealth::Unhealthy("Server instance not found".to_owned()); + *self.health_status.write().await = unhealthy_status.clone(); + unhealthy_status } } - other => other, + IpcServerHealth::Starting + | IpcServerHealth::ShuttingDown + | IpcServerHealth::Unhealthy(_) + | IpcServerHealth::Stopped => current_health, } } /// Wait for the IPC server to become healthy with a timeout pub async fn wait_for_healthy(&self, timeout: Duration) -> 
Result<()> { - let start = std::time::Instant::now(); - - while start.elapsed() < timeout { - let health = self.health_status().await; - match health { - IpcServerHealth::Healthy => { - debug!("IPC server is healthy"); - return Ok(()); - } - IpcServerHealth::Starting => { - debug!("IPC server is still starting, waiting..."); - tokio::time::sleep(Duration::from_millis(100)).await; - } - IpcServerHealth::Unhealthy(ref error) => { - return Err(anyhow::anyhow!("IPC server is unhealthy: {}", error)); - } - IpcServerHealth::ShuttingDown | IpcServerHealth::Stopped => { - return Err(anyhow::anyhow!("IPC server is not running")); - } - } - } - - Err(anyhow::anyhow!( - "Timeout waiting for IPC server to become healthy after {:?}", - timeout - )) + let health_status = Arc::clone(&self.health_status); + super::health::wait_for_healthy(timeout, || async { health_status.read().await.clone() }) .await } } diff --git a/daemoneye-agent/src/lib.rs b/daemoneye-agent/src/lib.rs index 3a5f8b9..0a66fd4 100644 --- a/daemoneye-agent/src/lib.rs +++ b/daemoneye-agent/src/lib.rs @@ -1,15 +1,17 @@ -//! DaemonEye Agent Library +//! `DaemonEye` Agent Library //! -//! This library provides the core functionality for the DaemonEye detection orchestrator, -//! including embedded EventBus broker management, IPC client functionality, and IPC server +//! This library provides the core functionality for the `DaemonEye` detection orchestrator, +//! including embedded `EventBus` broker management, IPC client functionality, and IPC server //! management for CLI communication. #![forbid(unsafe_code)] pub mod broker_manager; pub mod collector_registry; +pub mod health; pub mod ipc_server; pub use broker_manager::{BrokerHealth, BrokerManager}; pub use collector_registry::{CollectorRegistry, RegistryError}; +pub use health::{HealthState, wait_for_healthy}; pub use ipc_server::{IpcServerHealth, IpcServerManager, create_cli_ipc_config}; diff --git a/daemoneye-agent/src/main.rs b/daemoneye-agent/src/main.rs index 75f9326..b4a147b 100644 --- a/daemoneye-agent/src/main.rs +++ b/daemoneye-agent/src/main.rs @@ -7,6 +7,7 @@ use tracing::{debug, error, info, warn}; mod broker_manager; mod collector_registry; +mod health; mod ipc_server; use broker_manager::BrokerManager; @@ -29,7 +30,7 @@ struct Cli { #[tokio::main] pub async fn main() -> Result<(), Box<dyn std::error::Error>> { if let Err(e) = run().await { - eprintln!("Error: {}", e); + eprintln!("Error: {e}"); std::process::exit(1); } Ok(()) } @@ -46,7 +47,10 @@ async fn run() -> Result<(), Box<dyn std::error::Error>> { .map(|v| v == "1") .unwrap_or(false) { - println!("daemoneye-agent started successfully"); + #[allow(clippy::print_stdout, clippy::semicolon_if_nothing_returned)] + { + println!("daemoneye-agent started successfully") + }; return Ok(()); } @@ -58,7 +62,7 @@ async fn run() -> Result<(), Box<dyn std::error::Error>> { config.database.path = cli.database.into(); // Initialize telemetry - let mut telemetry = telemetry::TelemetryCollector::new("daemoneye-agent".to_string()); + let mut telemetry = telemetry::TelemetryCollector::new("daemoneye-agent".to_owned()); // Initialize database let _db_manager = storage::DatabaseManager::new(&config.database.path)?; @@ -117,11 +121,11 @@ async fn run() -> Result<(), Box<dyn std::error::Error>> { // Create a sample detection rule let rule = models::DetectionRule::new( - "rule-1".to_string(), - "Test Rule".to_string(), - "Test detection rule".to_string(), - "SELECT * FROM processes WHERE name = 'test'".to_string(), - "test".to_string(), + "rule-1".to_owned(), + "Test Rule".to_owned(), + "Test detection rule".to_owned(), + "SELECT *
FROM processes WHERE name = 'test'".to_owned(), + "test".to_owned(), models::AlertSeverity::Medium, ); @@ -131,13 +135,16 @@ async fn run() -> Result<(), Box> { // Initialize alert manager let mut alert_manager = alerting::AlertManager::new(); let stdout_sink = Box::new(alerting::StdoutSink::new( - "stdout".to_string(), + "stdout".to_owned(), alerting::OutputFormat::Json, )); alert_manager.add_sink(stdout_sink); // Indicate startup success before entering main loop - println!("daemoneye-agent started successfully"); + #[allow(clippy::print_stdout, clippy::semicolon_if_nothing_returned)] + { + println!("daemoneye-agent started successfully") + }; // Main collection loop using IPC client let scan_interval = Duration::from_millis(config.app.scan_interval_ms); @@ -156,19 +163,18 @@ async fn run() -> Result<(), Box> { // Main loop task // detection_engine already mutable above; reuse directly - let mut alert_manager = alert_manager; // mutable for sink operations let mut iteration: u64 = 0; tokio::pin!(shutdown_signal); loop { tokio::select! { - _ = &mut shutdown_signal => { + () = &mut shutdown_signal => { info!("Shutdown signal received; commencing graceful shutdown"); break; } - _ = tokio::time::sleep(scan_interval) => { - iteration += 1; + () = tokio::time::sleep(scan_interval) => { + iteration = iteration.saturating_add(1); let loop_start = Instant::now(); // Periodic RPC health checks (every 10 iterations) @@ -230,7 +236,7 @@ async fn run() -> Result<(), Box> { }; // Execute detection rules against collected processes - let detection_timer = telemetry::PerformanceTimer::start("detection_execution".to_string()); + let detection_timer = telemetry::PerformanceTimer::start("detection_execution".to_owned()); let alerts = detection_engine.execute_rules(&processes); if !alerts.is_empty() { @@ -273,8 +279,10 @@ async fn run() -> Result<(), Box> { broker_manager::BrokerHealth::Unhealthy(ref error) => { warn!(error = %error, "Broker health check failed"); } - other => { - debug!(status = ?other, "Broker health status"); + broker_manager::BrokerHealth::Starting + | broker_manager::BrokerHealth::ShuttingDown + | broker_manager::BrokerHealth::Stopped => { + debug!(status = ?broker_health, "Broker health status"); } } @@ -287,13 +295,17 @@ async fn run() -> Result<(), Box> { ipc_server::IpcServerHealth::Unhealthy(ref error) => { warn!(error = %error, "IPC server health check failed"); } - other => { - debug!(status = ?other, "IPC server health status"); + ipc_server::IpcServerHealth::Starting + | ipc_server::IpcServerHealth::ShuttingDown + | ipc_server::IpcServerHealth::Stopped => { + debug!(status = ?ipc_health, "IPC server health status"); } } } let loop_elapsed = loop_start.elapsed(); - if loop_elapsed > scan_interval { warn!(elapsed_ms = loop_elapsed.as_millis() as u64, "Loop overran scan interval"); } + #[allow(clippy::as_conversions)] // Safe: loop elapsed will not overflow u64 + let elapsed_ms = loop_elapsed.as_millis() as u64; + if loop_elapsed > scan_interval { warn!(elapsed_ms = elapsed_ms, "Loop overran scan interval"); } } } } @@ -312,6 +324,9 @@ async fn run() -> Result<(), Box> { error!(error = %e, "Failed to shutdown embedded broker gracefully"); } - println!("daemoneye-agent shutdown complete."); + #[allow(clippy::print_stdout, clippy::semicolon_if_nothing_returned)] + { + println!("daemoneye-agent shutdown complete.") + }; Ok(()) } diff --git a/daemoneye-agent/tests/broker_integration.rs b/daemoneye-agent/tests/broker_integration.rs index fd04f96..eaf7a0c 100644 --- 
a/daemoneye-agent/tests/broker_integration.rs +++ b/daemoneye-agent/tests/broker_integration.rs @@ -1,5 +1,16 @@ //! Integration tests for embedded broker functionality +#![allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::shadow_unrelated, + clippy::shadow_reuse, + clippy::ignore_without_reason, + clippy::print_stdout, + clippy::uninlined_format_args +)] + use daemoneye_agent::BrokerManager; use daemoneye_lib::config::BrokerConfig; use std::time::Duration; diff --git a/daemoneye-agent/tests/cli.rs b/daemoneye-agent/tests/cli.rs index 7fe5de5..1e1a41a 100644 --- a/daemoneye-agent/tests/cli.rs +++ b/daemoneye-agent/tests/cli.rs @@ -1,3 +1,5 @@ +#![allow(clippy::expect_used, clippy::unwrap_used)] + use insta::assert_snapshot; use std::process::Command; use tempfile::TempDir; diff --git a/daemoneye-agent/tests/dual_protocol_integration.rs b/daemoneye-agent/tests/dual_protocol_integration.rs index e3d2efb..37f8370 100644 --- a/daemoneye-agent/tests/dual_protocol_integration.rs +++ b/daemoneye-agent/tests/dual_protocol_integration.rs @@ -2,6 +2,17 @@ //! //! Tests the coexistence of EventBus broker and IPC server within daemoneye-agent +#![allow( + clippy::str_to_string, + clippy::expect_used, + clippy::unwrap_used, + clippy::uninlined_format_args, + clippy::let_underscore_must_use, + clippy::print_stdout, + clippy::ignored_unit_patterns, + clippy::doc_markdown +)] + use anyhow::Result; use daemoneye_agent::{BrokerManager, IpcServerManager, create_cli_ipc_config}; use daemoneye_lib::config::BrokerConfig; diff --git a/daemoneye-agent/tests/rpc_collector_management_integration.rs b/daemoneye-agent/tests/rpc_collector_management_integration.rs index 074ce7e..fee5ab9 100644 --- a/daemoneye-agent/tests/rpc_collector_management_integration.rs +++ b/daemoneye-agent/tests/rpc_collector_management_integration.rs @@ -7,6 +7,16 @@ //! - Graceful shutdown coordination //! - Error handling and timeout scenarios +#![allow( + clippy::unwrap_used, + clippy::expect_used, + clippy::let_underscore_must_use, + clippy::shadow_reuse, + clippy::str_to_string, + clippy::uninlined_format_args, + clippy::unused_async +)] + use anyhow::Result; use daemoneye_agent::broker_manager::BrokerManager; use daemoneye_lib::config::BrokerConfig; diff --git a/daemoneye-agent/tests/rpc_lifecycle_integration.rs b/daemoneye-agent/tests/rpc_lifecycle_integration.rs index 709d778..6be1066 100644 --- a/daemoneye-agent/tests/rpc_lifecycle_integration.rs +++ b/daemoneye-agent/tests/rpc_lifecycle_integration.rs @@ -3,6 +3,15 @@ //! These tests validate the complete RPC workflow across process boundaries, //! including registration, lifecycle operations, health checks, and graceful shutdown. 
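Several hunks above replace `other => ...` catch-all arms on the health enums with explicit variant lists, alongside marking the enums `#[non_exhaustive]` for downstream crates. A self-contained sketch of the trade-off, using an invented enum rather than the project's types:

```rust
#[derive(Debug)]
enum Health {
    Healthy,
    Starting,
    Unhealthy(String),
    Stopped,
}

fn log_status(h: &Health) {
    match *h {
        Health::Healthy => println!("healthy"),
        Health::Unhealthy(ref e) => println!("unhealthy: {e}"),
        // Explicit list instead of `other => ...`: if a `ShuttingDown` variant
        // were added later, this match would stop compiling until the new
        // state is handled, instead of silently falling into a catch-all.
        Health::Starting | Health::Stopped => println!("transitional"),
    }
}

fn main() {
    log_status(&Health::Unhealthy("no heartbeat".to_owned()));
}
```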
+#![allow( + clippy::str_to_string, + clippy::expect_used, + clippy::unwrap_used, + clippy::uninlined_format_args, + clippy::let_underscore_must_use, + clippy::print_stdout +)] + use daemoneye_agent::broker_manager::BrokerManager; use daemoneye_lib::config::BrokerConfig; use std::sync::Arc; diff --git a/daemoneye-cli/Cargo.toml b/daemoneye-cli/Cargo.toml index 8f3f872..200d105 100644 --- a/daemoneye-cli/Cargo.toml +++ b/daemoneye-cli/Cargo.toml @@ -22,20 +22,12 @@ eula = false [dependencies] # Core dependencies -anyhow = { workspace = true } -chrono = { workspace = true } clap = { workspace = true } # Internal library daemoneye-lib = { workspace = true } -# Database -redb = { workspace = true } -serde = { workspace = true } serde_json = { workspace = true } -thiserror = { workspace = true } -tokio = { workspace = true } -tracing = { workspace = true } tracing-subscriber = { workspace = true } [dev-dependencies] @@ -44,6 +36,5 @@ insta = { workspace = true } predicates = { workspace = true } tempfile = { workspace = true } -[lints.rust] -unsafe_code = "forbid" -warnings = "deny" +[lints] +workspace = true diff --git a/daemoneye-cli/src/main.rs b/daemoneye-cli/src/main.rs index ff13581..abea4e7 100644 --- a/daemoneye-cli/src/main.rs +++ b/daemoneye-cli/src/main.rs @@ -1,9 +1,10 @@ #![forbid(unsafe_code)] +#![allow(clippy::print_stdout)] // CLI output to user is intentional use clap::Parser; use daemoneye_lib::{config, storage, telemetry}; -/// DaemonEye CLI interface +/// `DaemonEye` CLI interface #[derive(Parser)] #[command(name = "daemoneye-cli")] #[command(about = "DaemonEye CLI interface")] diff --git a/daemoneye-cli/tests/cli.rs b/daemoneye-cli/tests/cli.rs index 2bd2882..e39c155 100644 --- a/daemoneye-cli/tests/cli.rs +++ b/daemoneye-cli/tests/cli.rs @@ -14,7 +14,7 @@ fn prints_expected_greeting() -> Result<(), Box> { let output = cmd.output()?; if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - eprintln!("Command failed with stderr: {}", stderr); + eprintln!("Command failed with stderr: {stderr}"); } assert!(output.status.success()); let stdout = String::from_utf8_lossy(&output.stdout); diff --git a/daemoneye-eventbus/Cargo.toml b/daemoneye-eventbus/Cargo.toml index a38aa10..dc649a8 100644 --- a/daemoneye-eventbus/Cargo.toml +++ b/daemoneye-eventbus/Cargo.toml @@ -21,9 +21,8 @@ path = "src/lib.rs" [dependencies] anyhow = { workspace = true } async-trait = { workspace = true } -bincode = { workspace = true, features = ["serde"] } +postcard = { workspace = true } blake3 = { workspace = true } -chrono = { workspace = true } interprocess = { workspace = true, optional = true } rand = { workspace = true } serde = { workspace = true } diff --git a/daemoneye-eventbus/benches/throughput.rs b/daemoneye-eventbus/benches/throughput.rs index 3d195e2..b8a2900 100644 --- a/daemoneye-eventbus/benches/throughput.rs +++ b/daemoneye-eventbus/benches/throughput.rs @@ -43,8 +43,7 @@ fn bench_message_serialization(c: &mut Criterion) { c.bench_function("message_serialization", |b| { b.iter(|| { let event = create_test_process_event(black_box(1234)); - let serialized = - bincode::serde::encode_to_vec(&event, bincode::config::standard()).unwrap(); + let serialized = postcard::to_allocvec(&event).unwrap(); black_box(serialized) }) }); @@ -53,15 +52,12 @@ fn bench_message_serialization(c: &mut Criterion) { fn bench_message_deserialization(c: &mut Criterion) { // Pre-serialize some data let event = create_test_process_event(1234); - let serialized = 
bincode::serde::encode_to_vec(&event, bincode::config::standard()).unwrap(); + let serialized = postcard::to_allocvec(&event).unwrap(); c.bench_function("message_deserialization", |b| { b.iter(|| { - let (deserialized, _): (CollectionEvent, _) = bincode::serde::decode_from_slice( - black_box(&serialized), - bincode::config::standard(), - ) - .unwrap(); + let deserialized: CollectionEvent = + postcard::from_bytes(black_box(&serialized)).unwrap(); black_box(deserialized) }) }); diff --git a/daemoneye-eventbus/docs/rpc-patterns.md b/daemoneye-eventbus/docs/rpc-patterns.md index 62cd1ce..e00cd05 100644 --- a/daemoneye-eventbus/docs/rpc-patterns.md +++ b/daemoneye-eventbus/docs/rpc-patterns.md @@ -201,7 +201,7 @@ RpcResponse { Message { topic: "control.health.heartbeat.procmond", message_type: MessageType::Heartbeat, - payload: bincode::encode(HeartbeatData { + payload: postcard::to_allocvec(&HeartbeatData { collector_id: "procmond", timestamp: SystemTime::now(), sequence: 12345, @@ -609,15 +609,13 @@ impl ProcessCollectorRpcService { pub async fn handle_rpc_message(&self, message: Message) -> Result<Message> { // Deserialize RPC request - let request: RpcRequest = - bincode::serde::decode_from_slice(&message.payload, bincode::config::standard())?.0; + let request: RpcRequest = postcard::from_bytes(&message.payload)?; // Handle request let response = self.rpc_service.handle_request(request).await; // Serialize response - let response_payload = - bincode::serde::encode_to_vec(&response, bincode::config::standard())?; + let response_payload = postcard::to_allocvec(&response)?; Ok(Message::new( format!("response.{}", message.correlation_id), diff --git a/daemoneye-eventbus/src/broker.rs b/daemoneye-eventbus/src/broker.rs index e92bf2b..7c3edf3 100644 --- a/daemoneye-eventbus/src/broker.rs +++ b/daemoneye-eventbus/src/broker.rs @@ -549,9 +549,7 @@ impl DaemoneyeBroker { // Attempt to decode CollectionEvent for internal subscribers let collection_event_result: Result<CollectionEvent, EventBusError> = - bincode::serde::decode_from_slice(&payload, bincode::config::standard()) - .map(|(event, _)| event) - .map_err(|e| EventBusError::serialization(e.to_string())); + postcard::from_bytes(&payload).map_err(|e| EventBusError::serialization(e.to_string())); for subscriber_id in &subscribers { if let Some(sender) = senders_guard.get(subscriber_id) { @@ -955,7 +953,7 @@ impl EventBus for DaemoneyeEventBus { }; // Serialize event to payload - let payload = bincode::serde::encode_to_vec(&event, bincode::config::standard()) + let payload = postcard::to_allocvec(&event) .map_err(|e| EventBusError::serialization(e.to_string()))?; self.broker.publish(topic, &correlation_id, payload).await diff --git a/daemoneye-eventbus/src/client.rs b/daemoneye-eventbus/src/client.rs index ec2c347..ddb1b70 100644 --- a/daemoneye-eventbus/src/client.rs +++ b/daemoneye-eventbus/src/client.rs @@ -279,7 +279,7 @@ impl EventBusClient { }; // Serialize event - let payload = bincode::serde::encode_to_vec(&event, bincode::config::standard()) + let payload = postcard::to_allocvec(&event) .map_err(|e| EventBusError::serialization(e.to_string()))?; // Validate payload size @@ -514,10 +514,8 @@ impl EventBusClient { _client_id: &str, ) -> Result<()> { // Deserialize event - let event: CollectionEvent = - bincode::serde::decode_from_slice(&message.payload, bincode::config::standard()) - .map_err(|e| EventBusError::serialization(e.to_string()))?
- .0; + let event: CollectionEvent = postcard::from_bytes(&message.payload) + .map_err(|e| EventBusError::serialization(e.to_string()))?; // Create bus event let bus_event = BusEvent { diff --git a/daemoneye-eventbus/src/message.rs b/daemoneye-eventbus/src/message.rs index 484acc9..3b5d7bf 100644 --- a/daemoneye-eventbus/src/message.rs +++ b/daemoneye-eventbus/src/message.rs @@ -379,7 +379,7 @@ impl Message { topic: String, request: &crate::rpc::RpcRequest, ) -> Result<Self, crate::error::EventBusError> { - let payload = bincode::serde::encode_to_vec(request, bincode::config::standard()) + let payload = postcard::to_allocvec(request) .map_err(|e| crate::error::EventBusError::serialization(e.to_string()))?; // Convert RpcCorrelationMetadata to CorrelationMetadata @@ -406,7 +406,7 @@ topic: String, response: &crate::rpc::RpcResponse, ) -> Result<Self, crate::error::EventBusError> { - let payload = bincode::serde::encode_to_vec(response, bincode::config::standard()) + let payload = postcard::to_allocvec(response) .map_err(|e| crate::error::EventBusError::serialization(e.to_string()))?; // Convert RpcCorrelationMetadata to CorrelationMetadata @@ -430,15 +430,14 @@ /// Serialize message to bytes pub fn serialize(&self) -> Result<Vec<u8>, crate::error::EventBusError> { - bincode::serde::encode_to_vec(self, bincode::config::standard()) + postcard::to_allocvec(self) .map_err(|e| crate::error::EventBusError::serialization(e.to_string())) } /// Deserialize message from bytes pub fn deserialize(data: &[u8]) -> Result<Self, crate::error::EventBusError> { - bincode::serde::decode_from_slice(data, bincode::config::standard()) + postcard::from_bytes(data) .map_err(|e| crate::error::EventBusError::serialization(e.to_string())) - .map(|(result, _)| result) } } diff --git a/daemoneye-eventbus/src/rpc.rs b/daemoneye-eventbus/src/rpc.rs index 4910f6b..b2f4f89 100644 --- a/daemoneye-eventbus/src/rpc.rs +++ b/daemoneye-eventbus/src/rpc.rs @@ -709,7 +709,7 @@ impl CollectorRpcClient { } // Serialize RPC request - let payload = bincode::serde::encode_to_vec(&request, bincode::config::standard()) + let payload = postcard::to_allocvec(&request) .map_err(|e| EventBusError::serialization(e.to_string()))?; // Publish request to broker using the target from the request @@ -774,11 +774,8 @@ impl CollectorRpcClient { } // Deserialize response from message payload - let response: RpcResponse = match bincode::serde::decode_from_slice::<RpcResponse, _>( - &message.payload, - bincode::config::standard(), - ) { - Ok((resp, _)) => resp, + let response: RpcResponse = match postcard::from_bytes(&message.payload) { + Ok(resp) => resp, Err(e) => { tracing::error!("Failed to deserialize RPC response: {}", e); continue; @@ -2702,13 +2699,10 @@ mod tests { correlation_metadata: RpcCorrelationMetadata::default(), }; - let serialized = bincode::serde::encode_to_vec(&response, bincode::config::standard()); - assert!(serialized.is_ok()); + let serialized = postcard::to_allocvec(&response).expect("serialization should succeed"); - let deserialized = - bincode::serde::decode_from_slice(&serialized.unwrap(), bincode::config::standard()); - assert!(deserialized.is_ok()); - let (deserialized_response, _): (RpcResponse, _) = deserialized.unwrap(); + let deserialized_response: RpcResponse = + postcard::from_bytes(&serialized).expect("deserialization should succeed"); assert_eq!(deserialized_response.service_id, "test-service"); assert_eq!(deserialized_response.status, RpcStatus::Success); } diff --git a/daemoneye-eventbus/src/task_distribution.rs b/daemoneye-eventbus/src/task_distribution.rs index 15b16fb..8d4801f 100644 ---
a/daemoneye-eventbus/src/task_distribution.rs +++ b/daemoneye-eventbus/src/task_distribution.rs @@ -507,8 +507,8 @@ impl TaskDistributor { collector: &CollectorRegistration, ) -> Result<()> { // Serialize task - let payload = bincode::serde::encode_to_vec(task, bincode::config::standard()) - .map_err(|e| EventBusError::serialization(e.to_string()))?; + let payload = + postcard::to_allocvec(task).map_err(|e| EventBusError::serialization(e.to_string()))?; // Publish to collector's task topic let correlation_id = task diff --git a/daemoneye-eventbus/tests/rpc_integration_tests.rs b/daemoneye-eventbus/tests/rpc_integration_tests.rs index aaa937c..d5491ec 100644 --- a/daemoneye-eventbus/tests/rpc_integration_tests.rs +++ b/daemoneye-eventbus/tests/rpc_integration_tests.rs @@ -777,15 +777,11 @@ async fn test_rpc_request_serialization() { ); // Test serialization - let serialized = bincode::serde::encode_to_vec(&request, bincode::config::standard()); + let serialized = postcard::to_allocvec(&request); assert!(serialized.is_ok()); // Test deserialization - let deserialized = - bincode::serde::decode_from_slice(&serialized.unwrap(), bincode::config::standard()); - assert!(deserialized.is_ok()); - - let (deserialized_request, _): (RpcRequest, _) = deserialized.unwrap(); + let deserialized_request: RpcRequest = postcard::from_bytes(&serialized.unwrap()).unwrap(); assert_eq!(deserialized_request.client_id, "test-client"); assert_eq!(deserialized_request.operation, CollectorOperation::Start); } @@ -817,15 +813,11 @@ async fn test_rpc_response_serialization() { }; // Test serialization - let serialized = bincode::serde::encode_to_vec(&response, bincode::config::standard()); + let serialized = postcard::to_allocvec(&response); assert!(serialized.is_ok()); // Test deserialization - let deserialized = - bincode::serde::decode_from_slice(&serialized.unwrap(), bincode::config::standard()); - assert!(deserialized.is_ok()); - - let (deserialized_response, _): (RpcResponse, _) = deserialized.unwrap(); + let deserialized_response: RpcResponse = postcard::from_bytes(&serialized.unwrap()).unwrap(); assert_eq!(deserialized_response.service_id, "test-service"); assert_eq!(deserialized_response.status, RpcStatus::Success); @@ -989,11 +981,8 @@ async fn setup_test_service_handler( ); // Deserialize RPC request - let request: RpcRequest = match bincode::serde::decode_from_slice::<RpcRequest, _>( - &message.payload, - bincode::config::standard(), - ) { - Ok((req, _)) => { + let request: RpcRequest = match postcard::from_bytes::<RpcRequest>(&message.payload) { + Ok(req) => { println!( "DEBUG: Successfully deserialized RPC request: {:?}", req.operation ); @@ -1019,17 +1008,16 @@ ); // Serialize and publish response - let payload = - match bincode::serde::encode_to_vec(&response, bincode::config::standard()) { - Ok(data) => { - println!("DEBUG: Serialized response payload length: {}", data.len()); - data - } - Err(e) => { - eprintln!("Failed to serialize RPC response: {}", e); - continue; - } - }; + let payload = match postcard::to_allocvec(&response) { + Ok(data) => { + println!("DEBUG: Serialized response payload length: {}", data.len()); + data + } + Err(e) => { + eprintln!("Failed to serialize RPC response: {}", e); + continue; + } + }; match broker .publish(&response_topic, &response.request_id, payload) @@ -1071,11 +1059,10 @@ async fn test_rpc_call_through_broker() -> Result<()> { ); // Test serialization/deserialization locally first - let serialized = bincode::serde::encode_to_vec(&request,
bincode::config::standard()).unwrap(); + let serialized = postcard::to_allocvec(&request).unwrap(); println!("DEBUG: Serialized length: {}", serialized.len()); - let (deserialized, _): (RpcRequest, _) = - bincode::serde::decode_from_slice(&serialized, bincode::config::standard()).unwrap(); + let deserialized: RpcRequest = postcard::from_bytes(&serialized).unwrap(); println!( "DEBUG: Deserialized successfully: {:?}", deserialized.operation diff --git a/daemoneye-lib/src/collection.rs b/daemoneye-lib/src/collection.rs index f5dc3fb..cebcde4 100644 --- a/daemoneye-lib/src/collection.rs +++ b/daemoneye-lib/src/collection.rs @@ -106,6 +106,7 @@ impl ProcessCollectionService for SysinfoProcessCollector { | sysinfo::ProcessStatus::Parked | sysinfo::ProcessStatus::LockBlocked | sysinfo::ProcessStatus::UninterruptibleDiskSleep + | sysinfo::ProcessStatus::Suspended | sysinfo::ProcessStatus::Unknown(_) => { ProcessStatus::Unknown(format!("{:?}", process.status())) } diff --git a/daemoneye-lib/src/models/rule.rs b/daemoneye-lib/src/models/rule.rs index f607a53..3950332 100644 --- a/daemoneye-lib/src/models/rule.rs +++ b/daemoneye-lib/src/models/rule.rs @@ -406,126 +406,15 @@ impl DetectionRule { } // Ensure it's a SELECT statement - we checked length above so this is safe + #[allow(clippy::wildcard_enum_match_arm)] match &statements[0] { Statement::Query(query) => { // Basic validation - ensure it's a SELECT query Self::validate_query_basic(query)?; } - Statement::Analyze { .. } - | Statement::Set(_) - | Statement::Truncate { .. } - | Statement::Msck { .. } - | Statement::Insert(_) - | Statement::Install { .. } - | Statement::Load { .. } - | Statement::Directory { .. } - | Statement::Case(_) - | Statement::If(_) - | Statement::While(_) - | Statement::Raise(_) - | Statement::Call(_) - | Statement::Copy { .. } - | Statement::CopyIntoSnowflake { .. } - | Statement::Open(_) - | Statement::Close { .. } - | Statement::Update { .. } - | Statement::Delete(_) - | Statement::CreateView { .. } - | Statement::CreateTable(_) - | Statement::CreateVirtualTable { .. } - | Statement::CreateIndex(_) - | Statement::CreateRole { .. } - | Statement::CreateSecret { .. } - | Statement::CreateServer(_) - | Statement::CreatePolicy { .. } - | Statement::CreateConnector(_) - | Statement::AlterTable { .. } - | Statement::AlterIndex { .. } - | Statement::AlterView { .. } - | Statement::AlterType(_) - | Statement::AlterRole { .. } - | Statement::AlterPolicy { .. } - | Statement::AlterConnector { .. } - | Statement::AlterSession { .. } - | Statement::AttachDatabase { .. } - | Statement::AttachDuckDBDatabase { .. } - | Statement::DetachDuckDBDatabase { .. } - | Statement::Drop { .. } - | Statement::DropFunction { .. } - | Statement::DropDomain(_) - | Statement::DropProcedure { .. } - | Statement::DropSecret { .. } - | Statement::DropPolicy { .. } - | Statement::DropConnector { .. } - | Statement::Declare { .. } - | Statement::CreateExtension { .. } - | Statement::DropExtension { .. } - | Statement::Fetch { .. } - | Statement::Flush { .. } - | Statement::Discard { .. } - | Statement::ShowFunctions { .. } - | Statement::ShowVariable { .. } - | Statement::ShowStatus { .. } - | Statement::ShowVariables { .. } - | Statement::ShowCreate { .. } - | Statement::ShowColumns { .. } - | Statement::ShowDatabases { .. } - | Statement::ShowSchemas { .. } - | Statement::ShowObjects(_) - | Statement::ShowTables { .. } - | Statement::ShowViews { .. } - | Statement::ShowCollation { .. } - | Statement::Use(_) - | Statement::StartTransaction { .. 
} - | Statement::Comment { .. } - | Statement::Commit { .. } - | Statement::Rollback { .. } - | Statement::CreateSchema { .. } - | Statement::CreateDatabase { .. } - | Statement::CreateFunction(_) - | Statement::CreateTrigger { .. } - | Statement::DropTrigger { .. } - | Statement::CreateProcedure { .. } - | Statement::CreateMacro { .. } - | Statement::CreateStage { .. } - | Statement::Assert { .. } - | Statement::Grant { .. } - | Statement::Deny(_) - | Statement::Revoke { .. } - | Statement::Deallocate { .. } - | Statement::Execute { .. } - | Statement::Prepare { .. } - | Statement::Kill { .. } - | Statement::ExplainTable { .. } - | Statement::Explain { .. } - | Statement::Savepoint { .. } - | Statement::ReleaseSavepoint { .. } - | Statement::Merge { .. } - | Statement::Cache { .. } - | Statement::UNCache { .. } - | Statement::CreateSequence { .. } - | Statement::CreateDomain(_) - | Statement::CreateType { .. } - | Statement::Pragma { .. } - | Statement::LockTables { .. } - | Statement::UnlockTables - | Statement::Unload { .. } - | Statement::OptimizeTable { .. } - | Statement::LISTEN { .. } - | Statement::UNLISTEN { .. } - | Statement::NOTIFY { .. } - | Statement::LoadData { .. } - | Statement::RenameTable(_) - | Statement::List(_) - | Statement::Remove(_) - | Statement::RaisError { .. } - | Statement::Print(_) - | Statement::Return(_) - | Statement::AlterSchema(_) - | Statement::ShowCharset(_) - | Statement::ExportData(_) - | Statement::CreateUser(_) - | Statement::Vacuum(_) => { + // Only SELECT statements are allowed; reject all other statement types + // including any new variants added in future sqlparser versions + _ => { return Err(RuleError::InvalidSql( "Only SELECT statements are allowed".to_owned(), )); diff --git a/justfile b/justfile index d21ff7c..60d9d00 100644 --- a/justfile +++ b/justfile @@ -1,9 +1,15 @@ # Cross-platform justfile using OS annotations # Windows uses PowerShell, Unix uses bash -set shell := ["bash", "-c"] +set shell := ["bash", "-cu"] set windows-shell := ["powershell", "-NoProfile", "-Command"] +set dotenv-load := true +set ignore-comments := true +# Use mise to manage all dev tools (go, pre-commit, uv, etc.) 
+# See mise.toml for tool versions + +mise_exec := "mise exec --" root := justfile_dir() # ============================================================================= @@ -11,37 +17,28 @@ root := justfile_dir() # ============================================================================= default: - @just help - -help: @just --list # ============================================================================= -# CROSS-PLATFORM HELPERS +# CROSS-PLATFORM HELPERS (private) # ============================================================================= -# Cross-platform helpers using OS annotations -# Each helper has Windows and Unix variants - -[windows] -cd-root: - Set-Location "{{ root }}" - -[unix] -cd-root: - cd "{{ root }}" +[private] [windows] ensure-dir dir: New-Item -ItemType Directory -Force -Path "{{ dir }}" | Out-Null +[private] [unix] ensure-dir dir: /bin/mkdir -p "{{ dir }}" +[private] [windows] rmrf path: if (Test-Path "{{ path }}") { Remove-Item "{{ path }}" -Recurse -Force } +[private] [unix] rmrf path: /bin/rm -rf "{{ path }}" @@ -50,73 +47,9 @@ rmrf path: # SETUP AND INITIALIZATION # ============================================================================= -# Development setup -[windows] -setup: - Set-Location "{{ root }}" - rustup component add rustfmt clippy llvm-tools-preview - cargo install cargo-binstall --locked - @just mdformat-install - Write-Host "Note: You may need to restart your shell for pipx PATH changes to take effect" - -[unix] +# Development setup - mise handles all tool installation via mise.toml setup: - cd "{{ root }}" - rustup component add rustfmt clippy llvm-tools-preview - cargo install cargo-binstall --locked - @just mdformat-install - echo "Note: You may need to restart your shell for pipx PATH changes to take effect" - -# Install development tools (extended setup) -[windows] -install-tools: - cargo binstall --disable-telemetry cargo-llvm-cov cargo-audit cargo-deny cargo-dist cargo-release cargo-cyclonedx cargo-auditable cargo-nextest --locked - -[unix] -install-tools: - cargo binstall --disable-telemetry cargo-llvm-cov cargo-audit cargo-deny cargo-dist cargo-release cargo-cyclonedx cargo-auditable cargo-nextest --locked - -# Install mdBook and plugins for documentation -[windows] -docs-install: - cargo binstall mdbook mdbook-admonish mdbook-mermaid mdbook-linkcheck mdbook-toc mdbook-open-on-gh mdbook-tabs mdbook-i18n-helpers - -[unix] -docs-install: - cargo binstall mdbook mdbook-admonish mdbook-mermaid mdbook-linkcheck mdbook-toc mdbook-open-on-gh mdbook-tabs mdbook-i18n-helpers - -# Install pipx for Python tool management -[windows] -pipx-install: - python -m pip install --user pipx - python -m pipx ensurepath - -[unix] -pipx-install: - #!/bin/bash - set -e - set -u - set -o pipefail - - if command -v pipx >/dev/null 2>&1; then - echo "pipx already installed" - else - echo "Installing pipx..." 
-    python3 -m pip install --user pipx
-    python3 -m pipx ensurepath
-    fi
-
-# Install mdformat and extensions for markdown formatting
-[windows]
-mdformat-install: pipx-install
-    pipx install mdformat
-    pipx inject mdformat mdformat-gfm mdformat-frontmatter mdformat-footnote mdformat-simple-breaks mdformat-gfm-alerts mdformat-toc mdformat-wikilink mdformat-tables
-
-[unix]
-mdformat-install:
-    @just pipx-install
-    pipx install mdformat
-    pipx inject mdformat mdformat-gfm mdformat-frontmatter mdformat-footnote mdformat-simple-breaks mdformat-gfm-alerts mdformat-toc mdformat-wikilink mdformat-tables
+    mise install

 # =============================================================================
 # FORMATTING AND LINTING
@@ -134,66 +67,62 @@ format-docs:
     @if command -v mdformat >/dev/null 2>&1; then find . -type f -name "*.md" -not -path "./target/*" -not -path "./node_modules/*" -exec mdformat {} + ; else echo "mdformat not found. Run 'just mdformat-install' first."; fi

 fmt:
-    @cargo fmt --all
+    @{{ mise_exec }} cargo fmt --all

 fmt-check:
-    @cargo fmt --all --check
+    @{{ mise_exec }} cargo fmt --all --check

 lint-rust: fmt-check
-    @cargo clippy --workspace --all-targets --all-features -- -D warnings
+    @{{ mise_exec }} cargo clippy --workspace --all-targets --all-features -- -D warnings

 lint-rust-min:
-    @cargo clippy --workspace --all-targets --no-default-features -- -D warnings
+    @{{ mise_exec }} cargo clippy --workspace --all-targets --no-default-features -- -D warnings

 # Check documentation compiles without warnings
 [windows]
 lint-docs:
-    $env:RUSTDOCFLAGS='-D warnings'; cargo doc --no-deps --document-private-items
+    $env:RUSTDOCFLAGS='-D warnings'; {{ mise_exec }} cargo doc --no-deps --document-private-items

 [unix]
 lint-docs:
-    RUSTDOCFLAGS='-D warnings' cargo doc --no-deps --document-private-items
+    RUSTDOCFLAGS='-D warnings' {{ mise_exec }} cargo doc --no-deps --document-private-items

 # Format justfile
 fmt-justfile:
-    @just --fmt --unstable
+    @{{ mise_exec }} just --fmt --unstable

 # Lint justfile formatting
 lint-justfile:
-    @just --fmt --check --unstable
+    @{{ mise_exec }} just --fmt --check --unstable

 lint: lint-rust lint-docs lint-justfile

 # Run clippy with fixes
 fix:
-    cargo clippy --fix --allow-dirty --allow-staged
+    @{{ mise_exec }} cargo clippy --fix --allow-dirty --allow-staged

 # Quick development check
 check: pre-commit-run lint

 pre-commit-run:
-    uv run pre-commit run -a
+    @{{ mise_exec }} pre-commit run -a

 # Format a single file (for pre-commit hooks)
 format-files +FILES:
-    npx prettier --write --config .prettierrc.json {{ FILES }}
-
-megalinter:
-    cd "{{ root }}"
-    npx mega-linter-runner --flavor rust
+    @{{ mise_exec }} prettier --write --config .prettierrc.json {{ FILES }}

 # =============================================================================
 # BUILDING AND TESTING
 # =============================================================================

 build:
-    @cargo build --workspace
+    @{{ mise_exec }} cargo build --workspace

 build-release:
-    @cargo build --workspace --release
+    @{{ mise_exec }} cargo build --workspace --release

 test:
-    @cargo nextest run --workspace --no-capture
+    @{{ mise_exec }} cargo nextest run --workspace --no-capture

 # Test justfile cross-platform functionality
 [windows]
@@ -223,31 +152,31 @@ test-fs:
     @just rmrf tmp/xfstest

 test-ci:
-    cargo nextest run --workspace --no-capture
+    @{{ mise_exec }} cargo nextest run --workspace --no-capture

 # Run comprehensive tests (includes performance and security)
 test-comprehensive:
-    cargo nextest run --workspace
--no-capture --package collector-core + @{{ mise_exec }} cargo nextest run --workspace --no-capture --package collector-core # Run comprehensive tests including ignored/slow tests test-comprehensive-full: - cargo nextest run --workspace --no-capture --package collector-core -- --ignored + @{{ mise_exec }} cargo nextest run --workspace --no-capture --package collector-core -- --ignored # Run all tests including ignored/slow tests across workspace test-all: - cargo nextest run --workspace --no-capture -- --ignored + @{{ mise_exec }} cargo nextest run --workspace --no-capture -- --ignored # Run only fast unit tests test-fast: - cargo nextest run --workspace --no-capture --lib --bins + @{{ mise_exec }} cargo nextest run --workspace --no-capture --lib --bins # Run performance-critical tests test-performance: - cargo nextest run --package collector-core --no-capture --test performance_critical_test + @{{ mise_exec }} cargo nextest run --package collector-core --no-capture --test performance_critical_test # Run security-critical tests test-security: - cargo nextest run --package collector-core --no-capture --test security_critical_test + @{{ mise_exec }} cargo nextest run --package collector-core --no-capture --test security_critical_test # ============================================================================= # BENCHMARKING @@ -255,40 +184,40 @@ test-security: # Run all benchmarks bench: - @cargo bench --workspace + @{{ mise_exec }} cargo bench --workspace # Run specific benchmark suites bench-process: - @cargo bench -p daemoneye-lib --bench process_collection + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench process_collection bench-database: - @cargo bench -p daemoneye-lib --bench database_operations + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench database_operations bench-detection: - @cargo bench -p daemoneye-lib --bench detection_engine + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench detection_engine bench-ipc: - @cargo bench -p daemoneye-lib --bench ipc_communication + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench ipc_communication bench-ipc-comprehensive: - @cargo bench -p daemoneye-lib --bench ipc_performance_comprehensive + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench ipc_performance_comprehensive bench-ipc-validation: - @cargo bench -p daemoneye-lib --bench ipc_client_validation_benchmarks + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench ipc_client_validation_benchmarks bench-alerts: - @cargo bench -p daemoneye-lib --bench alert_processing + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench alert_processing bench-crypto: - @cargo bench -p daemoneye-lib --bench cryptographic_operations + @{{ mise_exec }} cargo bench -p daemoneye-lib --bench cryptographic_operations # Run benchmarks with HTML output (Criterion generates HTML by default) bench-html: - @cargo bench -p daemoneye-lib + @{{ mise_exec }} cargo bench -p daemoneye-lib # Run benchmarks and save results to benchmark.json bench-save: - @cargo bench -p daemoneye-lib -- --save-baseline baseline + @{{ mise_exec }} cargo bench -p daemoneye-lib -- --save-baseline baseline # ============================================================================= # SECURITY AND AUDITING @@ -296,10 +225,10 @@ bench-save: # Supply-chain security checks audit-deps: - cargo audit + @{{ mise_exec }} cargo audit deny-deps: - cargo deny check + @{{ mise_exec }} cargo deny check # Composed security scan security-scan: audit-deps deny-deps @@ -315,11 +244,11 @@ deny: deny-deps # Generate coverage 
report coverage: - cargo llvm-cov --workspace --lcov --output-path lcov.info + @{{ mise_exec }} cargo llvm-cov --workspace --lcov --output-path lcov.info # Check coverage thresholds coverage-check: - cargo llvm-cov --workspace --lcov --output-path lcov.info --fail-under-lines 9.7 + @{{ mise_exec }} cargo llvm-cov --workspace --lcov --output-path lcov.info --fail-under-lines 9.7 # Full local CI parity check ci-check: pre-commit-run fmt-check lint-rust lint-rust-min test-ci build-release security-scan coverage-check dist-plan @@ -329,29 +258,29 @@ ci-check: pre-commit-run fmt-check lint-rust lint-rust-min test-ci build-release # ============================================================================= run-procmond *args: - @cargo run -p procmond -- {{ args }} + @{{ mise_exec }} cargo run -p procmond -- {{ args }} run-daemoneye-cli *args: - @cargo run -p daemoneye-cli -- {{ args }} + @{{ mise_exec }} cargo run -p daemoneye-cli -- {{ args }} run-daemoneye-agent *args: - @cargo run -p daemoneye-agent -- {{ args }} + @{{ mise_exec }} cargo run -p daemoneye-agent -- {{ args }} # ============================================================================= # DISTRIBUTION AND PACKAGING # ============================================================================= dist: - @dist build + @{{ mise_exec }} dist build dist-check: - @dist check + @{{ mise_exec }} dist check dist-plan: - @dist plan + @{{ mise_exec }} dist plan install: - @cargo install --path . + @{{ mise_exec }} cargo install --path . # ============================================================================= # GORELEASER TESTING @@ -359,12 +288,12 @@ install: # Test GoReleaser configuration goreleaser-check: - @goreleaser check + @{{ mise_exec }} goreleaser check # Build binaries locally with GoReleaser (test build process) [windows] goreleaser-build: - @goreleaser build --clean + @{{ mise_exec }} goreleaser build --clean [unix] goreleaser-build: @@ -380,12 +309,12 @@ goreleaser-build: # Ensure the system linker sees the correct syslibroot and frameworks export RUSTFLAGS="${RUSTFLAGS:-} -C link-arg=-Wl,-syslibroot,${SDKROOT_PATH} -C link-arg=-F${SDKROOT_PATH}/System/Library/Frameworks" fi - goreleaser build --clean + @{{ mise_exec }} goreleaser build --clean # Run snapshot release (test full pipeline without publishing) [windows] goreleaser-snapshot: - @goreleaser release --snapshot --clean + @{{ mise_exec }} goreleaser release --snapshot --clean [unix] goreleaser-snapshot: @@ -401,12 +330,12 @@ goreleaser-snapshot: # Ensure the system linker sees the correct syslibroot and frameworks export RUSTFLAGS="${RUSTFLAGS:-} -C link-arg=-Wl,-syslibroot,${SDKROOT_PATH} -C link-arg=-F${SDKROOT_PATH}/System/Library/Frameworks" fi - goreleaser release --snapshot --clean + @{{ mise_exec }} goreleaser release --snapshot --clean # Test GoReleaser with specific target [windows] goreleaser-build-target target: - @goreleaser build --clean --single-target {{ target }} + @{{ mise_exec }} goreleaser build --clean --single-target {{ target }} [unix] goreleaser-build-target target: @@ -422,7 +351,7 @@ goreleaser-build-target target: # Ensure the system linker sees the correct syslibroot and frameworks export RUSTFLAGS="${RUSTFLAGS:-} -C link-arg=-Wl,-syslibroot,${SDKROOT_PATH} -C link-arg=-F${SDKROOT_PATH}/System/Library/Frameworks" fi - goreleaser build --clean --single-target {{ target }} + @{{ mise_exec }} goreleaser build --clean --single-target {{ target }} # Clean GoReleaser artifacts goreleaser-clean: @@ -443,7 +372,7 @@ goreleaser-test-macos: 
set -euo pipefail if [[ "$OSTYPE" == "darwin"* ]]; then echo "🍎 Testing macOS configuration..." - goreleaser build --config .goreleaser-macos.yaml --snapshot --clean + {{ mise_exec }} goreleaser build --config .goreleaser-macos.yaml --snapshot --clean echo "✅ macOS build successful" else echo "⚠️ Skipping macOS test (not on macOS)" @@ -460,7 +389,7 @@ goreleaser-test-linux: set -euo pipefail if [[ "$OSTYPE" == "linux-gnu"* ]]; then echo "🐧 Testing Linux configuration..." - goreleaser build --config .goreleaser-linux.yaml --snapshot --clean + {{ mise_exec }} goreleaser build --config .goreleaser-linux.yaml --snapshot --clean echo "✅ Linux build successful" else echo "⚠️ Skipping Linux test (not on Linux)" @@ -470,7 +399,7 @@ goreleaser-test-linux: [windows] goreleaser-test-windows: @echo "🪟 Testing Windows configuration..." - @goreleaser build --config .goreleaser-windows.yaml --snapshot --clean + @{{ mise_exec }} goreleaser build --config .goreleaser-windows.yaml --snapshot --clean @echo "✅ Windows build successful" [unix] @@ -483,7 +412,7 @@ goreleaser-test-all: goreleaser-test-macos goreleaser-test-linux goreleaser-test # Test specific platform configuration goreleaser-test-platform platform: - @goreleaser build --config .goreleaser-{{ platform }}.yaml --snapshot --clean + @{{ mise_exec }} goreleaser build --config .goreleaser-{{ platform }}.yaml --snapshot --clean @echo "✅ {{ platform }} build successful" # ============================================================================= @@ -491,16 +420,16 @@ goreleaser-test-platform platform: # ============================================================================= release: - @cargo release + @{{ mise_exec }} cargo release release-dry-run: - @cargo release --dry-run + @{{ mise_exec }} cargo release --dry-run release-patch: - @cargo release patch + @{{ mise_exec }} cargo release patch release-minor: - @cargo release minor + @{{ mise_exec }} cargo release minor release-major: - @cargo release major + @{{ mise_exec }} cargo release major diff --git a/mise.toml b/mise.toml new file mode 100644 index 0000000..087db2a --- /dev/null +++ b/mise.toml @@ -0,0 +1,30 @@ +[tools] +cargo-binstall = "1.17.3" +cargo-insta = "1.46.1" +"cargo:cargo-audit" = "0.22.0" +"cargo:cargo-deny" = "0.19.0" +"cargo:cargo-dist" = "0.30.3" +"cargo:cargo-llvm-cov" = "0.6.24" +"cargo:cargo-nextest" = "0.9.123-b.4" +"cargo:mdbook" = "0.5.2" +"cargo:mdbook-linkcheck" = "0.7.7" +"cargo:mdbook-tabs" = "0.3.4" +"cargo:mdbook-mermaid" = "0.17.0" +"cargo:mdbook-toc" = "0.15.3" +"cargo:mdbook-admonish" = "1.20.0" +"cargo:mdbook-open-on-gh" = "3.0.0" +"cargo:mdbook-i18n-helpers" = "0.4.0" +just = "latest" +python = "3.13.11" +rust = { version = "1.91.0", components = "llvm-tools,cargo,rustfmt,clippy", profile = "default" } +"cargo:cargo-outdated" = "0.17.0" +"cargo:cargo-release" = "0.25.22" +"cargo:cargo-auditable" = "0.7.2" +"cargo:cargo-cyclonedx" = "0.5.7" +"pipx:mdformat" = { version = "0.7.21", uvx_args = "--with mdformat-gfm --with mdformat-frontmatter --with mdformat-footnote --with mdformat-simple-breaks --with mdformat-gfm-alerts --with mdformat-toc --with mdformat-wikilink --with mdformat-tables" } +prettier = "3.8.1" +actionlint = "1.7.10" +lychee = "0.22.0" +markdownlint-cli2 = "0.20.0" +protobuf = "33.4" +pre-commit = "4.5.1" diff --git a/procmond/Cargo.toml b/procmond/Cargo.toml index 6ea25b5..77c7c8c 100644 --- a/procmond/Cargo.toml +++ b/procmond/Cargo.toml @@ -32,39 +32,25 @@ path = "src/main.rs" # Core dependencies anyhow = { workspace = true } async-trait 
= { workspace = true } -bincode = { workspace = true } -bytes = { workspace = true } +postcard = { workspace = true } chrono = { workspace = true } clap = { workspace = true } +crc32c = { workspace = true } # Internal libraries collector-core = { workspace = true } -crc32c = { workspace = true } +daemoneye-eventbus = { workspace = true } daemoneye-lib = { workspace = true } -# IPC communication -interprocess = { workspace = true } - -# Protocol Buffers -prost = { workspace = true } -prost-types = { workspace = true } - -# Database -redb = { workspace = true } serde = { workspace = true } serde_json = { workspace = true } -# Cryptographic hashing -sha2 = { workspace = true } - # System information sysinfo = { workspace = true } thiserror = { workspace = true } tokio = { workspace = true } tracing = { workspace = true } tracing-subscriber = { workspace = true } - -# UUID generation uuid = { workspace = true } # Platform-specific dependencies @@ -108,6 +94,5 @@ uzers = { workspace = true } name = "process_collector_benchmarks" harness = false -[lints.rust] -unsafe_code = "forbid" -warnings = "deny" +[lints] +workspace = true diff --git a/procmond/benches/process_collector_benchmarks.rs b/procmond/benches/process_collector_benchmarks.rs index 9aa1ab4..e9ab1f9 100644 --- a/procmond/benches/process_collector_benchmarks.rs +++ b/procmond/benches/process_collector_benchmarks.rs @@ -1,9 +1,44 @@ -//! Criterion benchmarks for ProcessCollector implementations. +//! Criterion benchmarks for `ProcessCollector` implementations. //! -//! This benchmark suite measures performance characteristics of all ProcessCollector +//! This benchmark suite measures performance characteristics of all `ProcessCollector` //! implementations under various load conditions, including high process counts //! (10,000+ processes) to establish baseline performance metrics. +#![allow( + clippy::doc_markdown, + clippy::unreadable_literal, + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::arithmetic_side_effects, + clippy::missing_const_for_fn, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::use_debug, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::as_conversions, + clippy::panic, + clippy::option_if_let_else, + clippy::wildcard_enum_match_arm, + clippy::large_enum_variant, + clippy::integer_division, + clippy::clone_on_ref_ptr, + clippy::unused_self, + clippy::modulo_arithmetic, + clippy::explicit_iter_loop, + clippy::semicolon_if_nothing_returned, + clippy::missing_assert_message, + clippy::pattern_type_mismatch, + clippy::significant_drop_tightening, + clippy::significant_drop_in_scrutinee, + clippy::if_not_else +)] + use async_trait::async_trait; use collector_core::ProcessEvent; use criterion::{BenchmarkId, Criterion, Throughput, criterion_group, criterion_main}; diff --git a/procmond/examples/process_collector_usage.rs b/procmond/examples/process_collector_usage.rs index dfc656e..06b996a 100644 --- a/procmond/examples/process_collector_usage.rs +++ b/procmond/examples/process_collector_usage.rs @@ -1,8 +1,23 @@ -//! Example demonstrating the ProcessCollector trait usage in ProcessMessageHandler. +//! Example demonstrating the `ProcessCollector` trait usage in `ProcessMessageHandler`. //! -//! This example shows how the refactored ProcessMessageHandler uses the ProcessCollector +//! 
This example shows how the refactored `ProcessMessageHandler` uses the `ProcessCollector`
 //! trait for platform-agnostic process enumeration with proper error handling.
+
+#![allow(
+    clippy::doc_markdown,
+    clippy::expect_used,
+    clippy::unwrap_used,
+    clippy::str_to_string,
+    clippy::missing_const_for_fn,
+    clippy::print_stdout,
+    clippy::use_debug,
+    clippy::uninlined_format_args,
+    clippy::arithmetic_side_effects,
+    clippy::shadow_reuse,
+    clippy::shadow_unrelated,
+    clippy::panic
+)]
+
 use async_trait::async_trait;
 use collector_core::ProcessEvent;
 use daemoneye_lib::{
diff --git a/procmond/src/event_bus_connector.rs b/procmond/src/event_bus_connector.rs
new file mode 100644
index 0000000..bef50f1
--- /dev/null
+++ b/procmond/src/event_bus_connector.rs
@@ -0,0 +1,1354 @@
+//! EventBus connector for reliable event delivery with WAL integration.
+//!
+//! This module provides the [`EventBusConnector`] component that integrates the
+//! Write-Ahead Log (WAL) with the daemoneye-eventbus for reliable, crash-recoverable
+//! event delivery from procmond to daemoneye-agent.
+//!
+//! # Overview
+//!
+//! The connector implements a durable publishing pattern:
+//! 1. Events are first written to the WAL for durability
+//! 2. If connected, events are published to the broker
+//! 3. On successful publish, WAL entries are marked as published
+//! 4. If disconnected, events are buffered in memory (up to 10MB)
+//! 5. On reconnection, WAL is replayed to recover unpublished events
+//!
+//! # Connection Configuration
+//!
+//! The connector reads the broker socket path from the `DAEMONEYE_BROKER_SOCKET`
+//! environment variable. If not set, connection attempts will fail with
+//! [`EventBusConnectorError::EnvNotSet`].
+//!
+//! # Topic Mapping
+//!
+//! Process events are published to the topic hierarchy under `events.process`:
+//! - [`ProcessEventType::Start`] -> `events.process.start`
+//! - [`ProcessEventType::Stop`] -> `events.process.stop`
+//! - [`ProcessEventType::Modify`] -> `events.process.modify`
+//!
+//! # Examples
+//!
+//! ```rust,no_run
+//! use procmond::event_bus_connector::{EventBusConnector, ProcessEventType};
+//! use collector_core::event::ProcessEvent;
+//! use std::path::PathBuf;
+//! use std::time::SystemTime;
+//!
+//! # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+//! // Create connector with WAL directory
+//! let mut connector = EventBusConnector::new(PathBuf::from("/var/lib/procmond/wal")).await?;
+//!
+//! // Connect to broker
+//! connector.connect().await?;
+//!
+//! // Publish a process start event
+//! let event = ProcessEvent {
+//!     pid: 1234,
+//!     ppid: Some(1),
+//!     name: "example".to_string(),
+//!     executable_path: Some("/usr/bin/example".to_string()),
+//!     command_line: vec!["example".to_string(), "--flag".to_string()],
+//!     start_time: Some(SystemTime::now()),
+//!     cpu_usage: None,
+//!     memory_usage: None,
+//!     executable_hash: None,
+//!     user_id: Some("1000".to_string()),
+//!     accessible: true,
+//!     file_exists: true,
+//!     timestamp: SystemTime::now(),
+//!     platform_metadata: None,
+//! };
+//!
+//! let sequence = connector.publish(event, ProcessEventType::Start).await?;
+//! println!("Published event with sequence: {}", sequence);
+//!
+//! // Graceful shutdown
+//! connector.shutdown().await?;
+//! # Ok(())
+//! # }
+//! ```
+
+use crate::wal::{WalError, WriteAheadLog};
+use collector_core::event::ProcessEvent;
+use daemoneye_eventbus::{
+    ClientConfig, CollectionEvent as EventBusCollectionEvent, EventBusClient,
+    ProcessEvent as EventBusProcessEvent, SocketConfig,
+};
+use std::collections::VecDeque;
+use std::path::PathBuf;
+use thiserror::Error;
+use tokio::sync::mpsc;
+use tracing::{debug, error, info, warn};
+
+/// Environment variable name for broker socket path.
+const BROKER_SOCKET_ENV: &str = "DAEMONEYE_BROKER_SOCKET";
+
+/// Maximum buffer size in bytes (10MB).
+const MAX_BUFFER_SIZE: usize = 10 * 1024 * 1024;
+
+/// Default Windows named pipe name.
+const DEFAULT_WINDOWS_PIPE: &str = r"\\.\pipe\DaemonEye-broker";
+
+/// Errors that can occur during event bus connector operations.
+#[derive(Debug, Error)]
+#[non_exhaustive]
+pub enum EventBusConnectorError {
+    /// WAL operation failed.
+    #[error("WAL error: {0}")]
+    Wal(#[from] WalError),
+
+    /// EventBus operation failed.
+    #[error("EventBus error: {0}")]
+    EventBus(String),
+
+    /// Connection to broker failed.
+    #[error("Connection failed: {0}")]
+    Connection(String),
+
+    /// Buffer has reached capacity.
+    #[error("Buffer overflow: buffer is at capacity")]
+    BufferOverflow,
+
+    /// Required environment variable is not set.
+    #[error("Environment variable not set: {0}")]
+    EnvNotSet(String),
+
+    /// Serialization failed.
+    #[error("Serialization error: {0}")]
+    Serialization(String),
+}
+
+/// Result type for event bus connector operations.
+pub type EventBusConnectorResult<T> = Result<T, EventBusConnectorError>;
+
+/// Backpressure signal indicating buffer pressure state.
+///
+/// These signals are emitted when the buffer crosses threshold levels,
+/// allowing upstream producers to adjust their event generation rate.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+#[non_exhaustive]
+pub enum BackpressureSignal {
+    /// Buffer has reached high-water mark (70% full).
+    /// Upstream should slow down event production.
+    Activated,
+
+    /// Buffer has dropped below low-water mark (50% full).
+    /// Normal event production can resume.
+    Released,
+}
+
+/// Type of process event for topic routing.
+///
+/// Determines which topic the event will be published to.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+#[non_exhaustive]
+pub enum ProcessEventType {
+    /// Process started - published to `events.process.start`.
+    Start,
+    /// Process stopped - published to `events.process.stop`.
+    Stop,
+    /// Process modified (e.g., name change) - published to `events.process.modify`.
+    Modify,
+}
+
+impl ProcessEventType {
+    /// Get the topic string for this event type.
+    const fn topic(self) -> &'static str {
+        match self {
+            Self::Start => "events.process.start",
+            Self::Stop => "events.process.stop",
+            Self::Modify => "events.process.modify",
+        }
+    }
+
+    /// Get the event type as a short string for WAL persistence.
+    const fn to_type_string(self) -> &'static str {
+        match self {
+            Self::Start => "start",
+            Self::Stop => "stop",
+            Self::Modify => "modify",
+        }
+    }
+
+    /// Parse event type from a string stored in WAL.
+    ///
+    /// Returns `Start` as a default for unknown or legacy entries.
+    fn from_type_string(s: &str) -> Self {
+        match s {
+            "start" => Self::Start,
+            "stop" => Self::Stop,
+            "modify" => Self::Modify,
+            _ => {
+                warn!(event_type = s, "Unknown event type, defaulting to Start");
+                Self::Start
+            }
+        }
+    }
+}
+
+/// An event buffered in memory when disconnected from the broker.
+#[derive(Debug)]
+struct BufferedEvent {
+    /// WAL sequence number for this event.
+    sequence: u64,
+    /// The process event to publish.
+    event: ProcessEvent,
+    /// Topic to publish to.
+    topic: String,
+    /// Estimated size in bytes for buffer accounting.
+    size_bytes: usize,
+}
+
+impl BufferedEvent {
+    /// Create a new buffered event with size estimation.
+    fn new(sequence: u64, event: ProcessEvent, topic: String) -> Self {
+        // Estimate size based on event fields
+        let size_bytes = Self::estimate_size(&event, &topic);
+        Self {
+            sequence,
+            event,
+            topic,
+            size_bytes,
+        }
+    }
+
+    /// Estimate the serialized size of an event.
+    fn estimate_size(event: &ProcessEvent, topic: &str) -> usize {
+        // Base overhead for struct fields
+        let mut size = 64_usize;
+
+        // Add string lengths
+        size = size.saturating_add(event.name.len());
+        if let Some(ref path) = event.executable_path {
+            size = size.saturating_add(path.len());
+        }
+        for arg in &event.command_line {
+            size = size.saturating_add(arg.len());
+        }
+        if let Some(ref hash) = event.executable_hash {
+            size = size.saturating_add(hash.len());
+        }
+        if let Some(ref uid) = event.user_id {
+            size = size.saturating_add(uid.len());
+        }
+        if let Some(ref meta) = event.platform_metadata {
+            // Rough estimate for JSON metadata
+            size = size.saturating_add(meta.to_string().len());
+        }
+        size = size.saturating_add(topic.len());
+
+        size
+    }
+}
+
+/// Connector for publishing events to daemoneye-agent's broker with WAL-backed durability.
+///
+/// The `EventBusConnector` provides reliable event delivery by integrating the
+/// Write-Ahead Log for persistence with the EventBus client for network transport.
+/// Events are guaranteed to be delivered at least once, even across process restarts.
+///
+/// # Architecture
+///
+/// ```text
+/// ProcessEvent -> WAL (disk) -> EventBusClient -> Broker
+///                     |              ^
+///                     |              |
+///                     v              |
+///              BufferedEvent --------+
+///                (memory)      (reconnect)
+/// ```
+///
+/// # Thread Safety
+///
+/// This struct is designed for single-threaded async usage. The WAL uses internal
+/// mutexes for thread safety, but the EventBusClient and buffer are not thread-safe.
+pub struct EventBusConnector {
+    /// Write-ahead log for event persistence.
+    wal: WriteAheadLog,
+
+    /// EventBus client for broker communication (None when disconnected).
+    client: Option<EventBusClient>,
+
+    /// In-memory buffer for events when disconnected.
+    buffer: VecDeque<BufferedEvent>,
+
+    /// Current total size of buffered events in bytes.
+    buffer_size_bytes: usize,
+
+    /// Maximum buffer size in bytes (10MB).
+    max_buffer_size: usize,
+
+    /// Whether currently connected to the broker.
+    connected: bool,
+
+    /// Channel for sending backpressure signals.
+    backpressure_tx: mpsc::Sender<BackpressureSignal>,
+
+    /// Template receiver for backpressure signals (taken once).
+    backpressure_rx_template: Option<mpsc::Receiver<BackpressureSignal>>,
+
+    /// Client ID for identification with the broker.
+    client_id: String,
+
+    /// Socket configuration for reconnection.
+    socket_config: Option<SocketConfig>,
+
+    /// Number of consecutive reconnection failures.
+    reconnect_attempts: u32,
+
+    /// Last reconnection attempt time (for backoff).
+    last_reconnect_attempt: Option<std::time::Instant>,
+}
+
+impl EventBusConnector {
+    /// Create a new EventBusConnector with WAL at the specified directory.
+    ///
+    /// This creates the WAL directory if it doesn't exist and initializes
+    /// the connector in a disconnected state. Call [`connect()`](Self::connect)
+    /// to establish connection to the broker.
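+    /// The connector starts disconnected; events published before a successful
+    /// [`connect()`](Self::connect) are still written to the WAL and buffered
+    /// in memory, so nothing is lost while offline.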
+    ///
+    /// # Arguments
+    ///
+    /// * `wal_dir` - Directory path for WAL files
+    ///
+    /// # Returns
+    ///
+    /// A new `EventBusConnector` instance ready for connection
+    ///
+    /// # Errors
+    ///
+    /// Returns `EventBusConnectorError::Wal` if WAL initialization fails
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::EventBusConnector;
+    /// use std::path::PathBuf;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// let connector = EventBusConnector::new(PathBuf::from("/var/lib/procmond/wal")).await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn new(wal_dir: PathBuf) -> EventBusConnectorResult<Self> {
+        info!(wal_dir = ?wal_dir, "Initializing EventBusConnector");
+
+        let wal = WriteAheadLog::new(wal_dir).await?;
+
+        // Create backpressure channel with small buffer for signals
+        let (backpressure_tx, backpressure_rx) = mpsc::channel(16);
+
+        // Generate unique client ID
+        let client_id = format!("procmond-{}", uuid::Uuid::new_v4());
+
+        Ok(Self {
+            wal,
+            client: None,
+            buffer: VecDeque::new(),
+            buffer_size_bytes: 0,
+            max_buffer_size: MAX_BUFFER_SIZE,
+            connected: false,
+            backpressure_tx,
+            backpressure_rx_template: Some(backpressure_rx),
+            client_id,
+            socket_config: None,
+            reconnect_attempts: 0,
+            last_reconnect_attempt: None,
+        })
+    }
+
+    /// Connect to the daemoneye-agent broker.
+    ///
+    /// Reads the broker socket path from the `DAEMONEYE_BROKER_SOCKET` environment
+    /// variable and establishes a connection. If already connected, this is a no-op.
+    ///
+    /// After successful connection, any events in the WAL that were not yet published
+    /// should be replayed using [`replay_wal()`](Self::replay_wal).
+    ///
+    /// # Errors
+    ///
+    /// - `EventBusConnectorError::EnvNotSet` if `DAEMONEYE_BROKER_SOCKET` is not set
+    /// - `EventBusConnectorError::Connection` if connection to broker fails
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::EventBusConnector;
+    /// use std::path::PathBuf;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// // SAFETY: Single-threaded example before any concurrent operations
+    /// unsafe { std::env::set_var("DAEMONEYE_BROKER_SOCKET", "/tmp/daemoneye-broker.sock") };
+    /// let mut connector = EventBusConnector::new(PathBuf::from("/tmp/wal")).await?;
+    /// connector.connect().await?;
+    /// assert!(connector.is_connected());
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn connect(&mut self) -> EventBusConnectorResult<()> {
+        if self.connected {
+            debug!("Already connected to broker");
+            return Ok(());
+        }
+
+        // Get socket path from environment
+        let socket_path = std::env::var(BROKER_SOCKET_ENV)
+            .map_err(|e| EventBusConnectorError::EnvNotSet(format!("{BROKER_SOCKET_ENV}: {e}")))?;
+
+        info!(socket_path = %socket_path, client_id = %self.client_id, "Connecting to broker");
+
+        // Create socket configuration
+        let socket_config = SocketConfig {
+            unix_path: socket_path.clone(),
+            windows_pipe: DEFAULT_WINDOWS_PIPE.to_owned(),
+            connection_limit: 1,
+            #[cfg(target_os = "freebsd")]
+            freebsd_path: None,
+            auth_token: None,
+            per_client_byte_limit: MAX_BUFFER_SIZE,
+            rate_limit_config: None,
+        };
+
+        // Store config for potential reconnection
+        self.socket_config = Some(socket_config.clone());
+
+        // Create client configuration with reasonable defaults for procmond
+        let client_config = ClientConfig {
+            max_reconnect_attempts: 5,
+            connection_timeout: std::time::Duration::from_secs(10),
+            health_check_interval: std::time::Duration::from_secs(30),
+            health_check_timeout: std::time::Duration::from_secs(5),
+            ..ClientConfig::default()
+        };
+
+        // Attempt to connect
+        let client = EventBusClient::new(self.client_id.clone(), socket_config, client_config)
+            .await
+            .map_err(|e| EventBusConnectorError::Connection(e.to_string()))?;
+
+        self.client = Some(client);
+        self.connected = true;
+        self.reconnect_attempts = 0;
+        self.last_reconnect_attempt = None;
+
+        info!(client_id = %self.client_id, "Connected to broker successfully");
+
+        Ok(())
+    }
+
+    /// Attempt to reconnect to the broker with exponential backoff.
+    ///
+    /// This method is called automatically when publishing detects a disconnection.
+    /// It uses exponential backoff to avoid overwhelming the broker during outages.
+    ///
+    /// # Returns
+    ///
+    /// - `Ok(true)` if reconnection succeeded
+    /// - `Ok(false)` if reconnection was skipped due to backoff
+    /// - `Err` if reconnection was attempted but failed
+    ///
+    /// # Backoff Strategy
+    ///
+    /// - Initial delay: 100ms
+    /// - Maximum delay: 30 seconds
+    /// - Multiplier: 2x per attempt
+    async fn try_reconnect(&mut self) -> EventBusConnectorResult<bool> {
+        const MIN_BACKOFF_MS: u64 = 100;
+        const MAX_BACKOFF_MS: u64 = 30_000;
+        const BACKOFF_MULTIPLIER: u32 = 2;
+
+        // Check if we have socket config (required for reconnection)
+        if self.socket_config.is_none() {
+            debug!("Cannot reconnect: no socket config stored");
+            return Ok(false);
+        }
+
+        // Calculate backoff delay
+        let base_delay_ms = MIN_BACKOFF_MS.saturating_mul(
+            BACKOFF_MULTIPLIER
+                .saturating_pow(self.reconnect_attempts)
+                .into(),
+        );
+        let delay_ms = base_delay_ms.min(MAX_BACKOFF_MS);
+
+        // Check if enough time has passed since last attempt
+        if let Some(last_attempt) = self.last_reconnect_attempt {
+            let elapsed = last_attempt.elapsed();
+            if elapsed.as_millis() < u128::from(delay_ms) {
+                // Safe: elapsed_ms is capped at delay_ms which fits in u64
+                let elapsed_ms = u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX);
+                debug!(
+                    delay_remaining_ms = delay_ms.saturating_sub(elapsed_ms),
+                    "Reconnection skipped due to backoff"
+                );
+                return Ok(false);
+            }
+        }
+
+        // Update attempt tracking
+        self.last_reconnect_attempt = Some(std::time::Instant::now());
+        self.reconnect_attempts = self.reconnect_attempts.saturating_add(1);
+
+        info!(
+            attempt = self.reconnect_attempts,
+            delay_ms = delay_ms,
+            "Attempting reconnection to broker"
+        );
+
+        // Attempt reconnection using stored config
+        // Safe: we checked socket_config.is_none() above and returned early
+        let Some(socket_config) = self.socket_config.clone() else {
+            // Should never reach here due to early return above
+            return Ok(false);
+        };
+        let client_config = ClientConfig {
+            max_reconnect_attempts: 5,
+            connection_timeout: std::time::Duration::from_secs(10),
+            health_check_interval: std::time::Duration::from_secs(30),
+            health_check_timeout: std::time::Duration::from_secs(5),
+            ..ClientConfig::default()
+        };
+
+        match EventBusClient::new(self.client_id.clone(), socket_config, client_config).await {
+            Ok(client) => {
+                self.client = Some(client);
+                self.connected = true;
+                self.reconnect_attempts = 0;
+                self.last_reconnect_attempt = None;
+
+                info!(client_id = %self.client_id, "Reconnected to broker successfully");
+
+                // Replay WAL after reconnection
+                if let Err(e) = self.replay_wal().await {
+                    warn!(error = %e, "Failed to replay WAL after reconnection");
+                }
+
+                Ok(true)
+            }
+            Err(e) => {
+                warn!(
+                    attempt = self.reconnect_attempts,
+                    error = %e,
+                    "Reconnection attempt failed"
+                );
+                Err(EventBusConnectorError::Connection(format!(
+                    "Reconnection failed (attempt {}): {}",
+                    self.reconnect_attempts, e
+                )))
+            }
+        }
+    }
+
+    /// Publish a process event with durability guarantees.
+    ///
+    /// This method implements the following flow:
+    /// 1. Write event to WAL (durability guarantee)
+    /// 2. If connected: publish to broker via EventBusClient
+    ///    - On success: mark WAL entry as published
+    /// 3. If disconnected: add to in-memory buffer
+    /// 4. Check buffer level for backpressure signals
+    ///
+    /// # Arguments
+    ///
+    /// * `event` - Process event to publish
+    /// * `event_type` - Type of event for topic routing
+    ///
+    /// # Returns
+    ///
+    /// The WAL sequence number assigned to this event
+    ///
+    /// # Errors
+    ///
+    /// - `EventBusConnectorError::Wal` if WAL write fails
+    /// - `EventBusConnectorError::BufferOverflow` if disconnected and buffer is full
+    /// - `EventBusConnectorError::EventBus` if publish fails (event is still buffered)
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::{EventBusConnector, ProcessEventType};
+    /// use collector_core::event::ProcessEvent;
+    /// use std::path::PathBuf;
+    /// use std::time::SystemTime;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// let mut connector = EventBusConnector::new(PathBuf::from("/tmp/wal")).await?;
+    /// connector.connect().await?;
+    ///
+    /// let event = ProcessEvent {
+    ///     pid: 1234,
+    ///     ppid: None,
+    ///     name: "test".to_string(),
+    ///     executable_path: None,
+    ///     command_line: vec![],
+    ///     start_time: None,
+    ///     cpu_usage: None,
+    ///     memory_usage: None,
+    ///     executable_hash: None,
+    ///     user_id: None,
+    ///     accessible: true,
+    ///     file_exists: true,
+    ///     timestamp: SystemTime::now(),
+    ///     platform_metadata: None,
+    /// };
+    ///
+    /// let sequence = connector.publish(event, ProcessEventType::Start).await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn publish(
+        &mut self,
+        event: ProcessEvent,
+        event_type: ProcessEventType,
+    ) -> EventBusConnectorResult<u64> {
+        let topic = event_type.topic().to_owned();
+        let event_type_str = event_type.to_type_string().to_owned();
+
+        // Step 1: Write to WAL for durability (with event type for replay)
+        let sequence = self
+            .wal
+            .write_with_type(event.clone(), event_type_str)
+            .await?;
+
+        debug!(
+            sequence = sequence,
+            topic = %topic,
+            pid = event.pid,
+            "Event written to WAL"
+        );
+
+        // Step 2: Ensure connected (attempt reconnection if needed)
+        if !self.connected
+            && let Err(e) = self.try_reconnect().await
+        {
+            debug!(
+                error = %e,
+                "Reconnection attempt failed, will buffer event"
+            );
+        }
+
+        // Step 3: Try to publish or buffer
+        if self.connected {
+            self.try_publish_or_buffer(sequence, event, topic).await?;
+        } else {
+            self.buffer_event(sequence, event, topic)?;
+        }
+
+        Ok(sequence)
+    }
+
+    /// Attempt to publish an event to the broker, buffering on failure.
+    ///
+    /// If publish succeeds, marks the event as published in WAL.
+    /// If publish fails, disconnects and buffers the event.
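+    /// An error is returned only when buffering itself fails
+    /// (i.e. [`EventBusConnectorError::BufferOverflow`]); a failed publish
+    /// alone is not surfaced to the caller.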
+    async fn try_publish_or_buffer(
+        &mut self,
+        sequence: u64,
+        event: ProcessEvent,
+        topic: String,
+    ) -> EventBusConnectorResult<()> {
+        match self.publish_to_broker(&event, &topic).await {
+            Ok(()) => {
+                // Successfully published - mark as published in WAL
+                self.mark_published_with_warning(sequence).await;
+                Ok(())
+            }
+            Err(e) => {
+                // Publish failed - disconnect and buffer
+                warn!(
+                    sequence = sequence,
+                    error = %e,
+                    "Failed to publish event, buffering"
+                );
+                self.connected = false;
+                self.buffer_event(sequence, event, topic)
+            }
+        }
+    }
+
+    /// Mark an event as published in WAL, logging warnings on failure.
+    ///
+    /// Failures are non-fatal since WAL cleanup will happen eventually.
+    async fn mark_published_with_warning(&self, sequence: u64) {
+        if let Err(e) = self.wal.mark_published(sequence).await {
+            warn!(
+                sequence = sequence,
+                error = %e,
+                "Failed to mark event as published in WAL"
+            );
+        }
+    }
+
+    /// Take ownership of the backpressure signal receiver.
+    ///
+    /// This can only be called once. Subsequent calls return `None`.
+    /// The receiver should be monitored by upstream producers to implement
+    /// backpressure handling.
+    ///
+    /// # Returns
+    ///
+    /// The backpressure signal receiver, or `None` if already taken
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::{EventBusConnector, BackpressureSignal};
+    /// use std::path::PathBuf;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// let mut connector = EventBusConnector::new(PathBuf::from("/tmp/wal")).await?;
+    /// let mut bp_rx = connector.take_backpressure_receiver()
+    ///     .expect("First call should succeed");
+    ///
+    /// // Monitor in a separate task
+    /// tokio::spawn(async move {
+    ///     while let Some(signal) = bp_rx.recv().await {
+    ///         match signal {
+    ///             BackpressureSignal::Activated => println!("Slow down!"),
+    ///             BackpressureSignal::Released => println!("Resume normal rate"),
+    ///             _ => {} // Handle future variants
+    ///         }
+    ///     }
+    /// });
+    ///
+    /// // Second call returns None
+    /// assert!(connector.take_backpressure_receiver().is_none());
+    /// # Ok(())
+    /// # }
+    /// ```
+    #[allow(clippy::missing_const_for_fn)] // take() is not const
+    pub fn take_backpressure_receiver(&mut self) -> Option<mpsc::Receiver<BackpressureSignal>> {
+        self.backpressure_rx_template.take()
+    }
+
+    /// Replay unpublished events from the WAL after reconnection.
+    ///
+    /// This should be called after a successful [`connect()`](Self::connect)
+    /// following a disconnection or restart. It reads all events from the WAL
+    /// and attempts to publish those that haven't been marked as published.
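+    /// Replay stops at the first publish failure; the failing entry is buffered
+    /// in memory and later entries stay unmarked in the WAL for the next replay.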
+    ///
+    /// # Returns
+    ///
+    /// The number of events successfully replayed
+    ///
+    /// # Errors
+    ///
+    /// Returns `EventBusConnectorError::Wal` if WAL replay fails
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::EventBusConnector;
+    /// use std::path::PathBuf;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// let mut connector = EventBusConnector::new(PathBuf::from("/tmp/wal")).await?;
+    /// connector.connect().await?;
+    ///
+    /// // Replay any events from previous run
+    /// let replayed = connector.replay_wal().await?;
+    /// println!("Replayed {} events from WAL", replayed);
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn replay_wal(&mut self) -> EventBusConnectorResult<usize> {
+        info!("Starting WAL replay");
+
+        // Use replay_entries to get full entries with sequences and event types
+        let entries = self.wal.replay_entries().await?;
+        let total = entries.len();
+
+        if total == 0 {
+            info!("No events to replay from WAL");
+            return Ok(0);
+        }
+
+        info!(event_count = total, "Replaying events from WAL");
+
+        let mut replayed = 0_usize;
+        let mut last_successful_sequence = 0_u64;
+
+        for entry in entries {
+            // Get the topic from the stored event type, or default to Start for legacy entries
+            let event_type = entry
+                .event_type
+                .as_ref()
+                .map_or(ProcessEventType::Start, |s| {
+                    ProcessEventType::from_type_string(s)
+                });
+            let topic = event_type.topic();
+
+            if self.connected {
+                match self.publish_to_broker(&entry.event, topic).await {
+                    Ok(()) => {
+                        replayed = replayed.saturating_add(1);
+                        // Track the actual WAL sequence for proper cleanup
+                        last_successful_sequence = entry.sequence;
+                    }
+                    Err(e) => {
+                        warn!(
+                            sequence = entry.sequence,
+                            error = %e,
+                            "Failed to replay event, stopping replay"
+                        );
+                        self.connected = false;
+                        // Buffer the failing event; later entries stay in the WAL
+                        let buffered_event =
+                            BufferedEvent::new(entry.sequence, entry.event, topic.to_owned());
+                        if self.add_to_buffer(buffered_event).is_err() {
+                            warn!("Buffer full during WAL replay, some events may be lost");
+                        }
+                        break;
+                    }
+                }
+            } else {
+                // Lost connection during replay - buffer remaining events
+                let buffered_event =
+                    BufferedEvent::new(entry.sequence, entry.event, topic.to_owned());
+                if self.add_to_buffer(buffered_event).is_err() {
+                    warn!("Buffer full during WAL replay, some events may be lost");
+                    break;
+                }
+            }
+        }
+
+        // Mark replayed events as published in WAL using actual sequence numbers
+        if last_successful_sequence > 0
+            && let Err(e) = self.wal.mark_published(last_successful_sequence).await
+        {
+            warn!(
+                sequence = last_successful_sequence,
+                error = %e,
+                "Failed to mark replayed events as published"
+            );
+        }
+
+        // Also flush the in-memory buffer
+        let buffer_flushed = self.flush_buffer().await;
+
+        info!(
+            wal_replayed = replayed,
+            buffer_flushed = buffer_flushed,
+            "WAL replay completed"
+        );
+
+        Ok(replayed.saturating_add(buffer_flushed))
+    }
+
+    /// Gracefully shut down the connector.
+    ///
+    /// This attempts to flush any buffered events before closing the connection.
+    /// The WAL is not affected and can be replayed on next startup.
+    ///
+    /// # Note
+    ///
+    /// Client shutdown errors are logged but not propagated, as shutdown is
+    /// best-effort. The connector will be marked as disconnected regardless
+    /// of whether the underlying client shutdown succeeds.
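+    /// Events still buffered at shutdown are not lost: their WAL entries were
+    /// never marked as published, so they are recovered by replay on the next
+    /// startup.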
+    ///
+    /// # Examples
+    ///
+    /// ```rust,no_run
+    /// use procmond::event_bus_connector::EventBusConnector;
+    /// use std::path::PathBuf;
+    ///
+    /// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
+    /// let mut connector = EventBusConnector::new(PathBuf::from("/tmp/wal")).await?;
+    /// connector.connect().await?;
+    ///
+    /// // ... use connector ...
+    ///
+    /// connector.shutdown().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn shutdown(&mut self) -> EventBusConnectorResult<()> {
+        info!("Shutting down EventBusConnector");
+
+        // Try to flush buffer before shutdown
+        if self.connected {
+            let flushed = self.flush_buffer().await;
+            debug!(flushed = flushed, "Flushed buffer before shutdown");
+        }
+
+        // Close the client connection
+        if let Some(client) = self.client.take()
+            && let Err(e) = client.shutdown().await
+        {
+            error!(error = %e, "Error during client shutdown");
+        }
+
+        self.connected = false;
+
+        info!(
+            buffered_events = self.buffer.len(),
+            buffer_bytes = self.buffer_size_bytes,
+            "EventBusConnector shutdown complete"
+        );
+
+        Ok(())
+    }
+
+    /// Check if currently connected to the broker.
+    ///
+    /// Note that this reflects the last known connection state. The actual
+    /// connection may have been lost since the last operation.
+    ///
+    /// # Returns
+    ///
+    /// `true` if connected, `false` otherwise
+    pub const fn is_connected(&self) -> bool {
+        self.connected
+    }
+
+    /// Get the current buffer usage as a percentage (0-100).
+    ///
+    /// This can be used for monitoring and alerting on buffer pressure.
+    ///
+    /// # Returns
+    ///
+    /// Buffer usage percentage (0-100)
+    #[allow(clippy::arithmetic_side_effects)] // Division by non-zero is safe
+    #[allow(clippy::integer_division)] // Integer precision is acceptable for percentage
+    pub fn buffer_usage_percent(&self) -> u8 {
+        if self.max_buffer_size == 0 {
+            return 100;
+        }
+
+        let usage = self.buffer_size_bytes.saturating_mul(100) / self.max_buffer_size;
+
+        // Clamp to u8 range
+        #[allow(clippy::as_conversions)]
+        // Safe: result is 0-100 after division by max_buffer_size
+        {
+            usage.min(100) as u8
+        }
+    }
+
+    /// Get current buffer size in bytes.
+    pub const fn buffer_size_bytes(&self) -> usize {
+        self.buffer_size_bytes
+    }
+
+    /// Get number of buffered events.
+    pub fn buffered_event_count(&self) -> usize {
+        self.buffer.len()
+    }
+
+    // === Private Helper Methods ===
+
+    /// Publish an event to the broker.
+    async fn publish_to_broker(
+        &self,
+        event: &ProcessEvent,
+        topic: &str,
+    ) -> EventBusConnectorResult<()> {
+        let client = self.client.as_ref().ok_or_else(|| {
+            EventBusConnectorError::Connection("Not connected to broker".to_owned())
+        })?;
+
+        // Convert collector_core::ProcessEvent to eventbus ProcessEvent
+        let eventbus_event = Self::convert_to_eventbus_event(event);
+        let collection_event = EventBusCollectionEvent::Process(eventbus_event);
+
+        // Generate correlation ID
+        let correlation_id = uuid::Uuid::new_v4().to_string();
+
+        client
+            .publish(topic, collection_event, Some(correlation_id))
+            .await
+            .map_err(|e| EventBusConnectorError::EventBus(e.to_string()))?;
+
+        debug!(topic = %topic, pid = event.pid, "Event published to broker");
+
+        Ok(())
+    }
+
+    /// Convert collector_core ProcessEvent to eventbus ProcessEvent.
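+    ///
+    /// Note: the command line is flattened with `join(" ")`, so per-argument
+    /// boundaries are not preserved in the eventbus representation.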
+ fn convert_to_eventbus_event(event: &ProcessEvent) -> EventBusProcessEvent { + use std::collections::HashMap; + + EventBusProcessEvent { + pid: event.pid, + name: event.name.clone(), + command_line: event.command_line.join(" ").into(), + executable_path: event.executable_path.clone(), + ppid: event.ppid, + start_time: event.start_time, + metadata: HashMap::new(), + } + } + + /// Buffer an event when disconnected. + fn buffer_event( + &mut self, + sequence: u64, + event: ProcessEvent, + topic: String, + ) -> EventBusConnectorResult<()> { + let buffered = BufferedEvent::new(sequence, event, topic); + self.add_to_buffer(buffered) + } + + /// Add a buffered event to the queue with overflow protection. + fn add_to_buffer(&mut self, event: BufferedEvent) -> EventBusConnectorResult<()> { + // Check if adding would exceed max buffer size + let new_size = self.buffer_size_bytes.saturating_add(event.size_bytes); + if new_size > self.max_buffer_size { + error!( + current_size = self.buffer_size_bytes, + event_size = event.size_bytes, + max_size = self.max_buffer_size, + "Buffer overflow - rejecting event" + ); + return Err(EventBusConnectorError::BufferOverflow); + } + + // Track previous usage for backpressure detection + let previous_usage = self.buffer_usage_percent(); + + // Add to buffer + self.buffer_size_bytes = new_size; + self.buffer.push_back(event); + + // Check for backpressure threshold crossing + let current_usage = self.buffer_usage_percent(); + self.check_backpressure(previous_usage, current_usage); + + debug!( + buffered_events = self.buffer.len(), + buffer_bytes = self.buffer_size_bytes, + usage_percent = current_usage, + "Event buffered" + ); + + Ok(()) + } + + /// Flush the in-memory buffer to the broker. + async fn flush_buffer(&mut self) -> usize { + if !self.connected || self.buffer.is_empty() { + return 0; + } + + let mut flushed = 0_usize; + let previous_usage = self.buffer_usage_percent(); + + while let Some(buffered) = self.buffer.pop_front() { + match self + .publish_to_broker(&buffered.event, &buffered.topic) + .await + { + Ok(()) => { + self.buffer_size_bytes = + self.buffer_size_bytes.saturating_sub(buffered.size_bytes); + flushed = flushed.saturating_add(1); + + // Mark as published in WAL + if let Err(e) = self.wal.mark_published(buffered.sequence).await { + warn!( + sequence = buffered.sequence, + error = %e, + "Failed to mark buffered event as published" + ); + } + } + Err(e) => { + // Put event back and stop flushing + warn!(error = %e, "Failed to flush buffered event"); + self.buffer.push_front(buffered); + self.connected = false; + break; + } + } + } + + // Check for backpressure release + let current_usage = self.buffer_usage_percent(); + self.check_backpressure(previous_usage, current_usage); + + flushed + } + + /// Check and emit backpressure signals based on buffer usage. + /// + /// Signals are best-effort - if the receiver is dropped or the channel is full, + /// the failure is logged at debug level and processing continues. 
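+    ///
+    /// For example, with the default 10MB buffer, `Activated` fires once more
+    /// than 7MB (70%) is queued, and `Released` fires once the backlog drains
+    /// below 5MB (50%).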
+ fn check_backpressure(&self, previous_usage: u8, current_usage: u8) { + const HIGH_WATER_MARK: u8 = 70; + const LOW_WATER_MARK: u8 = 50; + + // Check for activation (crossing above high water mark) + if previous_usage < HIGH_WATER_MARK && current_usage >= HIGH_WATER_MARK { + if let Err(e) = self.backpressure_tx.try_send(BackpressureSignal::Activated) { + debug!( + error = %e, + usage = current_usage, + "Failed to send backpressure activation signal (receiver may be dropped)" + ); + } + info!(usage = current_usage, "Backpressure activated"); + } + + // Check for release (crossing below low water mark) + if previous_usage >= LOW_WATER_MARK && current_usage < LOW_WATER_MARK { + if let Err(e) = self.backpressure_tx.try_send(BackpressureSignal::Released) { + debug!( + error = %e, + usage = current_usage, + "Failed to send backpressure release signal (receiver may be dropped)" + ); + } + info!(usage = current_usage, "Backpressure released"); + } + } +} + +#[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::panic, + clippy::indexing_slicing, + clippy::str_to_string, + clippy::arithmetic_side_effects, + clippy::wildcard_enum_match_arm, + clippy::equatable_if_let, + clippy::integer_division, + clippy::as_conversions +)] +mod tests { + use super::*; + use std::time::SystemTime; + use tempfile::TempDir; + + /// Create a test process event with specified PID. + fn create_test_event(pid: u32) -> ProcessEvent { + ProcessEvent { + pid, + ppid: Some(1), + name: format!("test_process_{pid}"), + executable_path: Some(format!("/usr/bin/test_{pid}")), + command_line: vec!["test".to_owned(), "--arg".to_owned()], + start_time: Some(SystemTime::now()), + cpu_usage: Some(5.0), + memory_usage: Some(1024 * 1024), + executable_hash: Some("abc123".to_owned()), + user_id: Some("1000".to_owned()), + accessible: true, + file_exists: true, + timestamp: SystemTime::now(), + platform_metadata: None, + } + } + + #[tokio::test] + async fn test_connector_creation() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let connector = EventBusConnector::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create connector"); + + assert!(!connector.is_connected()); + assert_eq!(connector.buffer_usage_percent(), 0); + assert_eq!(connector.buffered_event_count(), 0); + } + + #[tokio::test] + async fn test_connect_fails_when_env_not_set() { + // This test verifies behavior when DAEMONEYE_BROKER_SOCKET is not set. + // We check by looking up the env var - if it's not set, we expect EnvNotSet. + // If it IS set (e.g., in CI), we expect a different error (Connection). 
+ + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create connector"); + + let result = connector.connect().await; + + // Connect should fail either because env var is not set or because + // there's no broker listening + assert!(result.is_err()); + + match result.unwrap_err() { + EventBusConnectorError::EnvNotSet(var) => { + // Expected when env var is not set + assert!(var.contains(BROKER_SOCKET_ENV)); + } + EventBusConnectorError::Connection(_) => { + // Expected when env var IS set but no broker is running + // This is also a valid test outcome + } + other => panic!("Expected EnvNotSet or Connection error, got: {other:?}"), + } + } + + #[tokio::test] + async fn test_publish_while_disconnected() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create connector"); + + let event = create_test_event(1234); + let result = connector.publish(event, ProcessEventType::Start).await; + + // Should succeed by writing to WAL and buffering + assert!(result.is_ok()); + let sequence = result.unwrap(); + assert_eq!(sequence, 1); + + // Event should be buffered + assert_eq!(connector.buffered_event_count(), 1); + // Buffer size should be non-zero (percentage may round to 0 for small events + // relative to 10MB max buffer) + assert!(connector.buffer_size_bytes() > 0); + } + + #[tokio::test] + async fn test_buffer_overflow_protection() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create connector"); + + // Set a very small buffer for testing + connector.max_buffer_size = 500; + + // First event should succeed + let event1 = create_test_event(1); + let result1 = connector.publish(event1, ProcessEventType::Start).await; + assert!(result1.is_ok()); + + // Keep adding events until overflow + let mut overflow_occurred = false; + for i in 2..=100 { + let event = create_test_event(i); + if let Err(EventBusConnectorError::BufferOverflow) = + connector.publish(event, ProcessEventType::Start).await + { + overflow_occurred = true; + break; + } + } + + assert!(overflow_occurred, "Buffer overflow should have occurred"); + } + + #[tokio::test] + async fn test_backpressure_receiver() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create connector"); + + // First call should succeed + let rx = connector.take_backpressure_receiver(); + assert!(rx.is_some()); + + // Second call should return None + let rx2 = connector.take_backpressure_receiver(); + assert!(rx2.is_none()); + } + + #[tokio::test] + async fn test_process_event_type_topics() { + assert_eq!(ProcessEventType::Start.topic(), "events.process.start"); + assert_eq!(ProcessEventType::Stop.topic(), "events.process.stop"); + assert_eq!(ProcessEventType::Modify.topic(), "events.process.modify"); + } + + #[tokio::test] + async fn test_buffered_event_size_estimation() { + let event = create_test_event(1234); + let topic = "events.process.start".to_owned(); + let buffered = BufferedEvent::new(1, event, topic); + + // Size should be reasonable (not zero, not huge) + assert!(buffered.size_bytes > 50); + assert!(buffered.size_bytes < 10000); + } + + #[tokio::test] + 
async fn test_buffer_usage_calculation() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf())
+            .await
+            .expect("Failed to create connector");
+
+        assert_eq!(connector.buffer_usage_percent(), 0);
+
+        // Set small buffer for predictable testing
+        connector.max_buffer_size = 1000;
+
+        // Add event to buffer directly for testing
+        let event = create_test_event(1);
+        let buffered = BufferedEvent::new(1, event, "test".to_owned());
+        let event_size = buffered.size_bytes;
+        connector.buffer.push_back(buffered);
+        connector.buffer_size_bytes = event_size;
+
+        let usage = connector.buffer_usage_percent();
+        // Usage should be event_size * 100 / 1000
+        let expected = (event_size * 100 / 1000).min(100);
+        assert_eq!(usage, expected as u8);
+    }
+
+    #[tokio::test]
+    async fn test_shutdown_while_disconnected() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let mut connector = EventBusConnector::new(temp_dir.path().to_path_buf())
+            .await
+            .expect("Failed to create connector");
+
+        // Should succeed even when not connected
+        let result = connector.shutdown().await;
+        assert!(result.is_ok());
+        assert!(!connector.is_connected());
+    }
+
+    #[tokio::test]
+    async fn test_event_conversion() {
+        let event = create_test_event(1234);
+        let eventbus_event = EventBusConnector::convert_to_eventbus_event(&event);
+
+        assert_eq!(eventbus_event.pid, 1234);
+        assert_eq!(eventbus_event.name, "test_process_1234");
+        assert_eq!(eventbus_event.ppid, Some(1));
+        assert!(eventbus_event.executable_path.is_some());
+    }
+
+    #[tokio::test]
+    async fn test_wal_persistence_across_connector_instances() {
+        let temp_dir = TempDir::new().expect("Failed to create temp dir");
+        let wal_path = temp_dir.path().to_path_buf();
+
+        // First instance - write some events
+        {
+            let mut connector = EventBusConnector::new(wal_path.clone())
+                .await
+                .expect("Failed to create connector");
+
+            for i in 1..=5 {
+                let event = create_test_event(i);
+                connector
+                    .publish(event, ProcessEventType::Start)
+                    .await
+                    .expect("Failed to publish");
+            }
+        } // Connector dropped
+
+        // Second instance - should be able to replay events
+        {
+            let connector = EventBusConnector::new(wal_path.clone())
+                .await
+                .expect("Failed to create connector");
+
+            // WAL should have events from first instance
+            let events = connector.wal.replay().await.expect("Failed to replay WAL");
+            assert_eq!(events.len(), 5);
+        }
+    }
+}
diff --git a/procmond/src/event_source.rs b/procmond/src/event_source.rs
index 3e41bf3..31aacde 100644
--- a/procmond/src/event_source.rs
+++ b/procmond/src/event_source.rs
@@ -24,11 +24,11 @@ struct BatchOutcome {
 }
 
 impl BatchOutcome {
-    fn new(dead_letters: Vec<CollectionEvent>) -> Self {
+    const fn new(dead_letters: Vec<CollectionEvent>) -> Self {
         Self { dead_letters }
     }
 
-    fn empty() -> Self {
+    const fn empty() -> Self {
         Self {
             dead_letters: Vec::new(),
         }
@@ -41,7 +41,7 @@
 
 type BatchResult = Result<BatchOutcome, (anyhow::Error, Vec<CollectionEvent>)>;
 
-/// Process event source that implements the EventSource trait.
+/// Process event source that implements the `EventSource` trait.
 ///
 /// This struct wraps the existing `ProcessMessageHandler` and provides a bridge
 /// between the collector-core framework and the existing process collection logic.
@@ -276,7 +276,7 @@ impl ProcessEventSource {
     /// Creates a new process event source with a custom collector and configuration.
/// - /// This method allows for dependency injection of different ProcessCollector + /// This method allows for dependency injection of different `ProcessCollector` /// implementations, enabling platform-specific optimizations and testing. /// /// # Arguments @@ -344,7 +344,7 @@ impl ProcessEventSource { /// /// # Returns /// - /// A boxed ProcessCollector implementation suitable for the current platform. + /// A boxed `ProcessCollector` implementation suitable for the current platform. fn create_platform_collector(config: &ProcessSourceConfig) -> Box { let base_collector_config = ProcessCollectionConfig { collect_enhanced_metadata: config.collect_enhanced_metadata, @@ -424,13 +424,13 @@ impl ProcessEventSource { /// Determines if the current platform is a secondary/minimally supported platform. /// /// Secondary platforms are those that don't have dedicated optimized collectors - /// and should use the FallbackProcessCollector instead of SysinfoProcessCollector. + /// and should use the `FallbackProcessCollector` instead of `SysinfoProcessCollector`. /// This includes BSD variants and other Unix-like systems. /// /// # Returns /// /// `true` if the current platform is considered secondary, `false` otherwise. - fn is_secondary_platform() -> bool { + const fn is_secondary_platform() -> bool { cfg!(any( target_os = "freebsd", target_os = "openbsd", @@ -465,7 +465,7 @@ impl ProcessEventSource { tx: &mpsc::Sender, shutdown_signal: &Arc, ) -> BatchResult { - let timer = PerformanceTimer::start("process_collection".to_string()); + let timer = PerformanceTimer::start("process_collection".to_owned()); let collection_start = Instant::now(); // Check for shutdown before starting collection @@ -487,7 +487,7 @@ impl ProcessEventSource { error!(error = %e, "Process enumeration failed"); self.stats.collection_errors.fetch_add(1, Ordering::Relaxed); return Err(( - anyhow::anyhow!("Process collection failed: {}", e), + anyhow::anyhow!("Process collection failed: {e}"), Vec::new(), )); } @@ -512,8 +512,8 @@ impl ProcessEventSource { // Process events in batches with backpressure handling let mut event_batch = Vec::with_capacity(self.config.event_batch_size); - let mut collected_count = 0; - let mut batch_count = 0; + let mut collected_count: u64 = 0; + let mut batch_count: u64 = 0; let mut dead_letter_events = Vec::new(); for process_event in process_events { @@ -524,7 +524,7 @@ impl ProcessEventSource { } event_batch.push(CollectionEvent::Process(process_event)); - collected_count += 1; + collected_count = collected_count.saturating_add(1); // Send batch when it's full if event_batch.len() >= self.config.event_batch_size { @@ -539,7 +539,7 @@ impl ProcessEventSource { return Err((e, dead_letter_events)); } } - batch_count += 1; + batch_count = batch_count.saturating_add(1); } } @@ -556,38 +556,41 @@ impl ProcessEventSource { return Err((e, dead_letter_events)); } } - batch_count += 1; + batch_count = batch_count.saturating_add(1); } // Update statistics let collection_duration = collection_start.elapsed(); self.stats.collection_cycles.fetch_add(1, Ordering::Relaxed); + #[allow(clippy::as_conversions)] // Safe: usize to u64 won't overflow on 64-bit systems self.stats.processes_collected.fetch_add( collection_stats.successful_collections as u64, Ordering::Relaxed, ); // Update average collection duration - let duration_ms = collection_duration.as_millis() as u64; + let duration_ms = u64::try_from(collection_duration.as_millis()).unwrap_or(u64::MAX); let cycles = 
self.stats.collection_cycles.load(Ordering::Relaxed); let current_avg = self .stats .avg_collection_duration_ms .load(Ordering::Relaxed); + #[allow(clippy::integer_division, clippy::arithmetic_side_effects)] + // Intentional: running average calculation let new_avg = if cycles == 1 { duration_ms } else { - (current_avg * (cycles - 1) + duration_ms) / cycles + (current_avg + .saturating_mul(cycles.saturating_sub(1)) + .saturating_add(duration_ms)) + / cycles }; self.stats .avg_collection_duration_ms .store(new_avg, Ordering::Relaxed); // Update last collection time - { - let mut last_time = self.stats.last_collection_time.lock().await; - *last_time = Some(Instant::now()); - } + *self.stats.last_collection_time.lock().await = Some(Instant::now()); // Record telemetry let _duration = timer.finish(); @@ -617,9 +620,9 @@ impl ProcessEventSource { let batch_size = event_batch.len(); debug!(batch_size = batch_size, "Sending event batch"); - let mut processed_count = 0; + let mut processed_count: usize = 0; let mut dead_letter_events: Vec = Vec::new(); - let max_retries = 3; + let max_retries: u32 = 3; let retry_delay = Duration::from_millis(10); while !event_batch.is_empty() { @@ -630,7 +633,10 @@ impl ProcessEventSource { } // Peek at the first event without removing it - let event = event_batch[0].clone(); + // Safety: we check !event_batch.is_empty() at the start of the while loop + let Some(event) = event_batch.first().cloned() else { + break; + }; // Capture event details for potential error reporting let event_type = event.event_type(); @@ -642,14 +648,14 @@ impl ProcessEventSource { self.backpressure_semaphore.acquire(), ) .await - .map_err(|_| anyhow::anyhow!("Backpressure timeout exceeded"))? - .map_err(|e| anyhow::anyhow!("Failed to acquire backpressure permit: {}", e))?; + .map_err(|_timeout_err| anyhow::anyhow!("Backpressure timeout exceeded"))? 
+ .map_err(|e| anyhow::anyhow!("Failed to acquire backpressure permit: {e}"))?; // Update in-flight counter self.stats.events_in_flight.fetch_add(1, Ordering::Relaxed); // Attempt to send the event with retry logic - let mut retry_count = 0; + let mut retry_count: u32 = 0; let mut send_successful = false; while retry_count < max_retries && !send_successful { @@ -667,7 +673,7 @@ impl ProcessEventSource { Ok(Ok(())) => { // Event sent successfully send_successful = true; - processed_count += 1; + processed_count = processed_count.saturating_add(1); } Ok(Err(_)) => { warn!("Event channel closed during batch send"); @@ -678,7 +684,7 @@ impl ProcessEventSource { } Err(_) => { // Timeout occurred during send - retry_count += 1; + retry_count = retry_count.saturating_add(1); if retry_count < max_retries { debug!( retry_count = retry_count, @@ -686,6 +692,8 @@ impl ProcessEventSource { "Event send timed out, retrying with backoff" ); // Small backoff before retry + #[allow(clippy::arithmetic_side_effects)] + // Safe: retry_count is bounded by max_retries tokio::time::sleep(retry_delay * retry_count).await; // Check for shutdown after backoff @@ -732,7 +740,6 @@ impl ProcessEventSource { ); break; } - continue; } } @@ -745,13 +752,14 @@ impl ProcessEventSource { Ok(dead_letter_events) } + #[allow(clippy::unused_self)] // Method on struct for future extensibility fn log_dead_letter_events(&self, events: &[CollectionEvent]) { + const MAX_DETAILED_EVENTS: usize = 5; + if events.is_empty() { return; } - const MAX_DETAILED_EVENTS: usize = 5; - warn!( failed = events.len(), "Dead-letter events retained after batch processing" @@ -767,7 +775,7 @@ impl ProcessEventSource { if events.len() > MAX_DETAILED_EVENTS { debug!( - omitted = events.len() - MAX_DETAILED_EVENTS, + omitted = events.len().saturating_sub(MAX_DETAILED_EVENTS), "Additional dead-letter events omitted from warn-level logging" ); } @@ -797,6 +805,9 @@ impl EventSource for ProcessEventSource { tx: mpsc::Sender, shutdown_signal: Arc, ) -> anyhow::Result<()> { + const MAX_CONSECUTIVE_FAILURES: u32 = 5; + const FAILURE_BACKOFF_BASE: Duration = Duration::from_secs(1); + info!( collection_interval_secs = self.config.collection_interval.as_secs(), enhanced_metadata = self.config.collect_enhanced_metadata, @@ -808,9 +819,7 @@ impl EventSource for ProcessEventSource { let mut collection_interval = interval(self.config.collection_interval); #[allow(unused_assignments)] - let mut consecutive_failures = 0u32; - const MAX_CONSECUTIVE_FAILURES: u32 = 5; - const FAILURE_BACKOFF_BASE: Duration = Duration::from_secs(1); + let mut consecutive_failures = 0_u32; // Skip the first tick to avoid immediate collection collection_interval.tick().await; @@ -853,7 +862,7 @@ impl EventSource for ProcessEventSource { } Err((err, dead_letters)) => { self.log_dead_letter_events(&dead_letters); - consecutive_failures += 1; + consecutive_failures = consecutive_failures.saturating_add(1); error!( error = %err, consecutive_failures = consecutive_failures, @@ -869,6 +878,7 @@ impl EventSource for ProcessEventSource { ); // Wait longer before next attempt + #[allow(clippy::arithmetic_side_effects)] // Safe: consecutive_failures is bounded let backoff_duration = FAILURE_BACKOFF_BASE * consecutive_failures; let max_backoff = Duration::from_secs(60); let actual_backoff = std::cmp::min(backoff_duration, max_backoff); @@ -883,7 +893,7 @@ impl EventSource for ProcessEventSource { } } } - _ = async { + () = async { // More responsive shutdown checking while 
!shutdown_signal.load(Ordering::Relaxed) { tokio::time::sleep(Duration::from_millis(50)).await; @@ -934,7 +944,9 @@ impl EventSource for ProcessEventSource { #[instrument(skip(self), fields(source = "process-monitor"))] async fn health_check(&self) -> anyhow::Result<()> { - let timer = PerformanceTimer::start("health_check".to_string()); + const MAX_ERROR_RATE: f64 = 0.5; // 50% error rate threshold + + let timer = PerformanceTimer::start("health_check".to_owned()); let health_check_start = Instant::now(); // Get current statistics @@ -962,13 +974,13 @@ impl EventSource for ProcessEventSource { )); } } else if stats.collection_cycles > 0 { - health_issues.push("No successful collections recorded".to_string()); + health_issues.push("No successful collections recorded".to_owned()); } // 2. Check error rate if stats.collection_cycles > 0 { + #[allow(clippy::as_conversions)] // Safe: casting u64 to f64 for ratio calculation let error_rate = stats.collection_errors as f64 / stats.collection_cycles as f64; - const MAX_ERROR_RATE: f64 = 0.5; // 50% error rate threshold if error_rate > MAX_ERROR_RATE { health_issues.push(format!( @@ -991,16 +1003,17 @@ impl EventSource for ProcessEventSource { debug!("Health check enumeration successful"); } Ok(Err(e)) => { - health_issues.push(format!("Process collector health check failed: {}", e)); + health_issues.push(format!("Process collector health check failed: {e}")); } Err(_) => { - health_issues.push("Process collector health check timed out".to_string()); + health_issues.push("Process collector health check timed out".to_owned()); } } // 4. Check backpressure semaphore availability let available_permits = self.backpressure_semaphore.available_permits(); let total_permits = self.config.max_events_in_flight; + #[allow(clippy::as_conversions)] // Safe: casting counts to f64 for ratio calculation let permit_usage = 1.0 - (available_permits as f64 / total_permits as f64); if permit_usage > 0.9 { @@ -1029,17 +1042,19 @@ impl EventSource for ProcessEventSource { // Report health status if health_issues.is_empty() { + #[allow(clippy::as_conversions)] // Safe: casting u64 to f64 for percentage + let error_rate_str = if stats.collection_cycles > 0 { + format!( + "{:.1}%", + (stats.collection_errors as f64 / stats.collection_cycles as f64) * 100.0 + ) + } else { + "N/A".to_owned() + }; info!( collection_cycles = stats.collection_cycles, processes_collected = stats.processes_collected, - error_rate = if stats.collection_cycles > 0 { - format!( - "{:.1}%", - (stats.collection_errors as f64 / stats.collection_cycles as f64) * 100.0 - ) - } else { - "N/A".to_string() - }, + error_rate = %error_rate_str, avg_duration_ms = stats.avg_collection_duration_ms, events_in_flight = stats.events_in_flight, available_permits = available_permits, @@ -1058,12 +1073,25 @@ impl EventSource for ProcessEventSource { health_check_duration_ms = health_check_duration.as_millis(), "Process event source health check failed" ); - Err(anyhow::anyhow!("Health check failed: {}", health_summary)) + Err(anyhow::anyhow!("Health check failed: {health_summary}")) } } } #[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::redundant_clone, + clippy::missing_panics_doc, + clippy::uninlined_format_args, + clippy::semicolon_outside_block, + clippy::shadow_unrelated, + clippy::clone_on_ref_ptr, + clippy::single_match_else, + clippy::match_same_arms +)] mod tests { use super::*; use std::sync::atomic::AtomicBool; diff --git a/procmond/src/lib.rs 
b/procmond/src/lib.rs index 70731d5..a99508f 100644 --- a/procmond/src/lib.rs +++ b/procmond/src/lib.rs @@ -1,9 +1,12 @@ //! Library module for procmond to enable unit testing +#![allow(clippy::doc_markdown)] // Many docs reference code identifiers without backticks +pub mod event_bus_connector; pub mod event_source; pub mod lifecycle; pub mod monitor_collector; pub mod process_collector; +pub mod wal; #[cfg(target_os = "linux")] pub mod linux_collector; @@ -14,6 +17,10 @@ pub mod macos_collector; #[cfg(target_os = "windows")] pub mod windows_collector; +pub use event_bus_connector::{ + BackpressureSignal, EventBusConnector, EventBusConnectorError, EventBusConnectorResult, + ProcessEventType, +}; pub use event_source::{ProcessEventSource, ProcessSourceConfig}; pub use lifecycle::{ LifecycleTrackingConfig, LifecycleTrackingError, LifecycleTrackingResult, @@ -202,8 +209,10 @@ impl ProcessMessageHandler { ) -> Result { tracing::info!("Received detection task: {}", task.task_id); + #[allow(clippy::as_conversions)] // Necessary for protobuf enum comparison + let enumerate_processes_type = ProtoTaskType::EnumerateProcesses as i32; match task.task_type { - task_type if task_type == ProtoTaskType::EnumerateProcesses as i32 => { + task_type if task_type == enumerate_processes_type => { self.enumerate_processes(&task).await } _ => { @@ -290,15 +299,20 @@ impl ProcessMessageHandler { // Convert ProcessCollectionError to appropriate IPC error let error_message = match e { ProcessCollectionError::SystemEnumerationFailed { message } => { - format!("System enumeration failed: {}", message) + format!("System enumeration failed: {message}") } ProcessCollectionError::CollectionTimeout { timeout_ms } => { - format!("Process collection timed out after {}ms", timeout_ms) + format!("Process collection timed out after {timeout_ms}ms") } ProcessCollectionError::PlatformError { message } => { - format!("Platform-specific error: {}", message) + format!("Platform-specific error: {message}") + } + // Explicitly handle remaining variants + catch-all for new variants + ProcessCollectionError::ProcessAccessDenied { .. } + | ProcessCollectionError::ProcessNotFound { .. } + | ProcessCollectionError::InvalidProcessData { .. 
} => { + format!("Process collection error: {e}") } - _ => format!("Process collection error: {}", e), }; Ok(DetectionResult::failure(&task.task_id, &error_message)) @@ -354,43 +368,46 @@ impl ProcessMessageHandler { use std::time::UNIX_EPOCH; // Convert SystemTime to timestamp with proper error handling - let start_time = event - .start_time - .and_then(|st| match st.duration_since(UNIX_EPOCH) { - Ok(duration) => Some(duration.as_secs() as i64), - Err(_) => { - tracing::warn!( - pid = event.pid, - name = %event.name, - "Process start time is before Unix epoch, skipping" - ); - None - } - }); - - let collection_time = match event.timestamp.duration_since(UNIX_EPOCH) { - Ok(duration) => { - // Use checked arithmetic to prevent overflow - let millis = duration.as_millis(); - if millis > i64::MAX as u128 { - tracing::warn!( - pid = event.pid, - name = %event.name, - "Collection time overflow, clamping to i64::MAX" - ); - i64::MAX - } else { - millis as i64 - } + let start_time = event.start_time.and_then(|st| { + if let Ok(duration) = st.duration_since(UNIX_EPOCH) { + #[allow(clippy::as_conversions, clippy::cast_possible_wrap)] + // Safe: process start times won't overflow i64 (max year ~292 billion) + Some(duration.as_secs() as i64) + } else { + tracing::warn!( + pid = event.pid, + name = %event.name, + "Process start time is before Unix epoch, skipping" + ); + None } - Err(_) => { + }); + + let collection_time = if let Ok(duration) = event.timestamp.duration_since(UNIX_EPOCH) { + // Use checked arithmetic to prevent overflow + let millis = duration.as_millis(); + #[allow(clippy::as_conversions)] // Safe: i64::MAX is a compile-time constant + let max_millis = i64::MAX as u128; + if millis > max_millis { tracing::warn!( pid = event.pid, name = %event.name, - "Collection time is before Unix epoch, using 0" + "Collection time overflow, clamping to i64::MAX" ); - 0 + i64::MAX + } else { + #[allow(clippy::as_conversions)] // Safe: checked above that millis fits in i64 + { + millis as i64 + } } + } else { + tracing::warn!( + pid = event.pid, + name = %event.name, + "Collection time is before Unix epoch, using 0" + ); + 0 }; // Check if executable hash exists before moving the value @@ -406,11 +423,7 @@ impl ProcessMessageHandler { cpu_usage: event.cpu_usage, memory_usage: event.memory_usage, executable_hash: event.executable_hash, - hash_algorithm: if has_executable_hash { - Some("sha256".to_string()) - } else { - None - }, + hash_algorithm: has_executable_hash.then(|| "sha256".to_owned()), user_id: event.user_id, accessible: event.accessible, file_exists: event.file_exists, @@ -437,7 +450,7 @@ impl ProcessMessageHandler { process: &sysinfo::Process, ) -> ProtoProcessRecord { let pid_u32 = pid.as_u32(); - let ppid = process.parent().map(|p| p.as_u32()); + let ppid = process.parent().map(sysinfo::Pid::as_u32); let name = process.name().to_string_lossy().to_string(); let executable_path = process.exe().map(|path| path.to_string_lossy().to_string()); let command_line = process @@ -445,9 +458,11 @@ impl ProcessMessageHandler { .iter() .map(|s| s.to_string_lossy().to_string()) .collect(); + #[allow(clippy::as_conversions, clippy::cast_possible_wrap)] + // Safe: process start times won't overflow i64 let start_time = Some(process.start_time() as i64); - let cpu_usage = Some(process.cpu_usage() as f64); - let memory_usage = Some(process.memory() * 1024); + let cpu_usage = Some(f64::from(process.cpu_usage())); + let memory_usage = Some(process.memory().saturating_mul(1024)); let executable_hash = None; // 
Would need file hashing implementation let hash_algorithm = None; let user_id = process.user_id().map(|uid| uid.to_string()); @@ -475,6 +490,17 @@ impl ProcessMessageHandler { } #[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::panic, + clippy::uninlined_format_args, + clippy::shadow_unrelated, + clippy::wildcard_enum_match_arm, + clippy::str_to_string, + clippy::arithmetic_side_effects, + clippy::indexing_slicing +)] mod tests { use super::*; use collector_core::ProcessEvent; diff --git a/procmond/src/lifecycle.rs b/procmond/src/lifecycle.rs index 0c18304..1c1577e 100644 --- a/procmond/src/lifecycle.rs +++ b/procmond/src/lifecycle.rs @@ -14,6 +14,7 @@ use tracing::{debug, warn}; /// Errors that can occur during lifecycle tracking. #[derive(Debug, Error)] +#[non_exhaustive] pub enum LifecycleTrackingError { /// Invalid process data provided #[error("Invalid process data for PID {pid}: {message}")] @@ -36,6 +37,7 @@ pub type LifecycleTrackingResult = Result; /// These events represent different types of changes in the process landscape /// that can be detected by comparing process snapshots between enumeration cycles. #[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] +#[non_exhaustive] pub enum ProcessLifecycleEvent { /// A new process has started Start { @@ -82,6 +84,7 @@ pub enum ProcessLifecycleEvent { /// Severity levels for suspicious events. #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] +#[non_exhaustive] pub enum SuspiciousEventSeverity { /// Low severity - minor anomaly Low, @@ -187,6 +190,7 @@ impl From for ProcessEvent { /// Configuration for process lifecycle tracking. #[derive(Debug, Clone)] +#[allow(clippy::struct_excessive_bools)] // Configuration struct naturally has multiple boolean options pub struct LifecycleTrackingConfig { /// Maximum age of process snapshots before they're considered stale pub max_snapshot_age: Duration, @@ -304,6 +308,9 @@ pub struct LifecycleTrackingStats { /// Number of timestamp anomalies detected pub timestamp_anomalies: u64, + /// Number of invalid start events that failed validation + pub invalid_start_events: u64, + /// Average number of processes tracked per update pub avg_processes_tracked: f64, } @@ -375,8 +382,10 @@ impl ProcessLifecycleTracker { if self.last_update.is_none() { self.current_snapshots = new_snapshots; self.last_update = Some(update_time); - self.stats.total_updates += 1; - self.stats.avg_processes_tracked = self.current_snapshots.len() as f64; + self.stats.total_updates = self.stats.total_updates.saturating_add(1); + #[allow(clippy::as_conversions)] // Safe: usize to f64 for statistics + let count = self.current_snapshots.len() as f64; + self.stats.avg_processes_tracked = count; return Ok(events); } @@ -407,7 +416,7 @@ impl ProcessLifecycleTracker { } /// Returns current tracking statistics. - pub fn stats(&self) -> &LifecycleTrackingStats { + pub const fn stats(&self) -> &LifecycleTrackingStats { &self.stats } @@ -428,8 +437,9 @@ impl ProcessLifecycleTracker { // Private implementation methods impl ProcessLifecycleTracker { /// Detects process start events by finding PIDs in current but not in previous. 
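+    ///
+    /// Conceptually this is the set difference `current_snapshots.keys() -
+    /// previous_snapshots.keys()`: each new PID is a candidate Start event,
+    /// and candidates failing `validate_start_event` are counted in
+    /// `invalid_start_events` instead of being emitted.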
+ #[allow(clippy::unnecessary_wraps)] // Result type for consistency with other detection methods fn detect_start_events( - &self, + &mut self, update_time: &SystemTime, ) -> LifecycleTrackingResult> { let mut events = Vec::with_capacity(self.current_snapshots.len()); @@ -438,7 +448,9 @@ impl ProcessLifecycleTracker { if !self.previous_snapshots.contains_key(pid) { // Validate that this is a legitimate start event if let Err(e) = self.validate_start_event(snapshot) { - warn!("Invalid start event for PID {}: {}", pid, e); + warn!(pid = pid, error = %e, "Invalid start event, skipping"); + self.stats.invalid_start_events = + self.stats.invalid_start_events.saturating_add(1); continue; } @@ -453,6 +465,7 @@ impl ProcessLifecycleTracker { } /// Detects process stop events by finding PIDs in previous but not in current. + #[allow(clippy::unnecessary_wraps)] // Result type for consistency with other detection methods fn detect_stop_events( &self, update_time: &SystemTime, @@ -462,11 +475,9 @@ impl ProcessLifecycleTracker { for (pid, snapshot) in &self.previous_snapshots { if !self.current_snapshots.contains_key(pid) { // Calculate runtime duration if possible - let runtime_duration = if let Some(start_time) = snapshot.start_time { - update_time.duration_since(start_time).ok() - } else { - None - }; + let runtime_duration = snapshot + .start_time + .and_then(|start_time| update_time.duration_since(start_time).ok()); events.push(ProcessLifecycleEvent::Stop { process: Box::new(snapshot.clone()), @@ -480,6 +491,7 @@ impl ProcessLifecycleTracker { } /// Detects process modification events by comparing snapshots. + #[allow(clippy::unnecessary_wraps)] // Result type for consistency with other detection methods fn detect_modification_events( &self, update_time: &SystemTime, @@ -540,18 +552,21 @@ impl ProcessLifecycleTracker { fn validate_start_event(&self, snapshot: &ProcessSnapshot) -> LifecycleTrackingResult<()> { // Check minimum process lifetime if let Some(start_time) = snapshot.start_time { - let lifetime = snapshot - .snapshot_time - .duration_since(start_time) - .map_err(|_| LifecycleTrackingError::TimestampValidationFailed { - pid: snapshot.pid, - message: "Process start time is in the future".to_string(), - })?; + let lifetime = + snapshot + .snapshot_time + .duration_since(start_time) + .map_err( + |_time_err| LifecycleTrackingError::TimestampValidationFailed { + pid: snapshot.pid, + message: "Process start time is in the future".to_owned(), + }, + )?; if lifetime < self.config.min_process_lifetime { return Err(LifecycleTrackingError::TimestampValidationFailed { pid: snapshot.pid, - message: format!("Process lifetime too short: {:?}", lifetime), + message: format!("Process lifetime too short: {lifetime:?}"), }); } } @@ -569,14 +584,14 @@ impl ProcessLifecycleTracker { // Check command line changes if self.config.track_command_line_changes && previous.command_line != current.command_line { - modified_fields.push("command_line".to_string()); + modified_fields.push("command_line".to_owned()); } // Check executable path changes if self.config.track_executable_changes && previous.executable_path != current.executable_path { - modified_fields.push("executable_path".to_string()); + modified_fields.push("executable_path".to_owned()); } // Check memory usage changes @@ -585,13 +600,14 @@ impl ProcessLifecycleTracker { { if prev_mem == 0 { if curr_mem > 0 { - modified_fields.push("memory_usage".to_string()); + modified_fields.push("memory_usage".to_owned()); } } else { + #[allow(clippy::as_conversions)] 
// Safe: u64 to f64 for percentage calculation let change_percent = ((curr_mem as f64 - prev_mem as f64) / prev_mem as f64).abs() * 100.0; if change_percent > self.config.memory_change_threshold { - modified_fields.push("memory_usage".to_string()); + modified_fields.push("memory_usage".to_owned()); } } } @@ -602,12 +618,12 @@ impl ProcessLifecycleTracker { { if prev_cpu == 0.0 { if curr_cpu > 0.0 { - modified_fields.push("cpu_usage".to_string()); + modified_fields.push("cpu_usage".to_owned()); } } else { let change_percent = ((curr_cpu - prev_cpu) / prev_cpu).abs() * 100.0; if change_percent > self.config.cpu_change_threshold { - modified_fields.push("cpu_usage".to_string()); + modified_fields.push("cpu_usage".to_owned()); } } } @@ -617,18 +633,19 @@ impl ProcessLifecycleTracker { && previous.executable_hash.is_some() && current.executable_hash.is_some() { - modified_fields.push("executable_hash".to_string()); + modified_fields.push("executable_hash".to_owned()); } // Check user ID changes (potential privilege escalation) if previous.user_id != current.user_id { - modified_fields.push("user_id".to_string()); + modified_fields.push("user_id".to_owned()); } modified_fields } /// Detects PID reuse scenarios. + #[allow(clippy::unnecessary_wraps, clippy::unused_self)] // Kept for API consistency and future use fn detect_pid_reuse( &self, previous: &ProcessSnapshot, @@ -664,6 +681,7 @@ impl ProcessLifecycleTracker { } /// Detects timestamp anomalies. + #[allow(clippy::unused_self)] // Kept for API consistency and future use fn detect_timestamp_anomaly( &self, snapshot: &ProcessSnapshot, @@ -675,18 +693,20 @@ impl ProcessLifecycleTracker { return Ok(Some(ProcessLifecycleEvent::Suspicious { process: Box::new(snapshot.clone()), detected_at: *update_time, - reason: "Process start time is in the future".to_string(), + reason: "Process start time is in the future".to_owned(), severity: SuspiciousEventSeverity::High, })); } // Check if process is impossibly old (more than system uptime would allow) - let age = update_time.duration_since(start_time).map_err(|_| { - LifecycleTrackingError::TimestampValidationFailed { - pid: snapshot.pid, - message: "Failed to calculate process age".to_string(), - } - })?; + let age = update_time + .duration_since(start_time) + .map_err( + |_time_err| LifecycleTrackingError::TimestampValidationFailed { + pid: snapshot.pid, + message: "Failed to calculate process age".to_owned(), + }, + )?; // This is a simple heuristic - in practice, you might want to check actual system uptime if age > Duration::from_secs(365 * 24 * 3600) { @@ -694,7 +714,7 @@ impl ProcessLifecycleTracker { return Ok(Some(ProcessLifecycleEvent::Suspicious { process: Box::new(snapshot.clone()), detected_at: *update_time, - reason: format!("Process age seems unrealistic: {:?}", age), + reason: format!("Process age seems unrealistic: {age:?}"), severity: SuspiciousEventSeverity::Medium, })); } @@ -704,40 +724,67 @@ impl ProcessLifecycleTracker { } /// Updates tracking statistics based on detected events. + #[allow(clippy::pattern_type_mismatch)] // Matching on references is intentional here fn update_statistics(&mut self, events: &[ProcessLifecycleEvent]) { - self.stats.total_updates += 1; + self.stats.total_updates = self.stats.total_updates.saturating_add(1); for event in events { match event { - ProcessLifecycleEvent::Start { .. } => self.stats.start_events += 1, - ProcessLifecycleEvent::Stop { .. } => self.stats.stop_events += 1, - ProcessLifecycleEvent::Modified { .. 
} => self.stats.modification_events += 1, + ProcessLifecycleEvent::Start { .. } => { + self.stats.start_events = self.stats.start_events.saturating_add(1); + } + ProcessLifecycleEvent::Stop { .. } => { + self.stats.stop_events = self.stats.stop_events.saturating_add(1); + } + ProcessLifecycleEvent::Modified { .. } => { + self.stats.modification_events = + self.stats.modification_events.saturating_add(1); + } ProcessLifecycleEvent::Suspicious { reason, .. } => { - self.stats.suspicious_events += 1; + self.stats.suspicious_events = self.stats.suspicious_events.saturating_add(1); if reason.contains("PID reuse") { - self.stats.pid_reuse_events += 1; + self.stats.pid_reuse_events = self.stats.pid_reuse_events.saturating_add(1); } else if reason.contains("timestamp") || reason.contains("time") { - self.stats.timestamp_anomalies += 1; + self.stats.timestamp_anomalies = + self.stats.timestamp_anomalies.saturating_add(1); } } } } // Update average processes tracked + #[allow(clippy::as_conversions)] // Safe: usize/u64 to f64 for statistics let current_count = self.current_snapshots.len() as f64; + #[allow(clippy::as_conversions)] // Safe: u64 to f64 for statistics let total_updates = self.stats.total_updates as f64; - self.stats.avg_processes_tracked = - (self.stats.avg_processes_tracked * (total_updates - 1.0) + current_count) - / total_updates; + self.stats.avg_processes_tracked = self + .stats + .avg_processes_tracked + .mul_add(total_updates - 1.0, current_count) + / total_updates; } /// Cleans up old snapshots to prevent memory growth. + /// + /// This method enforces the `max_snapshots` limit on the previous snapshot + /// collection. When the current process count exceeds the limit, previous + /// snapshots are cleared to free memory while still tracking all current + /// processes. This design allows tracking all running processes while + /// limiting the memory used for historical comparison data. + /// + /// Note: If the system has more processes than `max_snapshots`, consider + /// increasing the limit to avoid repeated clearing of previous snapshots. fn cleanup_old_snapshots(&mut self) { - if self.current_snapshots.len() > self.config.max_snapshots { + let current_count = self.current_snapshots.len(); + let previous_count = self.previous_snapshots.len(); + + // Clear previous snapshots if either collection exceeds limit + if current_count > self.config.max_snapshots || previous_count > self.config.max_snapshots { warn!( - "Process snapshot count ({}) exceeds maximum ({}), clearing previous snapshots", - self.current_snapshots.len(), - self.config.max_snapshots + "Process snapshot count (current: {}, previous: {}) exceeds maximum ({}), \ + clearing previous snapshots. 
Consider increasing max_snapshots if this \ + occurs frequently.", + current_count, previous_count, self.config.max_snapshots ); self.previous_snapshots.clear(); } @@ -745,6 +792,17 @@ impl ProcessLifecycleTracker { } #[cfg(test)] +#[allow( + clippy::str_to_string, + clippy::uninlined_format_args, + clippy::arithmetic_side_effects, + clippy::expect_used, + clippy::unwrap_used, + clippy::wildcard_enum_match_arm, + clippy::pattern_type_mismatch, + clippy::indexing_slicing, + clippy::panic +)] mod tests { use super::*; use std::time::{Duration, SystemTime}; diff --git a/procmond/src/linux_collector.rs b/procmond/src/linux_collector.rs index f76dd45..33e5e58 100644 --- a/procmond/src/linux_collector.rs +++ b/procmond/src/linux_collector.rs @@ -10,7 +10,7 @@ use collector_core::ProcessEvent; use serde::Serialize; use std::collections::HashMap; use std::fs; -use std::io::{self}; +use std::io; use std::path::Path; use std::time::SystemTime; use sysinfo::{Process, System}; @@ -24,6 +24,7 @@ use crate::process_collector::{ /// Linux-specific errors that can occur during process collection. #[derive(Debug, Error)] +#[non_exhaustive] pub enum LinuxCollectionError { /// Failed to read /proc filesystem #[error("Failed to read /proc filesystem: {message}")] @@ -140,10 +141,15 @@ pub struct LinuxProcessCollector { has_cap_sys_ptrace: bool, /// Cached host namespace IDs for container detection host_namespaces: ProcessNamespaces, + /// Cached system boot time (seconds since Unix epoch) + boot_time_secs: Option, + /// Clock ticks per second for jiffies conversion + clock_ticks_per_sec: u64, } /// Configuration for Linux-specific process collection features. #[derive(Debug, Clone)] +#[allow(clippy::struct_excessive_bools)] // These are independent feature flags pub struct LinuxCollectorConfig { /// Whether to collect process namespace information pub collect_namespaces: bool, @@ -216,11 +222,17 @@ impl LinuxProcessCollector { // Cache host namespace IDs for container detection let host_namespaces = if linux_config.detect_containers { - Self::read_process_namespaces(1).unwrap_or_default() + Self::read_process_namespaces(1) } else { ProcessNamespaces::default() }; + // Cache system boot time for start time calculations + let boot_time_secs = Self::read_boot_time(); + + // Get clock ticks per second (typically 100 on Linux) + let clock_ticks_per_sec = Self::get_clock_ticks_per_sec(); + debug!( has_cap_sys_ptrace = has_cap_sys_ptrace, collect_namespaces = linux_config.collect_namespaces, @@ -228,6 +240,8 @@ impl LinuxProcessCollector { collect_file_descriptors = linux_config.collect_file_descriptors, collect_network_connections = linux_config.collect_network_connections, detect_containers = linux_config.detect_containers, + boot_time_secs = ?boot_time_secs, + clock_ticks_per_sec = clock_ticks_per_sec, "Initialized Linux process collector" ); @@ -236,6 +250,8 @@ impl LinuxProcessCollector { linux_config, has_cap_sys_ptrace, host_namespaces, + boot_time_secs, + clock_ticks_per_sec, }) } @@ -248,7 +264,7 @@ impl LinuxProcessCollector { let status_path = "/proc/self/status"; let content = fs::read_to_string(status_path).map_err(|e| ProcessCollectionError::PlatformError { - message: format!("Failed to read {}: {}", status_path, e), + message: format!("Failed to read {status_path}: {e}"), })?; // Look for CapEff line and check if CAP_SYS_PTRACE (bit 19) is set @@ -271,18 +287,45 @@ impl LinuxProcessCollector { Ok(false) } + /// Reads system boot time from /proc/stat. 
+ /// + /// Returns the boot time as seconds since the Unix epoch. + fn read_boot_time() -> Option { + let content = fs::read_to_string("/proc/stat").ok()?; + for line in content.lines() { + if let Some(btime_str) = line.strip_prefix("btime ") { + return btime_str.trim().parse().ok(); + } + } + None + } + + /// Gets the system clock ticks per second (CLK_TCK). + /// + /// This is used to convert jiffies (clock ticks) to seconds for process + /// start time calculations. + const fn get_clock_ticks_per_sec() -> u64 { + // On Linux, we can read this from sysconf(_SC_CLK_TCK) + // For Rust, we'll try to parse it from /proc/self/stat timing + // or fall back to the common default of 100 + // + // A more robust approach would use libc::sysconf(libc::_SC_CLK_TCK) + // but we avoid libc dependency here. The value is almost always 100. + 100 + } + /// Reads process namespace information from /proc/\[pid\]/ns/. - fn read_process_namespaces(pid: u32) -> ProcessCollectionResult { - let ns_dir = format!("/proc/{}/ns", pid); + fn read_process_namespaces(pid: u32) -> ProcessNamespaces { + let ns_dir = format!("/proc/{pid}/ns"); let mut namespaces = ProcessNamespaces::default(); // Helper function to read namespace ID from symlink let read_ns_id = |ns_name: &str| -> Option { - let ns_path = format!("{}/{}", ns_dir, ns_name); + let ns_path = format!("{ns_dir}/{ns_name}"); fs::read_link(&ns_path).ok().and_then(|target| { target .to_string_lossy() - .strip_prefix(&format!("{}:[", ns_name)) + .strip_prefix(&format!("{ns_name}:[")) .and_then(|s| s.strip_suffix(']')) .and_then(|s| s.parse().ok()) }) @@ -296,7 +339,7 @@ impl LinuxProcessCollector { namespaces.uts_ns = read_ns_id("uts"); namespaces.cgroup_ns = read_ns_id("cgroup"); - Ok(namespaces) + namespaces } /// Reads enhanced process metadata from /proc/\[pid\]/ files. 
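Taken together, `read_boot_time`, `get_clock_ticks_per_sec`, and the fields cached in `new` let the collector turn a raw `starttime` jiffies value into an absolute timestamp. A minimal sketch of that arithmetic, with a hypothetical helper name and illustrative sample values (not part of the patch):

use std::time::{Duration, SystemTime};

// Sketch: starttime in /proc/[pid]/stat counts clock ticks since boot, so
// absolute start = boot_time + starttime_jiffies / CLK_TCK.
fn start_time_from_jiffies(boot_time_secs: u64, starttime_jiffies: u64) -> Option<SystemTime> {
    const CLK_TCK: u64 = 100; // the default the collector assumes
    let secs_since_boot = starttime_jiffies.checked_div(CLK_TCK)?;
    let absolute_secs = boot_time_secs.checked_add(secs_since_boot)?;
    SystemTime::UNIX_EPOCH.checked_add(Duration::from_secs(absolute_secs))
}

// Example: btime 1700000000 and starttime 12345 jiffies yield
// 1700000000 + 123 = 1700000123 seconds since the Unix epoch.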
@@ -305,22 +348,22 @@ impl LinuxProcessCollector { // Read namespaces if configured if self.linux_config.collect_namespaces { - metadata.namespaces = Self::read_process_namespaces(pid).unwrap_or_default(); + metadata.namespaces = Self::read_process_namespaces(pid); } // Read memory maps count if configured if self.linux_config.collect_memory_maps { - metadata.memory_maps_count = self.count_memory_maps(pid); + metadata.memory_maps_count = Self::count_memory_maps(pid); } // Read file descriptors count if configured if self.linux_config.collect_file_descriptors { - metadata.open_fds_count = self.count_file_descriptors(pid); + metadata.open_fds_count = Self::count_file_descriptors(pid); } // Read network connections count if configured if self.linux_config.collect_network_connections { - metadata.network_connections_count = self.count_network_connections(pid); + metadata.network_connections_count = Self::count_network_connections(pid); } // Detect container if configured @@ -329,43 +372,43 @@ impl LinuxProcessCollector { } // Read /proc/\[pid\]/stat for additional metadata - if let Ok(stat_data) = self.read_proc_stat(pid) { + if let Ok(stat_data) = Self::read_proc_stat(pid) { metadata.state = stat_data.get("state").and_then(|s| s.chars().next()); metadata.threads = stat_data.get("num_threads").and_then(|s| s.parse().ok()); } // Read /proc/\[pid\]/status for memory information - if let Ok(status_data) = self.read_proc_status(pid) { + if let Ok(status_data) = Self::read_proc_status(pid) { metadata.vm_size = status_data .get("VmSize") - .and_then(|s| self.parse_memory_kb(s)); + .and_then(|s| Self::parse_memory_kb(s)); metadata.vm_rss = status_data .get("VmRSS") - .and_then(|s| self.parse_memory_kb(s)); + .and_then(|s| Self::parse_memory_kb(s)); metadata.vm_peak = status_data .get("VmPeak") - .and_then(|s| self.parse_memory_kb(s)); + .and_then(|s| Self::parse_memory_kb(s)); } metadata } /// Counts memory maps from /proc/\[pid\]/maps. - fn count_memory_maps(&self, pid: u32) -> Option { - let maps_path = format!("/proc/{}/maps", pid); - fs::read_to_string(&maps_path) + fn count_memory_maps(pid: u32) -> Option { + let maps_path = format!("/proc/{pid}/maps"); + fs::read_to_string(maps_path) .ok() .map(|content| content.lines().count()) } /// Counts open file descriptors from /proc/\[pid\]/fd/. - fn count_file_descriptors(&self, pid: u32) -> Option { - let fd_dir = format!("/proc/{}/fd", pid); - fs::read_dir(&fd_dir).ok().map(|entries| entries.count()) + fn count_file_descriptors(pid: u32) -> Option { + let fd_dir = format!("/proc/{pid}/fd"); + fs::read_dir(fd_dir).ok().map(Iterator::count) } /// Counts network connections for a process (simplified implementation). - fn count_network_connections(&self, _pid: u32) -> Option { + const fn count_network_connections(_pid: u32) -> Option { // This is a simplified implementation. A full implementation would // parse /proc/net/tcp, /proc/net/udp, etc. 
and match by inode
+        // to file descriptors in /proc/\[pid\]/fd/
@@ -384,79 +427,107 @@
         }
 
         // Try to extract container ID from cgroup
-        let cgroup_path = format!("/proc/{}/cgroup", pid);
-        if let Ok(content) = fs::read_to_string(&cgroup_path) {
+        let cgroup_path = format!("/proc/{pid}/cgroup");
+        if let Ok(content) = fs::read_to_string(cgroup_path) {
             for line in content.lines() {
                 // Look for Docker container ID pattern
-                if let Some(docker_id) = self.extract_docker_id(line) {
-                    return Some(format!("docker:{}", docker_id));
+                if let Some(docker_id) = Self::extract_docker_id(line) {
+                    return Some(format!("docker:{docker_id}"));
                 }
 
                 // Look for containerd container ID pattern
-                if let Some(containerd_id) = self.extract_containerd_id(line) {
-                    return Some(format!("containerd:{}", containerd_id));
+                if let Some(containerd_id) = Self::extract_containerd_id(line) {
+                    return Some(format!("containerd:{containerd_id}"));
                 }
             }
         }
 
         // Generic container detection
-        Some("container:unknown".to_string())
+        Some("container:unknown".to_owned())
     }
 
     /// Extracts Docker container ID from cgroup line.
-    fn extract_docker_id(&self, line: &str) -> Option<String> {
+    fn extract_docker_id(line: &str) -> Option<String> {
         // Docker cgroup pattern: /docker/[container_id]
         if let Some(docker_part) = line.split("/docker/").nth(1) {
             let container_id = docker_part.split('/').next()?;
             if container_id.len() >= 12 {
-                let _id = &container_id[..12];
-                return Some(_id.to_string());
+                // Container IDs are hex strings (ASCII), so char iteration is safe
+                return Some(container_id.chars().take(12).collect());
             }
         }
         None
     }
 
     /// Extracts containerd container ID from cgroup line.
-    fn extract_containerd_id(&self, line: &str) -> Option<String> {
+    fn extract_containerd_id(line: &str) -> Option<String> {
         // containerd cgroup pattern: /system.slice/containerd.service/[container_id]
         if line.contains("containerd.service") {
             let parts: Vec<&str> = line.split('/').collect();
             if let Some(container_part) = parts.last()
                 && container_part.len() >= 12
             {
-                let _id = &container_part[..12];
-                return Some(_id.to_string());
+                // Container IDs are hex strings (ASCII), so char iteration is safe
+                return Some(container_part.chars().take(12).collect());
             }
         }
         None
     }
 
     /// Reads and parses /proc/\[pid\]/stat file.
-    fn read_proc_stat(&self, pid: u32) -> io::Result<HashMap<String, String>> {
-        let stat_path = format!("/proc/{}/stat", pid);
-        let content = fs::read_to_string(&stat_path)?;
+    ///
+    /// The stat file format is tricky because the comm field (process name) is
+    /// enclosed in parentheses and can contain spaces and other special characters.
+    /// We handle this by finding the last ')' to reliably parse fields after comm.
+    fn read_proc_stat(pid: u32) -> io::Result<HashMap<String, String>> {
+        let stat_path = format!("/proc/{pid}/stat");
+        let content = fs::read_to_string(stat_path)?;
         let mut data = HashMap::new();
 
-        // Parse stat file (space-separated values)
-        let fields: Vec<&str> = content.split_whitespace().collect();
-        if fields.len() >= 20 {
-            // Field 3 is state, field 4 is ppid, field 20 is num_threads
-            data.insert("state".to_string(), fields[2].to_string());
-            data.insert("ppid".to_string(), fields[3].to_string());
-            data.insert("num_threads".to_string(), fields[19].to_string());
+        // The comm field is enclosed in parentheses and can contain spaces/parens.
+        // Find the last ')' to reliably parse the fields after comm.
+        // Format: pid (comm) state ppid pgrp session tty_nr tpgid flags ...
+        //         0   1      2     3    4    5       6      7     8   ...
+        // Field 22 (0-indexed 21 from the start, but 19 after comm) is starttime.
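+        // Example: in `1234 (tmux: server) S 1 ...` a naive split_whitespace()
+        // would break the comm field into two tokens and shift every later
+        // index; scanning back from the final ')' avoids that.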
+        if let Some(comm_end_idx) = content.rfind(')') {
+            let Some(after_comm) = content.get(comm_end_idx.saturating_add(1)..) else {
+                return Ok(data);
+            };
+            let fields: Vec<&str> = after_comm.split_whitespace().collect();
+
+            // Fields are now: state(0) ppid(1) pgrp(2) session(3) ... starttime(19) ...
+            if fields.len() >= 20 {
+                data.insert(
+                    "state".to_owned(),
+                    (*fields.first().unwrap_or(&"")).to_owned(),
+                );
+                data.insert(
+                    "ppid".to_owned(),
+                    (*fields.get(1).unwrap_or(&"")).to_owned(),
+                );
+                data.insert(
+                    "num_threads".to_owned(),
+                    (*fields.get(17).unwrap_or(&"")).to_owned(),
+                );
+                // starttime is field 22 in man proc(5), which is index 19 after comm
+                data.insert(
+                    "starttime".to_owned(),
+                    (*fields.get(19).unwrap_or(&"")).to_owned(),
+                );
+            }
         }
 
         Ok(data)
     }
 
     /// Reads and parses /proc/\[pid\]/status file.
-    fn read_proc_status(&self, pid: u32) -> io::Result<HashMap<String, String>> {
-        let status_path = format!("/proc/{}/status", pid);
-        let content = fs::read_to_string(&status_path)?;
+    fn read_proc_status(pid: u32) -> io::Result<HashMap<String, String>> {
+        let status_path = format!("/proc/{pid}/status");
+        let content = fs::read_to_string(status_path)?;
         let mut data = HashMap::new();
 
         for line in content.lines() {
             if let Some((key, value)) = line.split_once(':') {
-                data.insert(key.trim().to_string(), value.trim().to_string());
+                data.insert(key.trim().to_owned(), value.trim().to_owned());
             }
         }
 
@@ -464,7 +535,7 @@
     }
 
     /// Parses memory value from /proc/status (e.g., "1024 kB" -> Some(1048576)).
-    fn parse_memory_kb(&self, value: &str) -> Option<u64> {
+    fn parse_memory_kb(value: &str) -> Option<u64> {
         value
             .split_whitespace()
             .next()
@@ -472,9 +543,26 @@
             .and_then(|s| s.parse::<u64>().ok())
             .and_then(|kb| kb.checked_mul(1024)) // Convert KB to bytes with overflow check
     }
 
+    /// Calculates process start time from starttime jiffies.
+    ///
+    /// The starttime value from `/proc/\[pid\]/stat` is in clock ticks since system boot.
+    /// We convert this to an absolute SystemTime using the cached boot time.
+    fn calculate_start_time(&self, starttime_jiffies: u64) -> Option<SystemTime> {
+        let boot_time_secs = self.boot_time_secs?;
+
+        // Convert jiffies to seconds (with overflow protection)
+        let starttime_secs = starttime_jiffies.checked_div(self.clock_ticks_per_sec)?;
+
+        // Calculate absolute timestamp (boot time + process start offset)
+        let absolute_secs = boot_time_secs.checked_add(starttime_secs)?;
+
+        // Convert to SystemTime
+        SystemTime::UNIX_EPOCH.checked_add(std::time::Duration::from_secs(absolute_secs))
+    }
+
     /// Reads basic process information from /proc/\[pid\]/ files.
     fn read_process_info(&self, pid: u32) -> ProcessCollectionResult<ProcessEvent> {
-        let proc_dir = format!("/proc/{}", pid);
+        let proc_dir = format!("/proc/{pid}");
 
         // Check if the process directory exists and is actually a directory
         let proc_path = Path::new(&proc_dir);
@@ -483,60 +571,59 @@
         }
 
         // Additional check: try to read /proc/\[pid\]/stat to verify the process exists
-        let stat_path = format!("{}/stat", proc_dir);
+        let stat_path = format!("{proc_dir}/stat");
         if !Path::new(&stat_path).exists() {
             return Err(ProcessCollectionError::ProcessNotFound { pid });
         }
 
         // Read command line
-        let cmdline_path = format!("{}/cmdline", proc_dir);
-        let command_line = match fs::read(&cmdline_path) {
-            Ok(bytes) => {
+        let cmdline_path = format!("{proc_dir}/cmdline");
+        let command_line = fs::read(&cmdline_path).map_or_else(
+            |_| vec![],
+            |bytes| {
                 if bytes.is_empty() {
                     vec![]
                 } else {
                     bytes
                         .split(|&b| b == 0)
                         .filter(|arg| !arg.is_empty())
-                        .map(|arg| String::from_utf8_lossy(arg).to_string())
+                        .map(|arg| String::from_utf8_lossy(arg).into_owned())
                         .collect()
                 }
-            }
-            Err(_) => vec![],
-        };
+            },
+        );
 
         // Read executable path
-        let exe_path = format!("{}/exe", proc_dir);
+        let exe_path = format!("{proc_dir}/exe");
         let executable_path = fs::read_link(&exe_path)
             .ok()
-            .map(|path| path.to_string_lossy().to_string());
+            .map(|path| path.to_string_lossy().into_owned());
 
         // Read comm (process name)
-        let comm_path = format!("{}/comm", proc_dir);
+        let comm_path = format!("{proc_dir}/comm");
         let name = fs::read_to_string(&comm_path)
-            .unwrap_or_else(|_| format!("<unknown:{}>", pid))
+            .unwrap_or_else(|_| format!("<unknown:{pid}>"))
             .trim()
-            .to_string();
+            .to_owned();
 
         // Read stat for basic info
-        let stat_data = self.read_proc_stat(pid).unwrap_or_default();
+        let stat_data = Self::read_proc_stat(pid).unwrap_or_default();
         let ppid = stat_data
             .get("ppid")
             .and_then(|s| s.parse::<u32>().ok())
             .filter(|&p| p != 0);
 
         // Read status for additional info
-        let status_data = self.read_proc_status(pid).unwrap_or_default();
+        let status_data = Self::read_proc_status(pid).unwrap_or_default();
         let user_id = status_data
             .get("Uid")
-            .and_then(|uid_line| uid_line.split_whitespace().next().map(|s| s.to_string()));
+            .and_then(|uid_line| uid_line.split_whitespace().next().map(ToOwned::to_owned));
 
         // Enhanced metadata collection
-        let enhanced_metadata = if self.base_config.collect_enhanced_metadata {
-            Some(self.read_enhanced_metadata(pid))
-        } else {
-            None
-        };
+        let enhanced_metadata = self
+            .base_config
+            .collect_enhanced_metadata
+            .then(|| self.read_enhanced_metadata(pid));
 
         // Calculate CPU and memory usage if enhanced metadata is enabled
         let (cpu_usage, memory_usage) = if self.base_config.collect_enhanced_metadata {
@@ -549,29 +636,22 @@
             (None, None)
         };
 
-        // Determine start time
-        let start_time = if self.base_config.collect_enhanced_metadata {
-            // This would require parsing the start time from /proc/\[pid\]/stat
-            // and converting from jiffies to SystemTime. For now, we'll use None.
-            None
-        } else {
-            None
-        };
+        // Determine start time from /proc/[pid]/stat starttime jiffies
+        let start_time: Option<SystemTime> = stat_data
+            .get("starttime")
+            .and_then(|s| s.parse::<u64>().ok())
+            .and_then(|jiffies| self.calculate_start_time(jiffies));
 
         // Compute executable hash if requested
-        let executable_hash = if self.base_config.compute_executable_hashes {
-            // TODO: Implement executable hashing (issue #40)
-            None
-        } else {
-            None
-        };
+        // TODO: Implement executable hashing (issue #40)
+        let executable_hash: Option<String> = None;
 
         // Serialize enhanced metadata for platform_metadata field
         let platform_metadata = if self.base_config.collect_enhanced_metadata {
             enhanced_metadata.and_then(|metadata| {
                 serde_json::to_value(metadata)
                     .map_err(|e| {
-                        warn!("Failed to serialize Linux process metadata: {}", e);
+                        warn!("Failed to serialize Linux process metadata: {e}");
                    })
                     .ok()
             })
@@ -601,13 +681,13 @@
     }
 
     /// Enumerates all processes by reading /proc directory.
-    fn enumerate_proc_pids(&self) -> ProcessCollectionResult<Vec<u32>> {
+    fn enumerate_proc_pids() -> ProcessCollectionResult<Vec<u32>> {
         let proc_dir = Path::new("/proc");
         let mut pids = Vec::new();
 
         let entries = fs::read_dir(proc_dir).map_err(|e| {
             ProcessCollectionError::SystemEnumerationFailed {
-                message: format!("Failed to read /proc directory: {}", e),
+                message: format!("Failed to read /proc directory: {e}"),
             }
         })?;
@@ -621,7 +701,7 @@
         if pids.is_empty() {
             return Err(ProcessCollectionError::SystemEnumerationFailed {
-                message: "No process PIDs found in /proc".to_string(),
+                message: "No process PIDs found in /proc".to_owned(),
             });
         }
@@ -679,15 +759,15 @@
             })
             .await
             .map_err(|e| ProcessCollectionError::SystemEnumerationFailed {
-                message: format!("Process enumeration task failed: {}", e),
+                message: format!("Process enumeration task failed: {e}"),
             })?;
 
         let mut events = Vec::new();
         let mut stats = CollectionStats::default();
-        let mut processed_count = 0;
+        let mut processed_count: usize = 0;
 
         // Process each process with individual error handling
-        for (sysinfo_pid, process) in system.processes().iter() {
+        for (sysinfo_pid, process) in system.processes() {
             let pid = sysinfo_pid.as_u32();
 
             // Check if we've hit the maximum process limit
@@ -701,49 +781,37 @@
                 break;
             }
 
-            processed_count += 1;
-
-            match self.convert_sysinfo_to_event(pid, process).await {
-                Ok(event) => {
-                    // Apply filtering based on configuration
-                    let should_skip = if self.base_config.skip_system_processes
-                        && self.is_system_process(&event.name, pid)
-                    {
-                        true
-                    } else {
-                        self.base_config.skip_kernel_threads
-                            && self.is_kernel_thread(&event.name, &event.command_line)
-                    };
-
-                    if should_skip {
-                        debug!(
-                            pid = pid,
-                            name = %event.name,
-                            "Skipping process due to configuration"
-                        );
-                        stats.inaccessible_processes += 1;
-                    } else {
-                        events.push(event);
-                        stats.successful_collections += 1;
-                    }
-                }
-                Err(ProcessCollectionError::ProcessAccessDenied { pid, message }) => {
-                    debug!(pid = pid, reason = %message, "Process access denied");
-                    stats.inaccessible_processes += 1;
-                }
-                Err(ProcessCollectionError::ProcessNotFound { pid }) => {
-                    debug!(pid = pid, "Process no longer exists");
-                    stats.inaccessible_processes += 1;
-                }
-                Err(e) => {
-                    warn!(pid = pid, error = %e, "Error reading process information");
-                    stats.invalid_processes += 1;
-                }
+            processed_count = processed_count.saturating_add(1);
+
+            // convert_sysinfo_to_event doesn't return 
+            let event = self.convert_sysinfo_to_event(pid, process);
+
+            // Apply filtering based on configuration
+            let should_skip = if self.base_config.skip_system_processes
+                && Self::is_system_process(&event.name, pid)
+            {
+                true
+            } else {
+                self.base_config.skip_kernel_threads
+                    && Self::is_kernel_thread(&event.name, &event.command_line)
+            };
+
+            if should_skip {
+                debug!(
+                    pid = pid,
+                    name = %event.name,
+                    "Skipping process due to configuration"
+                );
+                stats.inaccessible_processes = stats.inaccessible_processes.saturating_add(1);
+            } else {
+                events.push(event);
+                stats.successful_collections = stats.successful_collections.saturating_add(1);
             }
         }

         stats.total_processes = processed_count;
-        stats.collection_duration_ms = start_time.elapsed().as_millis() as u64;
+        stats.collection_duration_ms =
+            u64::try_from(start_time.elapsed().as_millis()).unwrap_or(u64::MAX);

         debug!(
             collector = self.name(),
@@ -784,18 +852,16 @@ impl ProcessCollector for LinuxProcessCollector {
             })
             .await
             .map_err(|e| ProcessCollectionError::SystemEnumerationFailed {
-                message: format!("Process lookup task failed: {}", e),
+                message: format!("Process lookup task failed: {e}"),
             })?;

         let system = sysinfo_result;
         let sysinfo_pid = sysinfo::Pid::from_u32(pid);
-        if let Some(process) = system.process(sysinfo_pid) {
-            // Convert sysinfo process to ProcessEvent and add Linux-specific enhancements
-            self.convert_sysinfo_to_event(pid, process).await
-        } else {
-            Err(ProcessCollectionError::ProcessNotFound { pid })
-        }
+        system.process(sysinfo_pid).map_or(
+            Err(ProcessCollectionError::ProcessNotFound { pid }),
+            |process| Ok(self.convert_sysinfo_to_event(pid, process)),
+        )
     }

     async fn health_check(&self) -> ProcessCollectionResult<()> {
@@ -804,29 +870,29 @@ impl ProcessCollector for LinuxProcessCollector {
         // Check if /proc is accessible
         if !Path::new("/proc").exists() {
             return Err(ProcessCollectionError::SystemEnumerationFailed {
-                message: "/proc filesystem not available".to_string(),
+                message: "/proc filesystem not available".to_owned(),
             });
         }

         // Try to read a few processes
-        let pids = self.enumerate_proc_pids()?;
+        let pids = Self::enumerate_proc_pids()?;
         if pids.is_empty() {
             return Err(ProcessCollectionError::SystemEnumerationFailed {
-                message: "No processes found in /proc".to_string(),
+                message: "No processes found in /proc".to_owned(),
             });
         }

         // Try to read information for the first few processes
-        let mut successful_reads = 0;
+        let mut successful_reads: usize = 0;
         for &pid in pids.iter().take(5) {
             if self.read_process_info(pid).is_ok() {
-                successful_reads += 1;
+                successful_reads = successful_reads.saturating_add(1);
             }
         }

         if successful_reads == 0 {
             return Err(ProcessCollectionError::SystemEnumerationFailed {
-                message: "Could not read any process information".to_string(),
+                message: "Could not read any process information".to_owned(),
             });
         }

@@ -844,16 +910,12 @@ impl ProcessCollector for LinuxProcessCollector {

 impl LinuxProcessCollector {
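// Reviewer sketch (not part of the diff): the starttime handling added above
// goes through calculate_start_time, backed by the new boot_time_secs and
// clock_ticks_per_sec fields. Assuming boot time comes from /proc/stat's
// "btime" line and the tick rate from sysconf(_SC_CLK_TCK), the conversion
// amounts to (helper name hypothetical):
use std::time::{Duration, SystemTime, UNIX_EPOCH};

fn start_time_from_jiffies(
    jiffies: u64,
    boot_time_secs: u64,
    clock_ticks_per_sec: u64,
) -> Option<SystemTime> {
    // Field 22 of /proc/<pid>/stat is the start time in clock ticks since boot
    let secs_since_boot = jiffies.checked_div(clock_ticks_per_sec)?;
    UNIX_EPOCH.checked_add(Duration::from_secs(boot_time_secs.checked_add(secs_since_boot)?))
}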
     /// Converts a sysinfo process to a ProcessEvent with Linux-specific enhancements.
-    async fn convert_sysinfo_to_event(
-        &self,
-        pid: u32,
-        process: &Process,
-    ) -> ProcessCollectionResult<ProcessEvent> {
+    fn convert_sysinfo_to_event(&self, pid: u32, process: &Process) -> ProcessEvent {
         // Get basic information from sysinfo
-        let ppid = process.parent().map(|p| p.as_u32());
+        let ppid = process.parent().map(sysinfo::Pid::as_u32);
         let name = if process.name().is_empty() {
-            format!("<unknown:{}>", pid)
+            format!("<unknown:{pid}>")
         } else {
             process.name().to_string_lossy().to_string()
         };
@@ -873,51 +935,38 @@ impl LinuxProcessCollector {
             let start = process.start_time();

             (
-                if cpu.is_finite() && cpu >= 0.0 {
-                    Some(cpu as f64)
-                } else {
-                    None
-                },
-                if memory > 0 {
-                    Some(memory.saturating_mul(1024))
-                } else {
-                    None
-                },
-                if start > 0 {
-                    Some(SystemTime::UNIX_EPOCH + std::time::Duration::from_secs(start))
-                } else {
-                    None
-                },
+                (cpu.is_finite() && cpu >= 0.0).then_some(f64::from(cpu)),
+                (memory > 0).then_some(memory.saturating_mul(1024)),
+                (start > 0).then(|| {
+                    SystemTime::UNIX_EPOCH
+                        .checked_add(std::time::Duration::from_secs(start))
+                        .unwrap_or(SystemTime::UNIX_EPOCH)
+                }),
             )
         } else {
             (None, None, None)
         };

         // Compute executable hash if requested
-        let executable_hash = if self.base_config.compute_executable_hashes {
-            // TODO: Implement executable hashing (issue #40)
-            None
-        } else {
-            None
-        };
+        // TODO: Implement executable hashing (issue #40)
+        let executable_hash: Option<String> = None;

         let user_id = process.user_id().map(|uid| uid.to_string());
         let accessible = true;
         let file_exists = executable_path.is_some();

         // Add Linux-specific enhancements
-        let enhanced_metadata = if self.base_config.collect_enhanced_metadata {
-            Some(self.read_enhanced_metadata(pid))
-        } else {
-            None
-        };
+        let enhanced_metadata = self
+            .base_config
+            .collect_enhanced_metadata
+            .then(|| self.read_enhanced_metadata(pid));

         // Serialize enhanced metadata for platform_metadata field
         let platform_metadata = if self.base_config.collect_enhanced_metadata {
             enhanced_metadata.and_then(|metadata| {
                 serde_json::to_value(metadata)
                     .map_err(|e| {
-                        warn!("Failed to serialize Linux process metadata: {}", e);
+                        warn!("Failed to serialize Linux process metadata: {e}");
                     })
                     .ok()
             })
@@ -925,7 +974,7 @@ impl LinuxProcessCollector {
             None
         };

-        Ok(ProcessEvent {
+        ProcessEvent {
             pid,
             ppid,
             name,
@@ -940,11 +989,11 @@ impl LinuxProcessCollector {
             file_exists,
             timestamp: SystemTime::now(),
             platform_metadata,
-        })
+        }
     }

     /// Determines if a process is a system process based on name and PID.
-    fn is_system_process(&self, name: &str, pid: u32) -> bool {
+    fn is_system_process(name: &str, pid: u32) -> bool {
         // Common system process patterns
         const SYSTEM_PROCESSES: &[&str] = &[
             "kernel",
@@ -972,17 +1021,7 @@ impl LinuxProcessCollector {
     }

     /// Determines if a process is a kernel thread.
- fn is_kernel_thread(&self, name: &str, command_line: &[String]) -> bool { - // Kernel threads typically have no command line arguments - if !command_line.is_empty() { - return false; - } - - // Kernel threads often have names in brackets - if name.starts_with('[') && name.ends_with(']') { - return true; - } - + fn is_kernel_thread(name: &str, command_line: &[String]) -> bool { // Common kernel thread patterns const KERNEL_THREAD_PATTERNS: &[&str] = &[ "kworker", @@ -996,6 +1035,16 @@ impl LinuxProcessCollector { "kauditd", ]; + // Kernel threads typically have no command line arguments + if !command_line.is_empty() { + return false; + } + + // Kernel threads often have names in brackets + if name.starts_with('[') && name.ends_with(']') { + return true; + } + let name_lower = name.to_lowercase(); KERNEL_THREAD_PATTERNS .iter() @@ -1011,11 +1060,14 @@ impl Clone for LinuxProcessCollector { linux_config: self.linux_config.clone(), has_cap_sys_ptrace: self.has_cap_sys_ptrace, host_namespaces: self.host_namespaces.clone(), + boot_time_secs: self.boot_time_secs, + clock_ticks_per_sec: self.clock_ticks_per_sec, } } } #[cfg(test)] +#[allow(clippy::unwrap_used, clippy::uninlined_format_args)] mod tests { use super::*; use crate::process_collector::ProcessCollectionConfig; @@ -1114,87 +1166,84 @@ mod tests { } // Try to read namespaces for init process (PID 1) - let result = LinuxProcessCollector::read_process_namespaces(1); - // Since this test runs in different environments where permissions and - // namespace availability can vary, we'll just verify the function runs - // without panicking and returns a result - assert!( - result.is_ok() || result.is_err(), - "Function should complete without panicking" - ); + // The function returns ProcessNamespaces directly (not Result) + // Just verify it completes without panicking + let _namespaces = LinuxProcessCollector::read_process_namespaces(1); + // Success - function completed without panicking } #[test] fn test_system_process_detection() { - let base_config = ProcessCollectionConfig::default(); - let linux_config = LinuxCollectorConfig::default(); - let collector = LinuxProcessCollector::new(base_config, linux_config).unwrap(); - - // Test system process detection - assert!(collector.is_system_process("init", 1)); - assert!(collector.is_system_process("kernel", 2)); - assert!(collector.is_system_process("kthreadd", 3)); - assert!(!collector.is_system_process("bash", 1000)); - assert!(!collector.is_system_process("firefox", 2000)); + // Test system process detection (static method) + assert!(LinuxProcessCollector::is_system_process("init", 1)); + assert!(LinuxProcessCollector::is_system_process("kernel", 2)); + assert!(LinuxProcessCollector::is_system_process("kthreadd", 3)); + assert!(!LinuxProcessCollector::is_system_process("bash", 1000)); + assert!(!LinuxProcessCollector::is_system_process("firefox", 2000)); } #[test] fn test_kernel_thread_detection() { - let base_config = ProcessCollectionConfig::default(); - let linux_config = LinuxCollectorConfig::default(); - let collector = LinuxProcessCollector::new(base_config, linux_config).unwrap(); - - // Test kernel thread detection - assert!(collector.is_kernel_thread("[kworker/0:0]", &[])); - assert!(collector.is_kernel_thread("ksoftirqd/0", &[])); - assert!(!collector.is_kernel_thread("bash", &["/bin/bash".to_string()])); - assert!(!collector.is_kernel_thread("kworker", &["some".to_string(), "args".to_string()])); + // Test kernel thread detection (static method) + 
assert!(LinuxProcessCollector::is_kernel_thread( + "[kworker/0:0]", + &[] + )); + assert!(LinuxProcessCollector::is_kernel_thread("ksoftirqd/0", &[])); + assert!(!LinuxProcessCollector::is_kernel_thread( + "bash", + &["/bin/bash".to_owned()] + )); + assert!(!LinuxProcessCollector::is_kernel_thread( + "kworker", + &["some".to_owned(), "args".to_owned()] + )); } #[test] fn test_memory_parsing() { - let base_config = ProcessCollectionConfig::default(); - let linux_config = LinuxCollectorConfig::default(); - let collector = LinuxProcessCollector::new(base_config, linux_config).unwrap(); - - // Test memory parsing - assert_eq!(collector.parse_memory_kb("1024 kB"), Some(1048576)); - assert_eq!(collector.parse_memory_kb("512 kB"), Some(524288)); - assert_eq!(collector.parse_memory_kb("0 kB"), Some(0)); - assert_eq!(collector.parse_memory_kb("invalid"), None); + // Test memory parsing (static method) + assert_eq!( + LinuxProcessCollector::parse_memory_kb("1024 kB"), + Some(1_048_576) + ); + assert_eq!( + LinuxProcessCollector::parse_memory_kb("512 kB"), + Some(524_288) + ); + assert_eq!(LinuxProcessCollector::parse_memory_kb("0 kB"), Some(0)); + assert_eq!(LinuxProcessCollector::parse_memory_kb("invalid"), None); } #[test] fn test_docker_id_extraction() { - let base_config = ProcessCollectionConfig::default(); - let linux_config = LinuxCollectorConfig::default(); - let collector = LinuxProcessCollector::new(base_config, linux_config).unwrap(); - - // Test Docker ID extraction + // Test Docker ID extraction (static method) let docker_line = "1:name=systemd:/docker/1234567890ab"; assert_eq!( - collector.extract_docker_id(docker_line), - Some("1234567890ab".to_string()) + LinuxProcessCollector::extract_docker_id(docker_line), + Some("1234567890ab".to_owned()) ); let non_docker_line = "1:name=systemd:/system.slice/ssh.service"; - assert_eq!(collector.extract_docker_id(non_docker_line), None); + assert_eq!( + LinuxProcessCollector::extract_docker_id(non_docker_line), + None + ); } #[test] fn test_containerd_id_extraction() { - let base_config = ProcessCollectionConfig::default(); - let linux_config = LinuxCollectorConfig::default(); - let collector = LinuxProcessCollector::new(base_config, linux_config).unwrap(); - - // Test containerd ID extraction + // Test containerd ID extraction (static method) let containerd_line = "1:name=systemd:/system.slice/containerd.service/1234567890ab"; assert_eq!( - collector.extract_containerd_id(containerd_line), - Some("1234567890ab".to_string()) + LinuxProcessCollector::extract_containerd_id(containerd_line), + Some("1234567890ab".to_owned()) ); let non_containerd_line = "1:name=systemd:/system.slice/ssh.service"; - assert_eq!(collector.extract_containerd_id(non_containerd_line), None); + assert_eq!( + LinuxProcessCollector::extract_containerd_id(non_containerd_line), + None + ); } } diff --git a/procmond/src/macos_collector.rs b/procmond/src/macos_collector.rs index a091e90..5dd8a68 100644 --- a/procmond/src/macos_collector.rs +++ b/procmond/src/macos_collector.rs @@ -1,3 +1,7 @@ +// Module-level clippy allows for patterns common in this platform-specific collector +#![allow(clippy::unnecessary_wraps)] // Many methods return Result for future error paths +#![allow(clippy::unused_self)] // Methods may use self in future enhancements + //! Enhanced macOS-specific process collector using third-party crates. //! //! 
This module provides a macOS-optimized process collector that uses well-maintained
@@ -37,6 +41,7 @@ use sysinfo::{Pid, ProcessesToUpdate, System};

 /// macOS-specific errors that can occur during process collection.
 #[derive(Debug, Error)]
+#[non_exhaustive]
 pub enum MacOSCollectionError {
     /// Security framework error
     #[error("Security framework error: {0}")]
@@ -57,7 +62,7 @@ pub enum MacOSCollectionError {

 impl From<MacOSCollectionError> for ProcessCollectionError {
     fn from(err: MacOSCollectionError) -> Self {
-        ProcessCollectionError::PlatformError {
+        Self::PlatformError {
             message: err.to_string(),
         }
     }
@@ -65,6 +70,7 @@ impl From<MacOSCollectionError> for ProcessCollectionError {

 /// Enhanced macOS process entitlements information.
 #[derive(Debug, Clone, Default, serde::Serialize)]
+#[allow(clippy::struct_excessive_bools)] // Entitlements naturally have many boolean flags
 pub struct ProcessEntitlements {
     /// Process has debugging entitlements
     pub can_debug: bool,
@@ -117,7 +123,7 @@ pub struct MacOSProcessMetadata {
     pub entitlements: ProcessEntitlements,
     /// Process is under SIP protection
     pub sip_protected: bool,
-    /// Process architecture (x86_64, arm64, etc.)
+    /// Process architecture (`x86_64`, arm64, etc.)
     pub architecture: Option<String>,
     /// Code signing information
     pub code_signing: CodeSigningInfo,
@@ -190,6 +196,7 @@ pub struct EnhancedMacOSCollector {

 /// Configuration for macOS-specific process collection features.
 #[derive(Debug, Clone)]
+#[allow(clippy::struct_excessive_bools)] // Configuration flags are naturally boolean
 pub struct MacOSCollectorConfig {
     /// Whether to collect process entitlements information
     pub collect_entitlements: bool,
@@ -275,7 +282,10 @@ impl EnhancedMacOSCollector {
     ///
     /// This method checks if the Security framework is available and can be used
     /// for entitlements detection by making a lightweight runtime API call.
+    #[allow(clippy::unnecessary_wraps)] // Result needed for future error propagation
     fn detect_entitlements_capability() -> ProcessCollectionResult<bool> {
+        const SECURITY_FRAMEWORK_PATH: &str = "/System/Library/Frameworks/Security.framework";
+
         // First try a lightweight runtime API call to verify Security framework availability
         match Self::test_security_framework_api() {
             Ok(available) => {
@@ -292,8 +302,6 @@ impl EnhancedMacOSCollector {
                     e
                 );
                 // Fallback to path check if API binding is unavailable
-                const SECURITY_FRAMEWORK_PATH: &str =
-                    "/System/Library/Frameworks/Security.framework";
                 if Path::new(SECURITY_FRAMEWORK_PATH).exists() {
                     debug!("Security framework capability detected via path check");
                     Ok(true)
@@ -313,6 +321,7 @@ impl EnhancedMacOSCollector {
     /// # Returns
     ///
     /// Returns `Ok(true)` if the Security framework is available, `Ok(false)` otherwise.
+    #[allow(clippy::unnecessary_wraps)] // Result needed for future error propagation
     fn test_security_framework_api() -> ProcessCollectionResult<bool> {
         const SECURITY_FRAMEWORK_PATH: &str = "/System/Library/Frameworks/Security.framework";
@@ -337,6 +346,7 @@ impl EnhancedMacOSCollector {
     ///
     /// Returns `Ok(true)` if SIP is enabled, `Ok(false)` if disabled, or defaults
     /// to `Ok(true)` for safety if the command fails.
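// Reviewer sketch (not part of the diff): the csrutil probe documented above
// parses the output of `/usr/bin/csrutil status`, which prints a line such as
// "System Integrity Protection status: enabled." A simplified version of the
// fallback logic (helper name hypothetical):
use std::process::Command;

fn sip_enabled_or_default() -> bool {
    Command::new("/usr/bin/csrutil")
        .arg("status")
        .output()
        .ok()
        .map_or(true, |out| {
            // Default to "enabled" (the safe assumption) unless output says otherwise
            !String::from_utf8_lossy(&out.stdout)
                .to_lowercase()
                .contains("disabled")
        })
}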
+ #[allow(clippy::unnecessary_wraps)] // Result needed for future error propagation fn detect_sip_status() -> ProcessCollectionResult { // Use csrutil command to check SIP status match std::process::Command::new("/usr/bin/csrutil") @@ -388,8 +398,8 @@ impl EnhancedMacOSCollector { system.refresh_all(); let mut events = Vec::new(); - let mut inaccessible_count = 0; - let mut processed_count = 0; + let mut inaccessible_count: usize = 0; + let mut processed_count: usize = 0; let max_processes = base_config.max_processes; for (pid, process) in system.processes() { @@ -403,13 +413,13 @@ impl EnhancedMacOSCollector { break; } - processed_count += 1; + processed_count = processed_count.saturating_add(1); match collector.enhance_process(*pid, process) { Ok(event) => events.push(event), Err(e) => { debug!(pid = pid.as_u32(), error = %e, "Error enhancing process"); - inaccessible_count += 1; + inaccessible_count = inaccessible_count.saturating_add(1); // Continue with other processes } } @@ -427,7 +437,7 @@ impl EnhancedMacOSCollector { }) .await .map_err(|e| ProcessCollectionError::PlatformError { - message: format!("Blocking task failed: {}", e), + message: format!("Blocking task failed: {e}"), })?; Ok((events, stats)) @@ -435,7 +445,7 @@ impl EnhancedMacOSCollector { /// Enhances a process with macOS-specific metadata. /// - /// This method converts a sysinfo Process into a ProcessEvent with additional + /// This method converts a sysinfo Process into a `ProcessEvent` with additional /// metadata collection based on the collector's configuration. /// /// # Arguments @@ -463,7 +473,7 @@ impl EnhancedMacOSCollector { if self.base_config.skip_system_processes && self.is_system_process(pid_u32, process) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "System process skipped by configuration".to_string(), + message: "System process skipped by configuration".to_owned(), }); } @@ -471,40 +481,31 @@ impl EnhancedMacOSCollector { if self.base_config.skip_kernel_threads && self.is_kernel_thread(&name, &command_line) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "Kernel thread skipped by configuration".to_string(), + message: "Kernel thread skipped by configuration".to_owned(), }); } - let ppid = process.parent().map(|p| p.as_u32()); + let ppid = process.parent().map(sysinfo::Pid::as_u32); let executable_path = process.exe().map(|p| p.to_string_lossy().to_string()); - let start_time = - Some(SystemTime::UNIX_EPOCH + std::time::Duration::from_secs(process.start_time())); + let start_time = SystemTime::UNIX_EPOCH + .checked_add(std::time::Duration::from_secs(process.start_time())); - let cpu_usage = if self.base_config.collect_enhanced_metadata { - Some(process.cpu_usage() as f64) - } else { - None - }; + let cpu_usage = self + .base_config + .collect_enhanced_metadata + .then(|| f64::from(process.cpu_usage())); let memory_usage = if self.base_config.collect_enhanced_metadata { let memory = process.memory(); - if memory > 0 { - Some(memory.saturating_mul(1024)) - } else { - None - } + (memory > 0).then(|| memory.saturating_mul(1024)) } else { None }; // Compute executable hash if requested - let executable_hash = if self.base_config.compute_executable_hashes { - // TODO: Implement executable hashing - compute SHA-256 hash of executable file - None - } else { - None - }; + // TODO: Implement executable hashing - compute SHA-256 hash of executable file + let executable_hash: Option = None; let user_id = process.user_id().map(|u| 
u.to_string()); let accessible = true; // If we can read process info, it's accessible @@ -605,9 +606,12 @@ impl EnhancedMacOSCollector { // Use heuristics to determine entitlements based on path and process characteristics let path_str = exe_path.to_string_lossy(); + // All paths share setting can_debug = false at start + entitlements.can_debug = false; + // Check if it's a system process (likely has system access) if path_str.starts_with("/System/") || path_str.starts_with("/usr/") { - entitlements.can_debug = false; // System processes typically can't be debugged + // System processes typically can't be debugged entitlements.system_access = true; // System processes have system access entitlements.sandboxed = false; // System processes are typically not sandboxed entitlements.network_access = true; @@ -616,7 +620,6 @@ impl EnhancedMacOSCollector { entitlements.disable_library_validation = false; } else if path_str.contains(".app/") { // App bundle - likely sandboxed - entitlements.can_debug = false; entitlements.system_access = false; entitlements.sandboxed = true; // Apps are typically sandboxed entitlements.network_access = true; // Most apps have network access @@ -625,7 +628,6 @@ impl EnhancedMacOSCollector { entitlements.disable_library_validation = false; } else { // Other executables - assume minimal entitlements - entitlements.can_debug = false; entitlements.system_access = false; entitlements.sandboxed = false; entitlements.network_access = true; @@ -660,7 +662,7 @@ impl EnhancedMacOSCollector { pid: u32, process: &sysinfo::Process, ) -> ProcessCollectionResult { - if let Some(exe_path) = process.exe() { + process.exe().map_or(Ok(false), |exe_path| { let path_str = exe_path.to_string_lossy(); // Common SIP-protected paths on macOS @@ -685,9 +687,7 @@ impl EnhancedMacOSCollector { ); Ok(is_protected) - } else { - Ok(false) - } + }) } /// Checks if a process has a valid code signature using heuristic/path-based checks. 
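// Reviewer note: this diff repeatedly replaces `if let Some(x) = opt { ... } else { ... }`
// with `Option::map_or(default, f)`, as in the SIP check just above; the None
// arm comes first. A tiny illustration with hypothetical types:
fn lookup_or_not_found(value: Option<u32>) -> Result<String, String> {
    value.map_or(
        Err("not found".to_owned()),  // used when the Option is None
        |v| Ok(format!("found {v}")), // applied to the contained value otherwise
    )
}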
@@ -730,7 +730,7 @@ impl EnhancedMacOSCollector { code_signing.certificate_valid = true; code_signing.team_id = None; // Would need Security framework API calls code_signing.bundle_id = None; - code_signing.authority = Some("Apple".to_string()); // Assume Apple for system processes + code_signing.authority = Some("Apple".to_owned()); // Assume Apple for system processes debug!(pid = pid, path = %path_str, "Process likely has valid code signature (heuristic: system/app)"); } else { @@ -769,12 +769,17 @@ impl EnhancedMacOSCollector { // Check if it's an app bundle if path_str.contains(".app/") { - // Extract bundle name from path + // Extract bundle name from path using safe string operations + #[allow(clippy::string_slice)] + // Safe: using byte indices from find operations on ASCII patterns if let Some(app_start) = path_str.rfind('/') - && let Some(app_end) = path_str[..app_start].rfind(".app") - && let Some(name_start) = path_str[..app_end].rfind('/') + && let Some(app_end) = path_str.get(..app_start).and_then(|s| s.rfind(".app")) + && let Some(name_start) = path_str.get(..app_end).and_then(|s| s.rfind('/')) { - bundle_info.name = Some(path_str[name_start + 1..app_end].to_string()); + let start_idx = name_start.saturating_add(1); + if let Some(name) = path_str.get(start_idx..app_end) { + bundle_info.name = Some(name.to_owned()); + } } debug!( @@ -813,7 +818,7 @@ impl EnhancedMacOSCollector { } #[cfg(target_arch = "aarch64")] { - Some("arm64".to_string()) + Some("arm64".to_owned()) } #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))] { @@ -838,6 +843,42 @@ impl EnhancedMacOSCollector { /// /// Returns `true` if the process is determined to be a system process, `false` otherwise. fn is_system_process(&self, pid: u32, process: &sysinfo::Process) -> bool { + // Common macOS system process patterns - declared at start of scope + const SYSTEM_PROCESSES: &[&str] = &[ + "kernel_task", + "launchd", + "kextd", + "kernelmanagerd", + "UserEventAgent", + "cfprefsd", + "distnoted", + "syslogd", + "logd", + "systemstats", + "WindowServer", + "loginwindow", + "Dock", + "Finder", + "SystemUIServer", + "coreaudiod", + "bluetoothd", + "wifid", + "networkd", + "securityd", + "trustd", + "sandboxd", + "spindump", + "ReportCrash", + "crashreporterd", + "notifyd", + "powerd", + "thermald", + "hidd", + "locationd", + "CommCenter", + "SpringBoard", // iOS/iPadOS + ]; + // Check executable path - most reliable indicator if let Some(exe_path) = process.exe() { let path_str = exe_path.to_string_lossy(); @@ -875,42 +916,6 @@ impl EnhancedMacOSCollector { return true; } - // Common macOS system process patterns - const SYSTEM_PROCESSES: &[&str] = &[ - "kernel_task", - "launchd", - "kextd", - "kernelmanagerd", - "UserEventAgent", - "cfprefsd", - "distnoted", - "syslogd", - "logd", - "systemstats", - "WindowServer", - "loginwindow", - "Dock", - "Finder", - "SystemUIServer", - "coreaudiod", - "bluetoothd", - "wifid", - "networkd", - "securityd", - "trustd", - "sandboxd", - "spindump", - "ReportCrash", - "crashreporterd", - "notifyd", - "powerd", - "thermald", - "hidd", - "locationd", - "CommCenter", - "SpringBoard", // iOS/iPadOS - ]; - // Exact matches if SYSTEM_PROCESSES .iter() @@ -950,7 +955,7 @@ impl EnhancedMacOSCollector { /// /// # Returns /// - /// Returns a tuple containing (name, executable_path, memory_kb, virtual_memory_kb) + /// Returns a tuple containing (name, `executable_path`, `memory_kb`, `virtual_memory_kb`) /// or None if the process information is unavailable. 
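// Reviewer sketch (not part of the diff): the bundle-name extraction above swaps
// direct slicing for str::get, which returns None instead of panicking on an
// out-of-range or non-char-boundary index. The same logic as a standalone helper
// (name hypothetical):
fn bundle_name(path_str: &str) -> Option<String> {
    let app_start = path_str.rfind('/')?;
    let app_end = path_str.get(..app_start)?.rfind(".app")?;
    let name_start = path_str.get(..app_end)?.rfind('/')?;
    path_str
        .get(name_start.saturating_add(1)..app_end)
        .map(str::to_owned)
}
// bundle_name("/Applications/Safari.app/Contents/MacOS/Safari") == Some("Safari".to_owned())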
    fn get_process_info(
        &self,
@@ -958,7 +963,7 @@ impl EnhancedMacOSCollector {
    ) -> Option<(String, Option<PathBuf>, u64, u64)> {
        Some((
            process.name().to_string_lossy().into_owned(),
-            process.exe().map(|p| p.to_path_buf()),
+            process.exe().map(std::path::Path::to_path_buf),
            process.memory(),
            process.virtual_memory(),
        ))
    }
@@ -977,7 +982,7 @@ impl EnhancedMacOSCollector {
     /// # Returns
     ///
     /// Always returns `false` as macOS doesn't expose kernel threads like Linux.
-    fn is_kernel_thread(&self, _name: &str, _command_line: &[String]) -> bool {
+    const fn is_kernel_thread(&self, _name: &str, _command_line: &[String]) -> bool {
         // macOS doesn't have kernel threads in the same way as Linux
         false
     }
@@ -1020,7 +1025,8 @@ impl ProcessCollector for EnhancedMacOSCollector {
         let (events, mut stats) = self.collect_processes_enhanced().await?;

         // Set the collection duration
-        stats.collection_duration_ms = start_time.elapsed().as_millis() as u64;
+        stats.collection_duration_ms =
+            u64::try_from(start_time.elapsed().as_millis()).unwrap_or(u64::MAX);

         debug!(
             collector = self.name(),
@@ -1044,11 +1050,10 @@ impl ProcessCollector for EnhancedMacOSCollector {
         let mut system = System::new();
         system.refresh_processes(ProcessesToUpdate::All, true);
         let sysinfo_pid = Pid::from_u32(pid);
-        if let Some(process) = system.process(sysinfo_pid) {
-            self.enhance_process(sysinfo_pid, process)
-        } else {
-            Err(ProcessCollectionError::ProcessNotFound { pid })
-        }
+        system.process(sysinfo_pid).map_or(
+            Err(ProcessCollectionError::ProcessNotFound { pid }),
+            |process| self.enhance_process(sysinfo_pid, process),
+        )
     }

     async fn health_check(&self) -> ProcessCollectionResult<()> {
@@ -1064,7 +1069,7 @@ impl ProcessCollector for EnhancedMacOSCollector {
         let process_count = system.processes().len();
         if process_count == 0 {
             return Err(ProcessCollectionError::SystemEnumerationFailed {
-                message: "No processes found during health check".to_string(),
+                message: "No processes found during health check".to_owned(),
             });
         }

diff --git a/procmond/src/main.rs b/procmond/src/main.rs
index 07b9338..2820851 100644
--- a/procmond/src/main.rs
+++ b/procmond/src/main.rs
@@ -17,17 +17,15 @@ use tracing::info;
 fn parse_interval(s: &str) -> Result<u64, String> {
     let interval: u64 = s
         .parse()
-        .map_err(|_| format!("Invalid interval '{}': must be a number", s))?;
+        .map_err(|_parse_err| format!("Invalid interval '{s}': must be a number"))?;

     if interval < 5 {
         Err(format!(
-            "Interval too small: {} seconds. Minimum allowed is 5 seconds",
-            interval
+            "Interval too small: {interval} seconds. Minimum allowed is 5 seconds"
         ))
     } else if interval > 3600 {
         Err(format!(
-            "Interval too large: {} seconds. Maximum allowed is 3600 seconds (1 hour)",
-            interval
+            "Interval too large: {interval} seconds.
Maximum allowed is 3600 seconds (1 hour)" )) } else { Ok(interval) @@ -77,27 +75,32 @@ pub async fn main() -> Result<(), Box> { let _config = config_loader.load()?; // Initialize telemetry - let mut telemetry = telemetry::TelemetryCollector::new("procmond".to_string()); + let mut telemetry = telemetry::TelemetryCollector::new("procmond".to_owned()); // Initialize database let db_manager = Arc::new(Mutex::new(storage::DatabaseManager::new(&cli.database)?)); // Record operation in telemetry - let timer = telemetry::PerformanceTimer::start("process_collection".to_string()); + let timer = telemetry::PerformanceTimer::start("process_collection".to_owned()); let duration = timer.finish(); telemetry.record_operation(duration); // Perform health check let health_check = telemetry.health_check(); - println!("Health status: {}", health_check.status); + info!(status = %health_check.status, "Health check completed"); // Get database statistics let stats = db_manager.lock().await.get_stats()?; - println!("Database stats: {:?}", stats); + info!( + processes = stats.processes, + rules = stats.rules, + alerts = stats.alerts, + "Database stats retrieved" + ); // Create collector configuration let mut collector_config = CollectorConfig::new() - .with_component_name("procmond".to_string()) + .with_component_name("procmond".to_owned()) .with_ipc_endpoint(daemoneye_lib::ipc::IpcConfig::default().endpoint_path) .with_max_event_sources(1) .with_event_buffer_size(1000) @@ -112,9 +115,9 @@ pub async fn main() -> Result<(), Box> { collector_config.registration = Some(CollectorRegistrationConfig { enabled: true, broker: None, // Will be set if broker is available via environment/config - collector_id: Some("procmond".to_string()), - collector_type: Some("procmond".to_string()), - topic: "control.collector.registration".to_string(), + collector_id: Some("procmond".to_owned()), + collector_type: Some("procmond".to_owned()), + topic: "control.collector.registration".to_owned(), timeout: Duration::from_secs(10), retry_attempts: 3, heartbeat_interval: Duration::from_secs(30), @@ -138,8 +141,7 @@ pub async fn main() -> Result<(), Box> { let registration_enabled = collector_config .registration .as_ref() - .map(|r| r.enabled) - .unwrap_or(false); + .is_some_and(|r| r.enabled); let collector_id_str = collector_config .registration .as_ref() diff --git a/procmond/src/monitor_collector.rs b/procmond/src/monitor_collector.rs index 6bef60c..1e237e4 100644 --- a/procmond/src/monitor_collector.rs +++ b/procmond/src/monitor_collector.rs @@ -2,7 +2,7 @@ //! //! This module provides a concrete implementation of the Monitor Collector framework //! specifically for procmond, integrating process lifecycle tracking with the -//! collector-core EventSource trait. +//! collector-core `EventSource` trait. use crate::{ lifecycle::{LifecycleTrackingConfig, ProcessLifecycleTracker}, @@ -31,7 +31,7 @@ use tracing::{debug, error, info, instrument, warn}; /// Procmond-specific Monitor Collector configuration. /// -/// This extends the base MonitorCollectorConfig with procmond-specific +/// This extends the base `MonitorCollectorConfig` with procmond-specific /// configuration for process collection and lifecycle tracking. #[derive(Debug, Clone, Default)] pub struct ProcmondMonitorConfig { @@ -84,7 +84,7 @@ pub struct ProcmondMonitorCollector { impl ProcmondMonitorCollector { /// Creates a new Procmond Monitor Collector. 
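// Reviewer sketch (not part of the diff): assuming the CLI uses clap's derive
// API (the Cli struct below is hypothetical), a validator returning
// Result<u64, String> like parse_interval plugs in via value_parser:
use clap::Parser;

#[derive(Parser)]
struct Cli {
    /// Collection interval in seconds (clamped to 5..=3600 by parse_interval)
    #[arg(long, default_value = "30", value_parser = parse_interval)]
    interval: u64,
}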
-    pub async fn new(
+    pub fn new(
         database: Arc<Mutex<DatabaseManager>>,
         config: ProcmondMonitorConfig,
     ) -> anyhow::Result<Self> {
@@ -123,9 +123,8 @@ impl ProcmondMonitorCollector {
         let event_bus = if config.base_config.enable_event_driven {
             let bus_config = collector_core::EventBusConfig::default();
             let local_bus = LocalEventBus::new(bus_config);
-            Arc::new(RwLock::new(Some(
-                Arc::new(local_bus) as Arc<dyn EventBus>
-            )))
+            let bus_arc: Arc<dyn EventBus> = Arc::new(local_bus);
+            Arc::new(RwLock::new(Some(bus_arc)))
         } else {
             Arc::new(RwLock::new(None))
         };
@@ -160,7 +159,7 @@ impl ProcmondMonitorCollector {
         tx: &mpsc::Sender<CollectionEvent>,
         shutdown_signal: &Arc<AtomicBool>,
     ) -> anyhow::Result<()> {
-        let timer = PerformanceTimer::start("procmond_monitor_collection".to_string());
+        let timer = PerformanceTimer::start("procmond_monitor_collection".to_owned());
         let collection_start = Instant::now();

         // Check for shutdown before starting
@@ -205,9 +204,11 @@ impl ProcmondMonitorCollector {
         // Update statistics
         self.stats.collection_cycles.fetch_add(1, Ordering::Relaxed);
+        #[allow(clippy::as_conversions)] // Safe: usize to u64 for counter
+        let event_count = lifecycle_events.len() as u64;
         self.stats
             .lifecycle_events
-            .fetch_add(lifecycle_events.len() as u64, Ordering::Relaxed);
+            .fetch_add(event_count, Ordering::Relaxed);

         // Send process events with backpressure handling
         for process_event in process_events {
@@ -253,9 +254,16 @@ impl ProcmondMonitorCollector {
         const CIRCUIT_BREAKER_COOLDOWN_SECS: u64 = 10;

         // Check circuit breaker state
+        #[allow(clippy::expect_used)] // Mutex poisoning indicates a panic - propagate it
         {
-            let cooldown_guard = self.circuit_breaker_until.lock().unwrap();
-            if let Some(cooldown_until) = *cooldown_guard {
+            let cooldown_until_opt = {
+                let cooldown_guard = self
+                    .circuit_breaker_until
+                    .lock()
+                    .expect("circuit_breaker_until mutex poisoned");
+                *cooldown_guard
+            };
+            if let Some(cooldown_until) = cooldown_until_opt {
                 if Instant::now() < cooldown_until {
                     // Circuit breaker is active, increment backpressure metric and return error
                     self.stats
@@ -266,14 +274,14 @@ impl ProcmondMonitorCollector {
                         cooldown_until
                     );
                     return Err(anyhow::anyhow!("Circuit breaker active, event dropped"));
-                } else {
-                    // Cooldown expired, reset circuit breaker
-                    drop(cooldown_guard);
-                    let mut guard = self.circuit_breaker_until.lock().unwrap();
-                    *guard = None;
-                    self.consecutive_backpressure_timeouts
-                        .store(0, Ordering::Relaxed);
                 }
+                // Cooldown expired, reset circuit breaker
+                *self
+                    .circuit_breaker_until
+                    .lock()
+                    .expect("circuit_breaker_until mutex poisoned") = None;
+                self.consecutive_backpressure_timeouts
+                    .store(0, Ordering::Relaxed);
             }
         }

@@ -305,10 +313,10 @@ impl ProcmondMonitorCollector {
                 }
                 Err(_) => {
                     // Timeout acquiring permit
-                    let consecutive = self
+                    let previous = self
                         .consecutive_backpressure_timeouts
-                        .fetch_add(1, Ordering::Relaxed)
-                        + 1;
+                        .fetch_add(1, Ordering::Relaxed);
+                    let consecutive = previous.saturating_add(1);

                     self.stats
                         .backpressure_events
@@ -320,9 +328,15 @@ impl ProcmondMonitorCollector {

                     // Activate circuit breaker if threshold reached
                     if consecutive >= CIRCUIT_BREAKER_THRESHOLD {
+                        #[allow(clippy::arithmetic_side_effects)] // Safe: Instant + Duration
                         let cooldown_until =
                             Instant::now() + Duration::from_secs(CIRCUIT_BREAKER_COOLDOWN_SECS);
-                        let mut guard = self.circuit_breaker_until.lock().unwrap();
+                        #[allow(clippy::expect_used)]
+                        // Mutex poisoning indicates a panic - propagate it
+                        let mut guard = self
+                            .circuit_breaker_until
+                            .lock()
+                            .expect("circuit_breaker_until mutex poisoned");
                         *guard =
Some(cooldown_until); warn!( cooldown_seconds = CIRCUIT_BREAKER_COOLDOWN_SECS, @@ -387,6 +401,8 @@ impl EventSource for ProcmondMonitorCollector { tx: mpsc::Sender, _shutdown_signal: Arc, ) -> anyhow::Result<()> { + const MAX_CONSECUTIVE_FAILURES: u32 = 5; + info!( collection_interval_secs = self.config.base_config.collection_interval.as_secs(), max_events_in_flight = self.config.base_config.max_events_in_flight, @@ -396,8 +412,7 @@ impl EventSource for ProcmondMonitorCollector { // Main collection loop let mut collection_interval = interval(self.config.base_config.collection_interval); - let mut consecutive_failures = 0u32; - const MAX_CONSECUTIVE_FAILURES: u32 = 5; + let mut consecutive_failures = 0_u32; // Skip first tick to avoid immediate collection collection_interval.tick().await; @@ -418,7 +433,7 @@ impl EventSource for ProcmondMonitorCollector { } Err(e) => { error!(error = %e, "Procmond monitor collection failed"); - consecutive_failures += 1; + consecutive_failures = consecutive_failures.saturating_add(1); if consecutive_failures >= MAX_CONSECUTIVE_FAILURES { error!( @@ -426,8 +441,7 @@ impl EventSource for ProcmondMonitorCollector { "Too many consecutive failures, stopping Procmond Monitor Collector" ); return Err(anyhow::anyhow!( - "Procmond Monitor Collector failed {} consecutive times", - consecutive_failures + "Procmond Monitor Collector failed {consecutive_failures} consecutive times" )); } @@ -442,7 +456,7 @@ impl EventSource for ProcmondMonitorCollector { } } - _ = async { + () = async { while !self.shutdown_signal.load(Ordering::Relaxed) { tokio::time::sleep(Duration::from_millis(100)).await; } @@ -479,6 +493,14 @@ impl MonitorCollectorTrait for ProcmondMonitorCollector { } #[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::unused_async, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::clone_on_ref_ptr +)] mod tests { use super::*; use daemoneye_lib::storage::DatabaseManager; @@ -498,7 +520,7 @@ mod tests { let db_manager = create_test_database().await; let config = ProcmondMonitorConfig::default(); - let collector = ProcmondMonitorCollector::new(db_manager, config).await; + let collector = ProcmondMonitorCollector::new(db_manager, config); assert!(collector.is_ok()); let collector = collector.unwrap(); @@ -522,9 +544,7 @@ mod tests { ..Default::default() }; - let collector = ProcmondMonitorCollector::new(db_manager.clone(), fast_config) - .await - .unwrap(); + let collector = ProcmondMonitorCollector::new(db_manager.clone(), fast_config).unwrap(); let caps = collector.capabilities(); assert!(caps.contains(SourceCaps::REALTIME)); @@ -537,9 +557,7 @@ mod tests { ..Default::default() }; - let collector = ProcmondMonitorCollector::new(db_manager, slow_config) - .await - .unwrap(); + let collector = ProcmondMonitorCollector::new(db_manager, slow_config).unwrap(); let caps = collector.capabilities(); assert!(!caps.contains(SourceCaps::REALTIME)); } @@ -549,9 +567,7 @@ mod tests { let db_manager = create_test_database().await; let config = ProcmondMonitorConfig::default(); - let collector = ProcmondMonitorCollector::new(db_manager, config) - .await - .unwrap(); + let collector = ProcmondMonitorCollector::new(db_manager, config).unwrap(); // Initial health check should pass let health_result = collector.health_check().await; @@ -563,9 +579,7 @@ mod tests { let db_manager = create_test_database().await; let config = ProcmondMonitorConfig::default(); - let collector = ProcmondMonitorCollector::new(db_manager, config) - .await - .unwrap(); 
+ let collector = ProcmondMonitorCollector::new(db_manager, config).unwrap(); // Initial statistics should be zero let stats = collector.stats(); diff --git a/procmond/src/process_collector.rs b/procmond/src/process_collector.rs index 0b7c11a..6014d11 100644 --- a/procmond/src/process_collector.rs +++ b/procmond/src/process_collector.rs @@ -14,6 +14,7 @@ use tracing::{debug, error, warn}; /// Errors that can occur during process collection. #[derive(Debug, Error)] +#[non_exhaustive] pub enum ProcessCollectionError { /// System-level enumeration failed #[error("System process enumeration failed: {message}")] @@ -78,6 +79,7 @@ pub struct CollectionStats { /// }; /// ``` #[derive(Debug, Clone)] +#[allow(clippy::struct_excessive_bools)] // Config structs naturally have boolean flags pub struct ProcessCollectionConfig { /// Whether to collect enhanced metadata (CPU, memory, etc.) /// @@ -206,6 +208,7 @@ pub trait ProcessCollector: Send + Sync { /// Capabilities that a process collector can provide. #[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[allow(clippy::struct_excessive_bools)] // Capability flags are naturally boolean pub struct ProcessCollectorCapabilities { /// Can collect basic process information (PID, name, etc.) pub basic_info: bool, @@ -241,11 +244,12 @@ pub struct SysinfoProcessCollector { impl SysinfoProcessCollector { /// Creates a new sysinfo-based process collector with the specified configuration. - pub fn new(config: ProcessCollectionConfig) -> Self { + pub const fn new(config: ProcessCollectionConfig) -> Self { Self { config } } - /// Converts a sysinfo process to a ProcessEvent with comprehensive error handling. + /// Converts a sysinfo process to a `ProcessEvent` with comprehensive error handling. + #[allow(clippy::trivially_copy_pass_by_ref)] // Pid reference matches sysinfo API patterns fn convert_process_to_event( &self, pid: &Pid, @@ -257,15 +261,15 @@ impl SysinfoProcessCollector { if pid_u32 == 0 { return Err(ProcessCollectionError::InvalidProcessData { pid: pid_u32, - message: "Invalid PID: 0".to_string(), + message: "Invalid PID: 0".to_owned(), }); } - let ppid = process.parent().map(|p| p.as_u32()); + let ppid = process.parent().map(sysinfo::Pid::as_u32); // Get process name with fallback let name = if process.name().is_empty() { - format!("", pid_u32) + format!("") } else { process.name().to_string_lossy().to_string() }; @@ -274,7 +278,7 @@ impl SysinfoProcessCollector { if self.config.skip_system_processes && self.is_system_process(&name, pid_u32) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "System process skipped by configuration".to_string(), + message: "System process skipped by configuration".to_owned(), }); } @@ -282,7 +286,7 @@ impl SysinfoProcessCollector { if self.config.skip_kernel_threads && self.is_kernel_thread(&name, process) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "Kernel thread skipped by configuration".to_string(), + message: "Kernel thread skipped by configuration".to_owned(), }); } @@ -299,7 +303,8 @@ impl SysinfoProcessCollector { let start_time = if self.config.collect_enhanced_metadata { let start_time_secs = process.start_time(); if start_time_secs > 0 { - Some(SystemTime::UNIX_EPOCH + std::time::Duration::from_secs(start_time_secs)) + // Safe: checked_add handles potential overflow + SystemTime::UNIX_EPOCH.checked_add(std::time::Duration::from_secs(start_time_secs)) } else { None } @@ -313,18 +318,10 @@ impl SysinfoProcessCollector { let memory = 
process.memory(); // Validate CPU usage (should be between 0 and 100 * num_cpus) - let cpu_usage = if cpu.is_finite() && cpu >= 0.0 { - Some(cpu as f64) - } else { - None - }; + let cpu_usage = (cpu.is_finite() && cpu >= 0.0).then(|| f64::from(cpu)); // Memory usage should be reasonable (convert from KB to bytes) - let memory_usage = if memory > 0 { - Some(memory.saturating_mul(1024)) - } else { - None - }; + let memory_usage = (memory > 0).then(|| memory.saturating_mul(1024)); (cpu_usage, memory_usage) } else { @@ -332,13 +329,9 @@ impl SysinfoProcessCollector { }; // Compute executable hash if requested - let executable_hash = if self.config.compute_executable_hashes { - // TODO: Implement executable hashing (issue #40) - // For now, we'll leave this as None until the hashing implementation is added - None - } else { - None - }; + // TODO: Implement executable hashing (issue #40) + // For now, we'll leave this as None until the hashing implementation is added + let executable_hash: Option = None; let user_id = process.user_id().map(|uid| uid.to_string()); let accessible = true; // Process is accessible if we can enumerate it @@ -363,6 +356,7 @@ impl SysinfoProcessCollector { } /// Determines if a process is a system process based on name and PID. + #[allow(clippy::unused_self)] // May use self for configuration in future fn is_system_process(&self, name: &str, pid: u32) -> bool { // Common system process patterns const SYSTEM_PROCESSES: &[&str] = &[ @@ -391,17 +385,8 @@ impl SysinfoProcessCollector { } /// Determines if a process is a kernel thread. + #[allow(clippy::unused_self)] // May use self for configuration in future fn is_kernel_thread(&self, name: &str, process: &Process) -> bool { - // Kernel threads typically have no command line arguments - if !process.cmd().is_empty() { - return false; - } - - // Kernel threads often have names in brackets - if name.starts_with('[') && name.ends_with(']') { - return true; - } - // Common kernel thread patterns const KERNEL_THREAD_PATTERNS: &[&str] = &[ "kworker", @@ -415,6 +400,16 @@ impl SysinfoProcessCollector { "kauditd", ]; + // Kernel threads typically have no command line arguments + if !process.cmd().is_empty() { + return false; + } + + // Kernel threads often have names in brackets + if name.starts_with('[') && name.ends_with(']') { + return true; + } + let name_lower = name.to_lowercase(); KERNEL_THREAD_PATTERNS .iter() @@ -468,7 +463,7 @@ impl ProcessCollector for SysinfoProcessCollector { if system.processes().is_empty() { return Err(ProcessCollectionError::SystemEnumerationFailed { - message: "No processes found during enumeration".to_string(), + message: "No processes found during enumeration".to_owned(), }); } @@ -476,17 +471,17 @@ impl ProcessCollector for SysinfoProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Process enumeration task failed: {}", e), + message: format!("Process enumeration task failed: {e}"), })?; let system = enumeration_result?; let mut events = Vec::new(); let mut stats = CollectionStats::default(); - let mut processed_count = 0; + let mut processed_count: usize = 0; // Process each process with individual error handling - for (pid, process) in system.processes().iter() { + for (pid, process) in system.processes() { // Check if we've hit the maximum process limit if self.config.max_processes > 0 && events.len() >= self.config.max_processes { debug!( @@ -497,20 +492,26 @@ impl ProcessCollector for SysinfoProcessCollector { break; } - 
processed_count += 1; + processed_count = processed_count.saturating_add(1); match self.convert_process_to_event(pid, process) { Ok(event) => { events.push(event); - stats.successful_collections += 1; + stats.successful_collections = stats.successful_collections.saturating_add(1); } - Err(ProcessCollectionError::ProcessAccessDenied { pid, message }) => { - debug!(pid = pid, reason = %message, "Process access denied"); - stats.inaccessible_processes += 1; + Err(ProcessCollectionError::ProcessAccessDenied { + pid: denied_pid, + message, + }) => { + debug!(pid = denied_pid, reason = %message, "Process access denied"); + stats.inaccessible_processes = stats.inaccessible_processes.saturating_add(1); } - Err(ProcessCollectionError::InvalidProcessData { pid, message }) => { - warn!(pid = pid, reason = %message, "Invalid process data"); - stats.invalid_processes += 1; + Err(ProcessCollectionError::InvalidProcessData { + pid: invalid_pid, + message, + }) => { + warn!(pid = invalid_pid, reason = %message, "Invalid process data"); + stats.invalid_processes = stats.invalid_processes.saturating_add(1); } Err(e) => { error!( @@ -518,13 +519,14 @@ impl ProcessCollector for SysinfoProcessCollector { error = %e, "Unexpected error during process conversion" ); - stats.invalid_processes += 1; + stats.invalid_processes = stats.invalid_processes.saturating_add(1); } } } stats.total_processes = processed_count; - stats.collection_duration_ms = start_time.elapsed().as_millis() as u64; + stats.collection_duration_ms = + u64::try_from(start_time.elapsed().as_millis()).unwrap_or(u64::MAX); debug!( collector = self.name(), @@ -570,16 +572,15 @@ impl ProcessCollector for SysinfoProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Single process lookup task failed: {}", e), + message: format!("Single process lookup task failed: {e}"), })?; let system = lookup_result?; let sysinfo_pid = Pid::from_u32(pid); - if let Some(process) = system.process(sysinfo_pid) { - self.convert_process_to_event(&sysinfo_pid, process) - } else { - Err(ProcessCollectionError::ProcessNotFound { pid }) - } + system.process(sysinfo_pid).map_or( + Err(ProcessCollectionError::ProcessNotFound { pid }), + |process| self.convert_process_to_event(&sysinfo_pid, process), + ) } async fn health_check(&self) -> ProcessCollectionResult<()> { @@ -597,7 +598,7 @@ impl ProcessCollector for SysinfoProcessCollector { let process_count = system.processes().len(); if process_count == 0 { return Err(ProcessCollectionError::SystemEnumerationFailed { - message: "No processes found during health check".to_string(), + message: "No processes found during health check".to_owned(), }); } @@ -605,7 +606,7 @@ impl ProcessCollector for SysinfoProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Health check task failed: {}", e), + message: format!("Health check task failed: {e}"), })?; let process_count = health_result?; @@ -652,7 +653,7 @@ impl FallbackProcessCollector { } /// Detects the current platform name for logging and diagnostics. - fn detect_platform() -> &'static str { + const fn detect_platform() -> &'static str { if cfg!(target_os = "freebsd") { "freebsd" } else if cfg!(target_os = "openbsd") { @@ -724,7 +725,7 @@ impl FallbackProcessCollector { } /// Checks if the platform supports kernel thread enumeration. 
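// Reviewer note: cfg!(...) expands to a compile-time boolean literal, which is
// why detect_platform above and supports_kernel_threads below can be const fns.
// A minimal illustration (function name hypothetical):
const fn platform_label() -> &'static str {
    if cfg!(target_os = "freebsd") {
        "freebsd"
    } else {
        "other"
    }
}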
- fn supports_kernel_threads() -> bool { + const fn supports_kernel_threads() -> bool { // Most BSD variants support kernel threads, but with limitations cfg!(any( target_os = "freebsd", @@ -734,7 +735,8 @@ impl FallbackProcessCollector { )) } - /// Converts a sysinfo process to a ProcessEvent with platform-specific handling. + /// Converts a sysinfo process to a `ProcessEvent` with platform-specific handling. + #[allow(clippy::trivially_copy_pass_by_ref)] // Pid reference matches sysinfo API patterns fn convert_process_to_event( &self, pid: &Pid, @@ -746,15 +748,15 @@ impl FallbackProcessCollector { if pid_u32 == 0 { return Err(ProcessCollectionError::InvalidProcessData { pid: pid_u32, - message: "Invalid PID: 0".to_string(), + message: "Invalid PID: 0".to_owned(), }); } - let ppid = process.parent().map(|p| p.as_u32()); + let ppid = process.parent().map(sysinfo::Pid::as_u32); // Get process name with fallback let name = if process.name().is_empty() { - format!("", pid_u32) + format!("") } else { process.name().to_string_lossy().to_string() }; @@ -763,7 +765,7 @@ impl FallbackProcessCollector { if self.config.skip_system_processes && self.is_system_process(&name, pid_u32) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "System process skipped by configuration".to_string(), + message: "System process skipped by configuration".to_owned(), }); } @@ -771,7 +773,7 @@ impl FallbackProcessCollector { if self.config.skip_kernel_threads && self.is_kernel_thread(&name, pid_u32) { return Err(ProcessCollectionError::ProcessAccessDenied { pid: pid_u32, - message: "Kernel thread skipped by configuration".to_string(), + message: "Kernel thread skipped by configuration".to_owned(), }); } @@ -792,8 +794,14 @@ impl FallbackProcessCollector { let memory = process.memory(); let start = process.start_time(); + let start_time_opt = if start > 0 { + // Safe: checked_add handles potential overflow + SystemTime::UNIX_EPOCH.checked_add(std::time::Duration::from_secs(start)) + } else { + None + }; ( - if cpu > 0.0 { Some(cpu as f64) } else { None }, + (cpu > 0.0).then(|| f64::from(cpu)), if memory > 0 { // Convert from KiB to bytes, handling potential overflow memory.checked_mul(1024).or_else(|| { @@ -806,24 +814,16 @@ impl FallbackProcessCollector { } else { None }, - if start > 0 { - Some(SystemTime::UNIX_EPOCH + std::time::Duration::from_secs(start)) - } else { - None - }, + start_time_opt, ) } else { (None, None, None) }; // Compute executable hash if configured and path is available - let executable_hash = if self.config.compute_executable_hashes { - // TODO: Implement executable hashing (issue #40) - // For now, we'll leave this as None until the hashing implementation is added - None - } else { - None - }; + // TODO: Implement executable hashing (issue #40) + // For now, we'll leave this as None until the hashing implementation is added + let executable_hash: Option = None; let user_id = process.user_id().map(|uid| uid.to_string()); let accessible = true; // Process is accessible if we can enumerate it @@ -851,6 +851,7 @@ impl FallbackProcessCollector { } /// Determines if a process is a system process based on name and PID. 
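// Reviewer sketch (not part of the diff): the checked_add change above avoids
// SystemTime's panicking Add impl on overflow. Standalone shape of that
// conversion (helper name hypothetical):
use std::time::{Duration, SystemTime, UNIX_EPOCH};

fn epoch_secs_to_system_time(secs: u64) -> Option<SystemTime> {
    if secs == 0 {
        return None; // sysinfo reports 0 when the start time is unknown
    }
    UNIX_EPOCH.checked_add(Duration::from_secs(secs))
}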
+ #[allow(clippy::unused_self)] // May use self for configuration in future fn is_system_process(&self, name: &str, pid: u32) -> bool { // Common system process patterns across BSD variants const SYSTEM_PROCESSES: &[&str] = &[ @@ -886,17 +887,8 @@ impl FallbackProcessCollector { } /// Determines if a process is a kernel thread (platform-specific logic). + #[allow(clippy::unused_self)] // May use self for configuration in future fn is_kernel_thread(&self, name: &str, pid: u32) -> bool { - // Kernel threads typically have very low PIDs on BSD systems - if pid < 5 { - return true; - } - - // Kernel threads often have names in brackets or specific patterns - if name.starts_with('[') && name.ends_with(']') { - return true; - } - // BSD-specific kernel thread patterns const KERNEL_THREAD_PATTERNS: &[&str] = &[ "kworker", @@ -920,6 +912,16 @@ impl FallbackProcessCollector { "usb", ]; + // Kernel threads typically have very low PIDs on BSD systems + if pid < 5 { + return true; + } + + // Kernel threads often have names in brackets or specific patterns + if name.starts_with('[') && name.ends_with(']') { + return true; + } + let name_lower = name.to_lowercase(); KERNEL_THREAD_PATTERNS .iter() @@ -970,8 +972,7 @@ impl ProcessCollector for FallbackProcessCollector { if system.processes().is_empty() { return Err(ProcessCollectionError::SystemEnumerationFailed { message: format!( - "No processes found during enumeration on platform: {}", - platform_name + "No processes found during enumeration on platform: {platform_name}" ), }); } @@ -980,17 +981,17 @@ impl ProcessCollector for FallbackProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Process enumeration task failed: {}", e), + message: format!("Process enumeration task failed: {e}"), })?; let system = enumeration_result?; let mut events = Vec::new(); let mut stats = CollectionStats::default(); - let mut processed_count = 0; + let mut processed_count: usize = 0; // Process each process with individual error handling - for (pid, process) in system.processes().iter() { + for (pid, process) in system.processes() { // Check if we've hit the maximum process limit if self.config.max_processes > 0 && events.len() >= self.config.max_processes { debug!( @@ -1002,29 +1003,35 @@ impl ProcessCollector for FallbackProcessCollector { break; } - processed_count += 1; + processed_count = processed_count.saturating_add(1); match self.convert_process_to_event(pid, process) { Ok(event) => { events.push(event); - stats.successful_collections += 1; + stats.successful_collections = stats.successful_collections.saturating_add(1); } - Err(ProcessCollectionError::ProcessAccessDenied { pid, message }) => { + Err(ProcessCollectionError::ProcessAccessDenied { + pid: denied_pid, + message, + }) => { debug!( - pid = pid, + pid = denied_pid, reason = %message, platform = self.platform_name, "Process access denied" ); - stats.inaccessible_processes += 1; + stats.inaccessible_processes = stats.inaccessible_processes.saturating_add(1); } - Err(ProcessCollectionError::InvalidProcessData { pid, message }) => { + Err(ProcessCollectionError::InvalidProcessData { + pid: invalid_pid, + message, + }) => { warn!( - pid = pid, + pid = invalid_pid, reason = %message, platform = self.platform_name, "Invalid process data" ); - stats.invalid_processes += 1; + stats.invalid_processes = stats.invalid_processes.saturating_add(1); } Err(e) => { error!( @@ -1033,13 +1040,14 @@ impl ProcessCollector for FallbackProcessCollector { platform = 
self.platform_name, "Unexpected error during process conversion" ); - stats.invalid_processes += 1; + stats.invalid_processes = stats.invalid_processes.saturating_add(1); } } } stats.total_processes = processed_count; - stats.collection_duration_ms = start_time.elapsed().as_millis() as u64; + stats.collection_duration_ms = + u64::try_from(start_time.elapsed().as_millis()).unwrap_or(u64::MAX); debug!( collector = self.name(), @@ -1065,7 +1073,6 @@ impl ProcessCollector for FallbackProcessCollector { // Perform single process lookup in a blocking task let config = self.config.clone(); - let _platform_name = self.platform_name; let lookup_result = tokio::task::spawn_blocking(move || { let mut system = System::new(); @@ -1088,16 +1095,15 @@ impl ProcessCollector for FallbackProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Single process lookup task failed: {}", e), + message: format!("Single process lookup task failed: {e}"), })?; let system = lookup_result?; let sysinfo_pid = Pid::from_u32(pid); - if let Some(process) = system.process(sysinfo_pid) { - self.convert_process_to_event(&sysinfo_pid, process) - } else { - Err(ProcessCollectionError::ProcessNotFound { pid }) - } + system.process(sysinfo_pid).map_or( + Err(ProcessCollectionError::ProcessNotFound { pid }), + |process| self.convert_process_to_event(&sysinfo_pid, process), + ) } async fn health_check(&self) -> ProcessCollectionResult<()> { @@ -1121,8 +1127,7 @@ impl ProcessCollector for FallbackProcessCollector { if process_count == 0 { return Err(ProcessCollectionError::SystemEnumerationFailed { message: format!( - "No processes found during health check on platform: {}", - platform_name + "No processes found during health check on platform: {platform_name}" ), }); } @@ -1131,7 +1136,7 @@ impl ProcessCollector for FallbackProcessCollector { }) .await .map_err(|e| ProcessCollectionError::SystemEnumerationFailed { - message: format!("Health check task failed: {}", e), + message: format!("Health check task failed: {e}"), })?; let process_count = health_result?; @@ -1192,6 +1197,11 @@ pub fn create_process_collector(config: ProcessCollectionConfig) -> Box WalResult<()> { +//! // Create or open WAL +//! let wal = WriteAheadLog::new(PathBuf::from("/var/lib/procmond/wal")).await?; +//! +//! // Write an event and get its sequence number +//! let sequence = wal.write(process_event).await?; +//! +//! // Replay on startup to recover unpublished events +//! let events = wal.replay().await?; +//! +//! // Mark events as published to enable cleanup +//! wal.mark_published(sequence).await?; +//! Ok(()) +//! } +//! ``` + +use collector_core::event::ProcessEvent; +use std::path::{Path, PathBuf}; +use std::sync::Arc; +use std::sync::atomic::{AtomicU64, Ordering}; +use thiserror::Error; +use tokio::fs; +use tokio::io::AsyncReadExt; +use tokio::sync::Mutex; +use tracing::{debug, info, warn}; + +/// WAL-specific error types. +#[derive(Debug, Error)] +#[non_exhaustive] +pub enum WalError { + /// File I/O operation failed. + #[error("I/O error: {0}")] + Io(#[from] std::io::Error), + + /// Serialization or deserialization failed. + #[error("Serialization error: {0}")] + Serialization(String), + + /// CRC32 checksum validation failed. + #[error("Corruption detected in WAL entry (sequence: {sequence}): {message}")] + Corruption { + /// Sequence number of the corrupted entry + sequence: u64, + /// Description of corruption + message: String, + }, + + /// Sequence number mismatch during recovery. 
diff --git a/procmond/src/wal.rs b/procmond/src/wal.rs
new file mode 100644
--- /dev/null
+++ b/procmond/src/wal.rs
+//! Write-Ahead Log for event persistence and crash recovery.
+//!
+//! # Example
+//!
+//! ```no_run
+//! use collector_core::event::ProcessEvent;
+//! use procmond::wal::{WalResult, WriteAheadLog};
+//! use std::path::PathBuf;
+//!
+//! async fn example(process_event: ProcessEvent) -> WalResult<()> {
+//! // Create or open WAL
+//! let wal = WriteAheadLog::new(PathBuf::from("/var/lib/procmond/wal")).await?;
+//!
+//! // Write an event and get its sequence number
+//! let sequence = wal.write(process_event).await?;
+//!
+//! // Replay on startup to recover unpublished events
+//! let events = wal.replay().await?;
+//!
+//! // Mark events as published to enable cleanup
+//! wal.mark_published(sequence).await?;
+//! Ok(())
+//! }
+//! ```
+
+use collector_core::event::ProcessEvent;
+use std::path::{Path, PathBuf};
+use std::sync::Arc;
+use std::sync::atomic::{AtomicU64, Ordering};
+use thiserror::Error;
+use tokio::fs;
+use tokio::io::AsyncReadExt;
+use tokio::sync::Mutex;
+use tracing::{debug, info, warn};
+
+/// WAL-specific error types.
+#[derive(Debug, Error)]
+#[non_exhaustive]
+pub enum WalError {
+    /// File I/O operation failed.
+    #[error("I/O error: {0}")]
+    Io(#[from] std::io::Error),
+
+    /// Serialization or deserialization failed.
+    #[error("Serialization error: {0}")]
+    Serialization(String),
+
+    /// CRC32 checksum validation failed.
+    #[error("Corruption detected in WAL entry (sequence: {sequence}): {message}")]
+    Corruption {
+        /// Sequence number of the corrupted entry
+        sequence: u64,
+        /// Description of corruption
+        message: String,
+    },
+
+    /// Sequence number mismatch during recovery.
+    #[error("Invalid sequence during replay: expected {expected}, found {found}")]
+    InvalidSequence {
+        /// Expected sequence number
+        expected: u64,
+        /// Actual sequence number found
+        found: u64,
+    },
+
+    /// File rotation operation failed.
+    #[error("File rotation failed: {0}")]
+    FileRotation(String),
+
+    /// Replay operation encountered an error.
+    #[error("Replay error: {0}")]
+    Replay(String),
+}
+
+/// Result type for WAL operations.
+pub type WalResult<T> = Result<T, WalError>;
+
+/// Internal result type for file read operations.
+///
+/// Used by helper functions to signal EOF vs I/O errors.
+enum ReadResult {
+    /// End of file reached (expected during normal replay)
+    Eof,
+    /// I/O error occurred
+    Io(std::io::Error),
+}
+
+/// A single entry in the WAL.
+///
+/// Each entry contains a process event with a monotonically increasing sequence
+/// number and a CRC32 checksum for corruption detection.
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
+pub struct WalEntry {
+    /// Monotonically increasing sequence number across all WAL files
+    pub sequence: u64,
+
+    /// The process event being persisted
+    pub event: ProcessEvent,
+
+    /// CRC32 checksum of the serialized event for corruption detection
+    pub checksum: u32,
+
+    /// Optional event type for topic routing (e.g., "start", "stop", "modify").
+    /// Added for backward compatibility - older WAL files will deserialize with None.
+    #[serde(default)]
+    pub event_type: Option<String>,
+}
+
+impl WalEntry {
+    /// Create a new WAL entry with automatic checksum computation.
+    ///
+    /// # Arguments
+    ///
+    /// * `sequence` - Monotonically increasing sequence number
+    /// * `event` - Process event to persist
+    ///
+    /// # Returns
+    ///
+    /// A new `WalEntry` with checksum computed from the event data
+    pub fn new(sequence: u64, event: ProcessEvent) -> Self {
+        let checksum = Self::compute_checksum(&event);
+        Self {
+            sequence,
+            event,
+            checksum,
+            event_type: None,
+        }
+    }
+
+    /// Create a new WAL entry with event type for topic routing.
+    ///
+    /// # Arguments
+    ///
+    /// * `sequence` - Monotonically increasing sequence number
+    /// * `event` - Process event to persist
+    /// * `event_type` - Event type string (e.g., "start", "stop", "modify")
+    ///
+    /// # Returns
+    ///
+    /// A new `WalEntry` with checksum and event type
+    pub fn with_event_type(sequence: u64, event: ProcessEvent, event_type: String) -> Self {
+        let checksum = Self::compute_checksum(&event);
+        Self {
+            sequence,
+            event,
+            checksum,
+            event_type: Some(event_type),
+        }
+    }
+
+    /// Compute CRC32 checksum of the event's serialized form.
+    fn compute_checksum(event: &ProcessEvent) -> u32 {
+        use std::hash::Hasher;
+        postcard::to_allocvec(event).map_or(0, |serialized| {
+            let mut crc = crc32c::Crc32cHasher::new(0);
+            for chunk in serialized.chunks(8192) {
+                crc.write(chunk);
+            }
+            #[allow(clippy::as_conversions)]
+            // Safe: CRC32 hash is always u64, truncation to u32 is expected
+            {
+                crc.finish() as u32
+            }
+        })
+    }
+
+    /// Verify the integrity of this entry's checksum.
+    ///
+    /// # Returns
+    ///
+    /// `true` if the checksum matches the event data, `false` otherwise
+    pub fn verify(&self) -> bool {
+        let computed = Self::compute_checksum(&self.event);
+        computed == self.checksum
+    }
+}
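// A hedged sketch (editorial, not part of the diff) of the checksum contract
// `WalEntry` establishes above: `verify()` recomputes the CRC32 over the
// postcard-serialized event, so any divergence between the stored checksum and
// the event data is caught before replay hands an entry back.
fn checksum_contract(event: ProcessEvent) {
    let entry = WalEntry::new(1, event);
    assert!(entry.verify()); // a freshly built entry always passes

    let mut tampered = entry.clone();
    tampered.checksum ^= 1; // a single flipped bit is enough
    assert!(!tampered.verify()); // replay skips such entries with a warning
}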
+
+/// Metadata about event sequences within a WAL file.
+///
+/// Tracks the minimum and maximum event sequence numbers contained in a WAL file,
+/// enabling efficient cleanup of published events.
+#[derive(Debug, Clone, Copy, Default)]
+pub struct WalFileMetadata {
+    /// Minimum event sequence number in this file (first entry)
+    pub min_sequence: u64,
+    /// Maximum event sequence number in this file (last entry)
+    pub max_sequence: u64,
+    /// Number of valid entries in this file
+    pub entry_count: u64,
+}
+
+/// Write-Ahead Log for event persistence and crash recovery.
+///
+/// Manages a set of append-only log files that store process events with
+/// automatic rotation, crash recovery, and cleanup capabilities.
+pub struct WriteAheadLog {
+    /// Directory containing WAL files
+    wal_dir: PathBuf,
+
+    /// Current event sequence number - monotonically increasing across all events (thread-safe)
+    current_event_sequence: Arc<AtomicU64>,
+
+    /// Current file sequence number - identifies which WAL file to write to (thread-safe)
+    current_file_sequence: Arc<AtomicU64>,
+
+    /// Currently active WAL file handle and current file size (protected by same mutex for atomic rotation)
+    file_state: Arc<Mutex<WalFileState>>,
+
+    /// Rotation trigger threshold (configurable, default 80MB)
+    rotation_threshold: u64,
+}
+
+/// Internal state for the active WAL file, protected by a single mutex for atomic operations.
+struct WalFileState {
+    /// Currently active WAL file handle (never None during normal operation)
+    file: fs::File,
+    /// Current file size for rotation tracking
+    size: u64,
+    /// Minimum event sequence in current file (0 if empty)
+    min_sequence: u64,
+    /// Maximum event sequence in current file (0 if empty)
+    max_sequence: u64,
+}
+
+impl WriteAheadLog {
+    /// Default rotation threshold (80MB = 83,886,080 bytes)
+    const DEFAULT_ROTATION_THRESHOLD: u64 = 80 * 1024 * 1024;
+
+    /// Create or open a Write-Ahead Log at the specified directory.
+    ///
+    /// # Arguments
+    ///
+    /// * `wal_dir` - Directory path for WAL files
+    ///
+    /// # Returns
+    ///
+    /// A new `WriteAheadLog` instance initialized and ready for operations
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if directory creation or file scanning fails
+    pub async fn new(wal_dir: PathBuf) -> WalResult<Self> {
+        Self::with_rotation_threshold(wal_dir, Self::DEFAULT_ROTATION_THRESHOLD).await
+    }
+
+    /// Create or open a Write-Ahead Log with a custom rotation threshold.
+    ///
+    /// This is primarily useful for testing to avoid creating 80MB files.
+    ///
+    /// # Arguments
+    ///
+    /// * `wal_dir` - Directory path for WAL files
+    /// * `rotation_threshold` - File size threshold in bytes that triggers rotation
+    ///
+    /// # Returns
+    ///
+    /// A new `WriteAheadLog` instance initialized and ready for operations
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if directory creation or file scanning fails
+    pub async fn with_rotation_threshold(
+        wal_dir: PathBuf,
+        rotation_threshold: u64,
+    ) -> WalResult<Self> {
+        // Create WAL directory if it doesn't exist
+        fs::create_dir_all(&wal_dir).await.map_err(WalError::Io)?;
+
+        // Scan for existing WAL files to determine next file sequence number
+        // and find the highest event sequence for monotonic sequencing across restarts
+        let (next_file_sequence, highest_event_sequence, current_file_metadata) =
+            Self::scan_wal_state(&wal_dir).await?;
+
+        debug!(
+            wal_dir = ?wal_dir,
+            next_file_sequence = next_file_sequence,
+            highest_event_sequence = highest_event_sequence,
+            "Initializing WAL"
+        );
+
+        // Open or create initial WAL file
+        #[allow(clippy::as_conversions)] // Safe: file sequence is always within u32 range
+        let file_sequence_u32 = next_file_sequence as u32;
+        let file_path = Self::wal_file_path(&wal_dir, file_sequence_u32);
+        let file = fs::OpenOptions::new()
+            .append(true)
+            .create(true)
+            .open(&file_path)
+            .await
+            .map_err(WalError::Io)?;
+
+        // Get initial file size if resuming
+        let metadata = fs::metadata(&file_path).await.map_err(WalError::Io)?;
+        let file_size = metadata.len();
+
+        // Event sequence continues from highest found + 1 to maintain monotonic sequencing
+        let initial_event_sequence = highest_event_sequence.saturating_add(1);
+
+        // Use metadata from current file if it exists, otherwise start fresh
+        let (min_seq, max_seq) =
+            current_file_metadata.map_or((0, 0), |m| (m.min_sequence, m.max_sequence));
+
+        let file_state = WalFileState {
+            file,
+            size: file_size,
+            min_sequence: min_seq,
+            max_sequence: max_seq,
+        };
+
+        Ok(Self {
+            wal_dir,
+            current_event_sequence: Arc::new(AtomicU64::new(initial_event_sequence)),
+            current_file_sequence: Arc::new(AtomicU64::new(next_file_sequence)),
+            file_state: Arc::new(Mutex::new(file_state)),
+            rotation_threshold,
+        })
+    }
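// A hedged sketch (editorial, not in the diff) of the restart invariant the
// constructor establishes: the next event sequence is always one past the
// highest sequence found on disk, keeping sequences strictly monotonic across
// process restarts.
fn next_event_sequence(highest_on_disk: u64) -> u64 {
    // An empty WAL directory scans to 0, so the first event gets sequence 1.
    highest_on_disk.saturating_add(1)
}
// next_event_sequence(0) == 1; next_event_sequence(42) == 43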
+
+    /// Scan existing WAL files to determine the next file sequence and highest event sequence.
+    ///
+    /// Returns (next_file_sequence, highest_event_sequence, current_file_metadata)
+    async fn scan_wal_state(wal_dir: &Path) -> WalResult<(u64, u64, Option<WalFileMetadata>)> {
+        let mut max_file_sequence = 0_u64;
+        let mut highest_event_sequence = 0_u64;
+        let mut current_file_metadata = None;
+
+        match fs::read_dir(wal_dir).await {
+            Ok(mut dir) => {
+                let mut files = Vec::new();
+                while let Ok(Some(entry)) = dir.next_entry().await {
+                    let path = entry.path();
+                    if let Some(filename) = path.file_name().and_then(|n| n.to_str())
+                        && let Some(sequence) = Self::parse_wal_filename(filename)
+                    {
+                        files.push((sequence, path));
+                        max_file_sequence = max_file_sequence.max(u64::from(sequence));
+                    }
+                }
+
+                // Sort files by sequence
+                files.sort_by_key(|f| f.0);
+
+                // Scan each file to find the highest event sequence
+                for &(file_seq, ref path) in &files {
+                    match Self::scan_file_metadata(path).await {
+                        Ok(metadata) => {
+                            highest_event_sequence =
+                                highest_event_sequence.max(metadata.max_sequence);
+
+                            // Track metadata for the current (highest sequence) file
+                            if u64::from(file_seq) == max_file_sequence {
+                                current_file_metadata = Some(metadata);
+                            }
+                        }
+                        Err(e) => {
+                            warn!(
+                                path = ?path,
+                                error = %e,
+                                "Failed to scan WAL file metadata, skipping file"
+                            );
+                        }
+                    }
+                }
+            }
+            Err(e) => {
+                debug!(
+                    wal_dir = ?wal_dir,
+                    error = %e,
+                    "WAL directory not readable or does not exist, starting fresh"
+                );
+            }
+        }
+
+        // Next file sequence: if we have files, continue with the highest; if empty, start at 1
+        let next_file_sequence = if max_file_sequence > 0 {
+            max_file_sequence
+        } else {
+            1
+        };
+
+        Ok((
+            next_file_sequence,
+            highest_event_sequence,
+            current_file_metadata,
+        ))
+    }
+
+    /// Scan a single WAL file to extract metadata about event sequences.
+    ///
+    /// Uses helper functions to reduce nesting depth.
+    async fn scan_file_metadata(path: &Path) -> WalResult<WalFileMetadata> {
+        let mut file = fs::File::open(path)
+            .await
+            .map_err(|e| WalError::Replay(format!("Failed to open file for scanning: {e}")))?;
+
+        let mut metadata = WalFileMetadata::default();
+        let mut buffer = vec![0_u8; 4];
+        let mut first_entry = true;
+
+        loop {
+            // Read length prefix - handle EOF
+            let length = match Self::read_length_prefix(&mut file, &mut buffer).await {
+                Ok(len) => len,
+                Err(ReadResult::Eof) => break,
+                Err(ReadResult::Io(e)) => return Err(WalError::Io(e)),
+            };
+
+            // Read entry data - handle EOF/errors
+            let entry_data = match Self::read_entry_data(&mut file, length).await {
+                Ok(data) => data,
+                Err(ReadResult::Eof) => break,
+                Err(ReadResult::Io(e)) => return Err(WalError::Io(e)),
+            };
+
+            // Try to deserialize and verify; update metadata if valid
+            if let Some(entry) = Self::deserialize_and_verify_entry(&entry_data) {
+                if first_entry {
+                    metadata.min_sequence = entry.sequence;
+                    first_entry = false;
+                }
+                metadata.max_sequence = entry.sequence;
+                metadata.entry_count = metadata.entry_count.saturating_add(1);
+            }
+        }
+
+        Ok(metadata)
+    }
+
+    /// Generate a WAL file path from a directory and sequence number.
+    fn wal_file_path(wal_dir: &Path, sequence: u32) -> PathBuf {
+        wal_dir.join(format!("procmond-{sequence:05}.wal"))
+    }
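// A hedged worked example (editorial, not in the diff) of the naming scheme
// produced by `wal_file_path` above and accepted by `parse_wal_filename` just
// below. `parse_demo` is a hypothetical mirror of the private parser.
fn parse_demo(name: &str) -> Option<u32> {
    name.strip_prefix("procmond-")?
        .strip_suffix(".wal")?
        .parse::<u32>()
        .ok()
}
// parse_demo("procmond-00007.wal") == Some(7); parse_demo("other.wal") == None.
// Note: the real parser checks the extension case-insensitively but still
// strips the lowercase ".wal" suffix, so "procmond-00007.WAL" is rejected too.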
+
+    /// Parse sequence number from a WAL filename.
+    fn parse_wal_filename(filename: &str) -> Option<u32> {
+        // Check for .wal extension case-insensitively using Path
+        let has_wal_ext = std::path::Path::new(filename)
+            .extension()
+            .is_some_and(|ext| ext.eq_ignore_ascii_case("wal"));
+
+        if !has_wal_ext || !filename.starts_with("procmond-") {
+            return None;
+        }
+
+        filename
+            .strip_prefix("procmond-")
+            .and_then(|s| s.strip_suffix(".wal"))
+            .and_then(|s| s.parse::<u32>().ok())
+    }
+
+    /// List all WAL files sorted by sequence number.
+    async fn list_wal_files(&self) -> WalResult<Vec<(u32, PathBuf)>> {
+        let mut entries = Vec::new();
+
+        let mut dir = fs::read_dir(&self.wal_dir).await.map_err(WalError::Io)?;
+
+        while let Some(entry) = dir.next_entry().await.map_err(WalError::Io)? {
+            let path = entry.path();
+            if let Some(filename) = path.file_name().and_then(|n| n.to_str())
+                && let Some(sequence) = Self::parse_wal_filename(filename)
+            {
+                entries.push((sequence, path));
+            }
+        }
+
+        entries.sort_by_key(|entry| entry.0);
+        Ok(entries)
+    }
+
+    /// Write an event to the WAL with automatic rotation.
+    ///
+    /// Rotation is performed atomically with respect to writers - the new file is opened
+    /// before the old one is closed, all within the same lock, ensuring writers never
+    /// observe a missing file handle.
+    ///
+    /// # Arguments
+    ///
+    /// * `event` - Process event to persist
+    ///
+    /// # Returns
+    ///
+    /// The sequence number assigned to this event
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if serialization, file I/O, or rotation fails
+    #[allow(clippy::significant_drop_tightening)] // Lock is intentionally held throughout for atomic rotation
+    pub async fn write(&self, event: ProcessEvent) -> WalResult<u64> {
+        use tokio::io::AsyncWriteExt;
+
+        // Get the next event sequence number
+        let sequence = self.current_event_sequence.fetch_add(1, Ordering::SeqCst);
+
+        // Create WAL entry with automatic checksum
+        let entry = WalEntry::new(sequence, event);
+
+        // Serialize the entry
+        let serialized =
+            postcard::to_allocvec(&entry).map_err(|e| WalError::Serialization(e.to_string()))?;
+
+        // Prepare length prefix (little-endian u32)
+        #[allow(clippy::as_conversions)] // Safe: serialized len is bounded by frame size
+        let length = serialized.len() as u32;
+        let length_bytes = length.to_le_bytes();
+
+        // Calculate size increment safely
+        let size_increment = length_bytes.len().saturating_add(serialized.len());
+        #[allow(clippy::as_conversions)] // Safe: total size is bounded by max frame size
+        let size_increment_u64 = size_increment as u64;
+
+        // Write to current file - hold lock for entire operation including rotation
+        let mut state = self.file_state.lock().await;
+
+        // Write length prefix
+        state
+            .file
+            .write_all(&length_bytes)
+            .await
+            .map_err(WalError::Io)?;
+
+        // Write serialized entry
+        state
+            .file
+            .write_all(&serialized)
+            .await
+            .map_err(WalError::Io)?;
+
+        // Update file size and sequence tracking
+        state.size = state.size.saturating_add(size_increment_u64);
+
+        // Track min/max sequences for this file
+        if state.min_sequence == 0 {
+            state.min_sequence = sequence;
+        }
+        state.max_sequence = sequence;
+
+        debug!(
+            sequence = sequence,
+            file_size = state.size,
+            "WAL entry written"
+        );
+
+        // Check if rotation is needed - perform atomically within the same lock
+        if state.size >= self.rotation_threshold {
+            self.rotate_file_internal(&mut state).await?;
+        }
+
+        Ok(sequence)
+    }
+
+    /// Write an event to the WAL with event type metadata for topic routing.
+    ///
+    /// Similar to [`Self::write`], but includes an event type string that can be used
+    /// during replay to determine the correct topic for republishing.
+    ///
+    /// # Arguments
+    ///
+    /// * `event` - Process event to persist
+    /// * `event_type` - Event type string (e.g., "start", "stop", "modify")
+    ///
+    /// # Returns
+    ///
+    /// The sequence number assigned to this event
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if serialization, file I/O, or rotation fails
+    #[allow(clippy::significant_drop_tightening)] // Lock is intentionally held throughout for atomic rotation
+    pub async fn write_with_type(&self, event: ProcessEvent, event_type: String) -> WalResult<u64> {
+        use tokio::io::AsyncWriteExt;
+
+        // Get the next event sequence number
+        let sequence = self.current_event_sequence.fetch_add(1, Ordering::SeqCst);
+
+        // Create WAL entry with event type
+        let entry = WalEntry::with_event_type(sequence, event, event_type);
+
+        // Serialize the entry
+        let serialized =
+            postcard::to_allocvec(&entry).map_err(|e| WalError::Serialization(e.to_string()))?;
+
+        // Prepare length prefix (little-endian u32)
+        #[allow(clippy::as_conversions)] // Safe: serialized len is bounded by frame size
+        let length = serialized.len() as u32;
+        let length_bytes = length.to_le_bytes();
+
+        // Calculate size increment safely
+        let size_increment = length_bytes.len().saturating_add(serialized.len());
+        #[allow(clippy::as_conversions)] // Safe: total size is bounded by max frame size
+        let size_increment_u64 = size_increment as u64;
+
+        // Write to current file - hold lock for entire operation including rotation
+        let mut state = self.file_state.lock().await;
+
+        // Write length prefix
+        state
+            .file
+            .write_all(&length_bytes)
+            .await
+            .map_err(WalError::Io)?;
+
+        // Write serialized entry
+        state
+            .file
+            .write_all(&serialized)
+            .await
+            .map_err(WalError::Io)?;
+
+        // Update file size and sequence tracking
+        state.size = state.size.saturating_add(size_increment_u64);
+
+        // Track min/max sequences for this file
+        if state.min_sequence == 0 {
+            state.min_sequence = sequence;
+        }
+        state.max_sequence = sequence;
+
+        debug!(
+            sequence = sequence,
+            file_size = state.size,
+            "WAL entry written with event type"
+        );
+
+        // Check if rotation is needed - perform atomically within the same lock
+        if state.size >= self.rotation_threshold {
+            self.rotate_file_internal(&mut state).await?;
+        }
+
+        Ok(sequence)
+    }
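// A hedged sketch (editorial, not in the diff) of the locking discipline both
// write paths above rely on: the append bookkeeping and the rotation check run
// under one `Mutex` guard, so a concurrent writer can never observe the gap
// between "old file full" and "new file open". `FileSize` is a hypothetical,
// simplified stand-in for `WalFileState`.
use std::sync::Arc;
use tokio::sync::Mutex;

struct FileSize {
    size: u64,
}

async fn append_then_maybe_rotate(state: &Arc<Mutex<FileSize>>, frame_len: u64, threshold: u64) {
    let mut guard = state.lock().await; // one guard covers append *and* rotation
    guard.size = guard.size.saturating_add(frame_len);
    if guard.size >= threshold {
        guard.size = 0; // stands in for rotate_file_internal swapping in a new file
    }
    // guard drops here; no other writer ever sees a half-rotated state
}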
+
+    /// Rotate to the next WAL file (internal implementation holding the lock).
+    ///
+    /// This method performs rotation atomically by:
+    /// 1. Opening the new file first
+    /// 2. Replacing the file handle in the state
+    /// 3. The old file handle is dropped when the state is updated
+    ///
+    /// Writers never observe a missing file handle because the lock is held throughout.
+    async fn rotate_file_internal(&self, state: &mut WalFileState) -> WalResult<()> {
+        debug!("Rotating WAL file");
+
+        // Increment file sequence and open new file BEFORE closing old one
+        let previous_sequence = self.current_file_sequence.fetch_add(1, Ordering::SeqCst);
+        let next_file_sequence = previous_sequence.saturating_add(1);
+
+        #[allow(clippy::as_conversions)] // Safe: file sequence is always within u32 range
+        let file_sequence_u32 = next_file_sequence as u32;
+        let file_path = Self::wal_file_path(&self.wal_dir, file_sequence_u32);
+
+        let new_file = fs::OpenOptions::new()
+            .append(true)
+            .create(true)
+            .open(&file_path)
+            .await
+            .map_err(|e| WalError::FileRotation(format!("Failed to open new WAL file: {e}")))?;
+
+        // Atomically replace the file handle - old file is closed when dropped
+        state.file = new_file;
+        state.size = 0;
+        state.min_sequence = 0;
+        state.max_sequence = 0;
+
+        info!(file_sequence = next_file_sequence, "WAL file rotated");
+
+        Ok(())
+    }
+
+    /// Replay WAL events for crash recovery.
+    ///
+    /// Reads all WAL files in sequence order and recovers unpublished events.
+    /// Corrupted entries are skipped with warnings.
+    ///
+    /// # Returns
+    ///
+    /// A vector of recovered events in chronological order
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if directory scanning fails or an unrecoverable I/O error occurs
+    pub async fn replay(&self) -> WalResult<Vec<ProcessEvent>> {
+        let mut all_events = Vec::new();
+        let files = self.list_wal_files().await?;
+        let total_files = files.len();
+        let mut failed_files = 0_usize;
+
+        debug!(file_count = total_files, "Starting WAL replay");
+
+        for (_sequence, path) in files {
+            match self.replay_file(&path).await {
+                Ok(events) => {
+                    debug!(
+                        file = ?path,
+                        event_count = events.len(),
+                        "Replayed WAL file"
+                    );
+                    all_events.extend(events);
+                }
+                Err(e) => {
+                    warn!(file = ?path, error = %e, "Error replaying WAL file");
+                    failed_files = failed_files.saturating_add(1);
+                }
+            }
+        }
+
+        if failed_files > 0 {
+            warn!(
+                failed_files = failed_files,
+                total_files = total_files,
+                "WAL replay completed with errors - some events may not have been recovered"
+            );
+        }
+
+        info!(
+            total_events = all_events.len(),
+            failed_files = failed_files,
+            "WAL replay complete"
+        );
+
+        Ok(all_events)
+    }
+
+    /// Replay a single WAL file.
+    ///
+    /// Delegates to [`Self::replay_file_entries`] and extracts just the events.
+    async fn replay_file(&self, path: &Path) -> WalResult<Vec<ProcessEvent>> {
+        let entries = self.replay_file_entries(path).await?;
+        Ok(entries.into_iter().map(|entry| entry.event).collect())
+    }
+
+    /// Replay WAL entries with full metadata for crash recovery.
+    ///
+    /// Similar to [`Self::replay`], but returns complete `WalEntry` objects including
+    /// sequence numbers and event types. Use this when you need to track which
+    /// events have been published or need event type information for topic routing.
+    ///
+    /// # Returns
+    ///
+    /// A vector of recovered WAL entries in chronological order
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if directory scanning fails or an unrecoverable I/O error occurs
+    pub async fn replay_entries(&self) -> WalResult<Vec<WalEntry>> {
+        let mut all_entries = Vec::new();
+        let files = self.list_wal_files().await?;
+
+        debug!(file_count = files.len(), "Starting WAL entry replay");
+
+        for (_sequence, path) in files {
+            match self.replay_file_entries(&path).await {
+                Ok(entries) => {
+                    debug!(
+                        file = ?path,
+                        entry_count = entries.len(),
+                        "Replayed WAL file entries"
+                    );
+                    all_entries.extend(entries);
+                }
+                Err(e) => {
+                    warn!("Error replaying WAL file {path:?}: {e}");
+                }
+            }
+        }
+
+        info!(
+            total_entries = all_entries.len(),
+            "WAL entry replay complete"
+        );
+
+        Ok(all_entries)
+    }
+
+    /// Replay a single WAL file returning full entries.
+    ///
+    /// Uses early-continue pattern to reduce nesting depth.
+    async fn replay_file_entries(&self, path: &Path) -> WalResult<Vec<WalEntry>> {
+        let mut file = fs::File::open(path)
+            .await
+            .map_err(|e| WalError::Replay(format!("Failed to open file: {e}")))?;
+
+        let mut entries = Vec::new();
+        let mut buffer = vec![0_u8; 4];
+
+        loop {
+            // Read length prefix - handle EOF
+            let length = match Self::read_length_prefix(&mut file, &mut buffer).await {
+                Ok(len) => len,
+                Err(ReadResult::Eof) => break,
+                Err(ReadResult::Io(e)) => return Err(WalError::Io(e)),
+            };
+
+            // Read entry data - handle EOF/errors
+            let entry_data = match Self::read_entry_data(&mut file, length).await {
+                Ok(data) => data,
+                Err(ReadResult::Eof) => {
+                    warn!("Skipping partial WAL entry (truncated data)");
+                    break;
+                }
+                Err(ReadResult::Io(e)) => return Err(WalError::Io(e)),
+            };
+
+            // Deserialize and verify entry
+            if let Some(entry) = Self::deserialize_and_verify_entry(&entry_data) {
+                entries.push(entry);
+            }
+        }
+
+        Ok(entries)
+    }
+
+    /// Read a 4-byte length prefix from the file.
+    async fn read_length_prefix(
+        file: &mut fs::File,
+        buffer: &mut [u8],
+    ) -> Result<usize, ReadResult> {
+        match file.read_exact(buffer).await {
+            Ok(_) => {
+                #[allow(clippy::indexing_slicing)] // Safe: buffer is exactly 4 bytes
+                let length_bytes: [u8; 4] = [buffer[0], buffer[1], buffer[2], buffer[3]];
+                #[allow(clippy::as_conversions)] // Safe: u32 length fits in usize
+                Ok(u32::from_le_bytes(length_bytes) as usize)
+            }
+            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Err(ReadResult::Eof),
+            Err(e) => Err(ReadResult::Io(e)),
+        }
+    }
+
+    /// Read entry data of the specified length from the file.
+    async fn read_entry_data(file: &mut fs::File, length: usize) -> Result<Vec<u8>, ReadResult> {
+        let mut entry_data = vec![0_u8; length];
+        match file.read_exact(&mut entry_data).await {
+            Ok(_) => Ok(entry_data),
+            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Err(ReadResult::Eof),
+            Err(e) => Err(ReadResult::Io(e)),
+        }
+    }
+
+    /// Deserialize entry data and verify checksum.
+    ///
+    /// Returns `Some(entry)` if valid, `None` if corrupted (with warning logged).
+    fn deserialize_and_verify_entry(entry_data: &[u8]) -> Option<WalEntry> {
+        match postcard::from_bytes::<WalEntry>(entry_data) {
+            Ok(entry) => {
+                if entry.verify() {
+                    Some(entry)
+                } else {
+                    warn!(
+                        sequence = entry.sequence,
+                        "Skipping corrupted WAL entry (checksum mismatch)"
+                    );
+                    None
+                }
+            }
+            Err(e) => {
+                warn!(
+                    error = %e,
+                    "Skipping corrupted WAL entry (deserialization failed)"
+                );
+                None
+            }
+        }
+    }
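// A hedged usage sketch (editorial, not in the diff) of the recover-then-trim
// cycle the replay and cleanup methods support: replay full entries, push them
// downstream, then let `mark_published` (below) delete fully-published files.
// `publish_all` is a hypothetical sink returning the highest sequence it
// durably delivered.
async fn recover_and_trim(wal: &WriteAheadLog) -> WalResult<()> {
    // Full entries carry sequence numbers (and optional event types for routing).
    let entries = wal.replay_entries().await?;
    let published_up_to = publish_all(&entries).await;
    // Only files whose max sequence is <= the cutoff are deleted; the active
    // file and any file holding unpublished events survive.
    wal.mark_published(published_up_to).await
}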
+
+    /// Mark events as published up to a given sequence number.
+    ///
+    /// Scans each WAL file to determine its max event sequence and deletes only
+    /// files where all events have been published (max_sequence <= up_to_sequence).
+    /// The currently active file is never deleted, even if all its events are published.
+    ///
+    /// # Arguments
+    ///
+    /// * `up_to_sequence` - Event sequence number up to which events are marked published
+    ///
+    /// # Returns
+    ///
+    /// Ok if all eligible files are deleted successfully
+    ///
+    /// # Errors
+    ///
+    /// Returns `WalError` if file scanning or deletion fails
+    pub async fn mark_published(&self, up_to_sequence: u64) -> WalResult<()> {
+        // Get current file sequence to avoid deleting the active file
+        let current_file_seq = self.current_file_sequence.load(Ordering::SeqCst);
+
+        let files = self.list_wal_files().await?;
+
+        for (file_sequence, path) in files {
+            // Never delete the current active file
+            if u64::from(file_sequence) == current_file_seq {
+                debug!(file_sequence = file_sequence, "Skipping active WAL file");
+                continue;
+            }
+
+            // Scan the file to get its event sequence range
+            match Self::scan_file_metadata(&path).await {
+                Ok(metadata) => {
+                    // Only delete if ALL events in this file are published
+                    // (i.e., max_sequence <= up_to_sequence)
+                    if metadata.max_sequence > 0 && metadata.max_sequence <= up_to_sequence {
+                        self.delete_wal_file(file_sequence).await?;
+                        debug!(
+                            file_sequence = file_sequence,
+                            max_event_sequence = metadata.max_sequence,
+                            up_to_sequence = up_to_sequence,
+                            "Deleted published WAL file"
+                        );
+                    } else if metadata.max_sequence > up_to_sequence {
+                        debug!(
+                            file_sequence = file_sequence,
+                            max_event_sequence = metadata.max_sequence,
+                            up_to_sequence = up_to_sequence,
+                            "Keeping WAL file with unpublished events"
+                        );
+                    } else {
+                        // Empty file (max_sequence == 0), safe to delete
+                        self.delete_wal_file(file_sequence).await?;
+                        debug!(file_sequence = file_sequence, "Deleted empty WAL file");
+                    }
+                }
+                Err(e) => {
+                    warn!(
+                        file_sequence = file_sequence,
+                        error = %e,
+                        "Failed to scan WAL file metadata, skipping"
+                    );
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    /// Delete a specific WAL file.
+ async fn delete_wal_file(&self, sequence: u32) -> WalResult<()> { + let path = Self::wal_file_path(&self.wal_dir, sequence); + + if path.exists() { + fs::remove_file(&path).await.map_err(WalError::Io)?; + } + + Ok(()) + } +} + +#[cfg(test)] +#[allow( + clippy::expect_used, + clippy::unwrap_used, + clippy::panic, + clippy::indexing_slicing, + clippy::str_to_string, + clippy::arithmetic_side_effects, + clippy::redundant_closure_for_method_calls, + clippy::cast_lossless, + clippy::as_conversions, + clippy::let_underscore_must_use, + clippy::uninlined_format_args, + clippy::len_zero, + clippy::semicolon_outside_block +)] +mod tests { + use super::*; + use collector_core::event::ProcessEvent; + use std::time::SystemTime; + use tempfile::TempDir; + use tokio::io::AsyncWriteExt; + + /// Create a test process event with specified PID + fn create_test_event(pid: u32) -> ProcessEvent { + ProcessEvent { + pid, + ppid: None, + name: format!("test_process_{pid}"), + executable_path: None, + command_line: Vec::new(), + start_time: None, + cpu_usage: None, + memory_usage: None, + executable_hash: None, + user_id: None, + accessible: true, + file_exists: true, + timestamp: SystemTime::now(), + platform_metadata: None, + } + } + + #[tokio::test] + async fn test_wal_creation() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + assert_eq!(wal.current_event_sequence.load(Ordering::SeqCst), 1); + assert_eq!(wal.current_file_sequence.load(Ordering::SeqCst), 1); + } + + #[tokio::test] + async fn test_write_single_event() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + let event = create_test_event(1234); + let sequence = wal + .write(event.clone()) + .await + .expect("Failed to write event"); + + assert_eq!(sequence, 1); + + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 1); + assert_eq!(events[0].pid, 1234); + } + + #[tokio::test] + async fn test_write_multiple_events() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + for i in 1..=5 { + let event = create_test_event(1000 + i); + let seq = wal.write(event).await.expect("Failed to write event"); + assert_eq!(seq, i as u64); + } + + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 5); + + for (i, event) in events.iter().enumerate() { + assert_eq!(event.pid, 1000 + (i as u32) + 1); + } + } + + #[tokio::test] + async fn test_sequence_numbering() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + let mut sequences = Vec::new(); + for i in 0..10 { + let event = create_test_event(2000 + i); + let seq = wal.write(event).await.expect("Failed to write event"); + sequences.push(seq); + } + + // Verify monotonic increase + for i in 1..sequences.len() { + assert!(sequences[i] > sequences[i - 1]); + } + } + + #[tokio::test] + async fn test_replay_empty_wal() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + let events = wal.replay().await.expect("Failed 
to replay"); + assert_eq!(events.len(), 0); + } + + #[tokio::test] + async fn test_replay_single_file() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + let event1 = create_test_event(3001); + let event2 = create_test_event(3002); + + wal.write(event1).await.expect("Failed to write event"); + wal.write(event2).await.expect("Failed to write event"); + + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 2); + assert_eq!(events[0].pid, 3001); + assert_eq!(events[1].pid, 3002); + } + + #[tokio::test] + async fn test_mark_published_honors_sequence_cutoff() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + + // Use a very small rotation threshold to force multiple files + let wal = WriteAheadLog::with_rotation_threshold(temp_dir.path().to_path_buf(), 200) + .await + .expect("Failed to create WAL"); + + // Write events that will span multiple files due to small rotation threshold + let mut last_seq = 0; + for i in 1..=20 { + let event = create_test_event(4000 + i); + last_seq = wal.write(event).await.expect("Failed to write event"); + } + + // Should have multiple files now + let files_before = wal.list_wal_files().await.expect("Failed to list files"); + assert!( + files_before.len() > 1, + "Expected multiple files, got {}", + files_before.len() + ); + + // Mark only some events as published (e.g., up to sequence 5) + // This should NOT delete files containing events beyond sequence 5 + wal.mark_published(5) + .await + .expect("Failed to mark published"); + + // Replay should still return events beyond sequence 5 + let events = wal.replay().await.expect("Failed to replay"); + + // Should have events remaining (those with sequence > 5) + assert!( + !events.is_empty(), + "Expected some events to remain after partial publish" + ); + + // The total event count should be last_seq (all events we wrote) + // After mark_published(5), events with seq 1-5 may be deleted if their file + // only contains events <= 5 + assert!( + events.len() >= (last_seq.saturating_sub(5)) as usize, + "Expected at least {} events, got {}", + last_seq.saturating_sub(5), + events.len() + ); + } + + #[tokio::test] + async fn test_mark_published_preserves_unpublished_events() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + + // Use small rotation threshold + let wal = WriteAheadLog::with_rotation_threshold(temp_dir.path().to_path_buf(), 150) + .await + .expect("Failed to create WAL"); + + // Write events + for i in 1..=15 { + let event = create_test_event(9000 + i); + wal.write(event).await.expect("Failed to write event"); + } + + // Mark published with sequence 0 (nothing published) + wal.mark_published(0) + .await + .expect("Failed to mark published"); + + // All events should still be recoverable + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 15, "All events should be preserved"); + } + + #[tokio::test] + async fn test_mark_published_preserves_current() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + let event = create_test_event(5001); + let _seq = wal.write(event).await.expect("Failed to write event"); + + // Check initial state - should have one active file + let files_before = wal.list_wal_files().await.expect("Failed to list files"); + 
assert_eq!(files_before.len(), 1); + let current_file_seq = wal.current_file_sequence.load(Ordering::SeqCst); + assert_eq!(files_before[0].0, current_file_seq as u32); + + // Mark published - should not delete active file even if all events are "published" + wal.mark_published(999) + .await + .expect("Failed to mark published"); + + let files_after = wal.list_wal_files().await.expect("Failed to list files"); + // Current file should still exist + assert_eq!(files_after.len(), 1); + } + + #[tokio::test] + async fn test_wal_entry_checksum() { + let event = create_test_event(6001); + let entry = WalEntry::new(1, event); + + assert!(entry.verify()); + } + + #[tokio::test] + async fn test_concurrent_writes() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = Arc::new( + WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"), + ); + + let mut handles = Vec::new(); + + for i in 0..10 { + let wal_clone = Arc::clone(&wal); + let handle = tokio::spawn(async move { + let event = create_test_event(7000 + i); + wal_clone.write(event).await.expect("Failed to write") + }); + handles.push(handle); + } + + for handle in handles { + let _ = handle.await; + } + + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 10); + } + + #[tokio::test] + async fn test_replay_multiple_files() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal = WriteAheadLog::new(temp_dir.path().to_path_buf()) + .await + .expect("Failed to create WAL"); + + // Write enough data to trigger rotation if needed + for i in 0..20 { + let event = create_test_event(8000 + i); + wal.write(event).await.expect("Failed to write"); + } + + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 20); + } + + // ==================== Rotation Tests ==================== + + #[tokio::test] + async fn test_rotation_with_low_threshold() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + + // Use a very small rotation threshold (100 bytes) to trigger rotation quickly + let wal = WriteAheadLog::with_rotation_threshold(temp_dir.path().to_path_buf(), 100) + .await + .expect("Failed to create WAL"); + + // Write multiple events - each event is ~100+ bytes serialized + let mut sequences = Vec::new(); + for i in 1..=10 { + let event = create_test_event(10_000 + i); + let seq = wal.write(event).await.expect("Failed to write event"); + sequences.push(seq); + } + + // Verify we have multiple files due to rotation + let files = wal.list_wal_files().await.expect("Failed to list files"); + assert!( + files.len() > 1, + "Expected rotation to create multiple files, got {} file(s)", + files.len() + ); + + // Verify sequence numbers are monotonically increasing + for i in 1..sequences.len() { + assert!( + sequences[i] > sequences[i - 1], + "Sequences should be monotonically increasing: {} should be > {}", + sequences[i], + sequences[i - 1] + ); + } + + // Verify all events can be replayed correctly + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 10, "All events should be recoverable"); + + // Verify PIDs are correct + for (i, event) in events.iter().enumerate() { + assert_eq!(event.pid, 10_001 + i as u32, "Event {} has wrong PID", i); + } + } + + #[tokio::test] + async fn test_rotation_sequence_continuity_across_files() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + + // Very small threshold to force multiple rotations + 
let wal = WriteAheadLog::with_rotation_threshold(temp_dir.path().to_path_buf(), 50) + .await + .expect("Failed to create WAL"); + + let mut all_sequences = Vec::new(); + for i in 1..=15 { + let event = create_test_event(11_000 + i); + let seq = wal.write(event).await.expect("Failed to write event"); + all_sequences.push(seq); + } + + // Verify strict monotonic increase (no gaps, no duplicates) + for i in 1..all_sequences.len() { + assert_eq!( + all_sequences[i], + all_sequences[i - 1] + 1, + "Sequence numbers should be consecutive: expected {}, got {}", + all_sequences[i - 1] + 1, + all_sequences[i] + ); + } + + // Should have rotated multiple times + let files = wal.list_wal_files().await.expect("Failed to list files"); + assert!(files.len() > 2, "Expected multiple rotations"); + } + + #[tokio::test] + async fn test_concurrent_writes_during_rotation() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + + // Small threshold to increase chance of concurrent rotation + let wal = Arc::new( + WriteAheadLog::with_rotation_threshold(temp_dir.path().to_path_buf(), 150) + .await + .expect("Failed to create WAL"), + ); + + let mut handles = Vec::new(); + + // Spawn many concurrent writers + for i in 0..50 { + let wal_clone = Arc::clone(&wal); + let handle = tokio::spawn(async move { + let event = create_test_event(12_000 + i); + wal_clone.write(event).await + }); + handles.push(handle); + } + + // All writes should succeed (no errors from rotation race) + for handle in handles { + let result = handle.await.expect("Task panicked"); + assert!(result.is_ok(), "Write failed: {:?}", result.err()); + } + + // All events should be recoverable + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!( + events.len(), + 50, + "All concurrent writes should be preserved" + ); + } + + // ==================== Sequence Persistence Across Restarts ==================== + + #[tokio::test] + async fn test_sequence_continuity_across_restart() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // First session: write some events + let last_sequence = { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + let mut last_seq = 0; + for i in 1..=5 { + let event = create_test_event(13_000 + i); + last_seq = wal.write(event).await.expect("Failed to write event"); + } + last_seq + }; // WAL dropped here, simulating process exit + + // Second session: create new WAL instance (simulating restart) + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + // Write more events + let new_sequence = { + let event = create_test_event(13_100); + wal.write(event).await.expect("Failed to write event") + }; + + // New sequence should continue from where we left off + assert!( + new_sequence > last_sequence, + "New sequence ({}) should be > last sequence ({})", + new_sequence, + last_sequence + ); + + // All events should be recoverable + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!( + events.len(), + 6, + "All events from both sessions should be recoverable" + ); + } + + #[tokio::test] + async fn test_sequence_continuity_with_multiple_files_across_restart() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // First session: write events with rotation + let (last_sequence, file_count_before) = { + let wal = 
WriteAheadLog::with_rotation_threshold(wal_path.clone(), 100) + .await + .expect("Failed to create WAL"); + + let mut last_seq = 0; + for i in 1..=10 { + let event = create_test_event(14_000 + i); + last_seq = wal.write(event).await.expect("Failed to write event"); + } + + let files = wal.list_wal_files().await.expect("Failed to list files"); + (last_seq, files.len()) + }; + + assert!( + file_count_before > 1, + "Should have rotated during first session" + ); + + // Second session: continue writing + let wal = WriteAheadLog::with_rotation_threshold(wal_path, 100) + .await + .expect("Failed to reopen WAL"); + + let new_sequence = { + let event = create_test_event(14_100); + wal.write(event).await.expect("Failed to write event") + }; + + // Sequence should strictly continue + assert!( + new_sequence > last_sequence, + "Sequence should continue: {} should be > {}", + new_sequence, + last_sequence + ); + + // All events recoverable + let events = wal.replay().await.expect("Failed to replay"); + assert_eq!(events.len(), 11); + } + + // ==================== Corruption Recovery Tests ==================== + + #[tokio::test] + async fn test_replay_skips_corrupted_checksum() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Write some valid events + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + for i in 1..=3 { + let event = create_test_event(15_000 + i); + wal.write(event).await.expect("Failed to write event"); + } + } + + // Now corrupt the middle entry by modifying bytes in the WAL file + let wal_file_path = wal_path.join("procmond-00001.wal"); + let mut contents = tokio::fs::read(&wal_file_path) + .await + .expect("Failed to read WAL file"); + + // Corrupt some bytes in the middle of the file (after first entry) + // This will cause checksum mismatch for the corrupted entry + if contents.len() > 100 { + contents[80] ^= 0xFF; // Flip bits + contents[81] ^= 0xFF; + } + + tokio::fs::write(&wal_file_path, &contents) + .await + .expect("Failed to write corrupted file"); + + // Replay should skip the corrupted entry and continue + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + let events = wal.replay().await.expect("Replay should handle corruption"); + + // We wrote 3 events, one was corrupted, so we should have at least 1-2 valid events + // (depending on which entry was corrupted) + assert!( + events.len() >= 1, + "Should recover at least some events after corruption" + ); + assert!( + events.len() <= 3, + "Should not have more events than written" + ); + } + + #[tokio::test] + async fn test_replay_skips_deserialization_error() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Write a valid event first + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + let event = create_test_event(16_001); + wal.write(event).await.expect("Failed to write event"); + } + + // Append garbage data that looks like a valid length but contains invalid postcard + let wal_file_path = wal_path.join("procmond-00001.wal"); + let mut file = tokio::fs::OpenOptions::new() + .append(true) + .open(&wal_file_path) + .await + .expect("Failed to open WAL file"); + + // Write a length prefix followed by garbage (invalid postcard) + let garbage_len: u32 = 50; + file.write_all(&garbage_len.to_le_bytes()) + .await + .expect("Failed to write 
length"); + file.write_all(&[0xDE; 50]) + .await + .expect("Failed to write garbage"); + + // Write another valid event after the garbage + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to reopen WAL"); + + let event = create_test_event(16_002); + wal.write(event).await.expect("Failed to write event"); + } + + // Replay should skip the garbage entry + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + let events = wal + .replay() + .await + .expect("Replay should handle invalid entries"); + + // Should have at least the first valid event + assert!( + !events.is_empty(), + "Should recover valid events despite garbage entry" + ); + } + + #[tokio::test] + async fn test_replay_handles_truncated_entry() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Write some valid events + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + for i in 1..=3 { + let event = create_test_event(17_000 + i); + wal.write(event).await.expect("Failed to write event"); + } + } + + // Truncate the file to simulate a partial write (crash during write) + let wal_file_path = wal_path.join("procmond-00001.wal"); + let metadata = tokio::fs::metadata(&wal_file_path) + .await + .expect("Failed to get metadata"); + + // Truncate to remove part of the last entry + let truncated_size = metadata.len().saturating_sub(20); + let file = tokio::fs::OpenOptions::new() + .write(true) + .open(&wal_file_path) + .await + .expect("Failed to open WAL file"); + file.set_len(truncated_size) + .await + .expect("Failed to truncate"); + + // Replay should recover events before the truncation + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + let events = wal.replay().await.expect("Replay should handle truncation"); + + // Should have recovered at least the first 2 complete events + assert!( + events.len() >= 2, + "Should recover complete events before truncation, got {}", + events.len() + ); + } + + #[tokio::test] + async fn test_replay_handles_truncated_length_prefix() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Write a valid event + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + let event = create_test_event(18_001); + wal.write(event).await.expect("Failed to write event"); + } + + // Append a partial length prefix (only 2 bytes instead of 4) + let wal_file_path = wal_path.join("procmond-00001.wal"); + let mut file = tokio::fs::OpenOptions::new() + .append(true) + .open(&wal_file_path) + .await + .expect("Failed to open WAL file"); + + file.write_all(&[0x10, 0x00]) // Incomplete length prefix + .await + .expect("Failed to write partial length"); + + // Replay should recover the valid event and stop at truncated prefix + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + let events = wal + .replay() + .await + .expect("Replay should handle partial prefix"); + + assert_eq!(events.len(), 1, "Should recover the one valid event"); + assert_eq!(events[0].pid, 18_001); + } + + #[tokio::test] + async fn test_replay_continues_after_corrupted_file() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Create multiple WAL files with low rotation threshold + { + let wal = 
WriteAheadLog::with_rotation_threshold(wal_path.clone(), 100) + .await + .expect("Failed to create WAL"); + + for i in 1..=15 { + let event = create_test_event(19_000 + i); + wal.write(event).await.expect("Failed to write event"); + } + } + + // Find and completely corrupt one of the middle files + let files: Vec<_> = std::fs::read_dir(&wal_path) + .expect("Failed to read dir") + .filter_map(|e| e.ok()) + .filter(|e| e.path().extension().is_some_and(|ext| ext == "wal")) + .collect(); + + if files.len() > 2 { + // Corrupt a middle file completely + let middle_file = &files[1]; + std::fs::write(middle_file.path(), b"completely invalid data") + .expect("Failed to corrupt file"); + } + + // Replay should continue despite the corrupted file + let wal = WriteAheadLog::new(wal_path) + .await + .expect("Failed to reopen WAL"); + + let events = wal + .replay() + .await + .expect("Replay should handle corrupted file"); + + // Should recover events from non-corrupted files + assert!(!events.is_empty(), "Should recover events from valid files"); + } + + // ==================== File Metadata Scanning Tests ==================== + + #[tokio::test] + async fn test_scan_file_metadata() { + let temp_dir = TempDir::new().expect("Failed to create temp dir"); + let wal_path = temp_dir.path().to_path_buf(); + + // Write events with known sequences + { + let wal = WriteAheadLog::new(wal_path.clone()) + .await + .expect("Failed to create WAL"); + + for i in 1..=5 { + let event = create_test_event(20_000 + i); + wal.write(event).await.expect("Failed to write event"); + } + } + + // Scan the file metadata + let wal_file_path = wal_path.join("procmond-00001.wal"); + let metadata = WriteAheadLog::scan_file_metadata(&wal_file_path) + .await + .expect("Failed to scan metadata"); + + assert_eq!(metadata.min_sequence, 1, "Min sequence should be 1"); + assert_eq!(metadata.max_sequence, 5, "Max sequence should be 5"); + assert_eq!(metadata.entry_count, 5, "Should have 5 entries"); + } +} diff --git a/procmond/src/windows_collector.rs b/procmond/src/windows_collector.rs index d2b3dfa..b91b3ae 100644 --- a/procmond/src/windows_collector.rs +++ b/procmond/src/windows_collector.rs @@ -37,6 +37,7 @@ use sysinfo::{Pid, System}; /// Windows-specific errors that can occur during process collection. #[derive(Debug, Error)] +#[non_exhaustive] pub enum WindowsCollectionError { /// Windows API error #[error("Windows API error: {0}")] @@ -1282,7 +1283,8 @@ impl ProcessCollector for WindowsProcessCollector { } stats.total_processes = processed_count; - stats.collection_duration_ms = start_time.elapsed().as_millis() as u64; + stats.collection_duration_ms = + u64::try_from(start_time.elapsed().as_millis()).unwrap_or(u64::MAX); debug!( collector = self.name(), diff --git a/procmond/tests/cross_platform_integration_tests.rs b/procmond/tests/cross_platform_integration_tests.rs index a3ab33b..790a71c 100644 --- a/procmond/tests/cross_platform_integration_tests.rs +++ b/procmond/tests/cross_platform_integration_tests.rs @@ -1,9 +1,36 @@ -//! Cross-platform integration tests for all ProcessCollector implementations. +//! Cross-platform integration tests for all `ProcessCollector` implementations. //! -//! This test suite verifies that all ProcessCollector implementations work correctly +//! This test suite verifies that all `ProcessCollector` implementations work correctly //! across different platforms and configurations, including privilege escalation/dropping //! tests and compatibility tests for different OS versions. 
+#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::unseparated_literal_suffix, + clippy::unreadable_literal, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::print_stdout, + clippy::uninlined_format_args, + clippy::use_debug, + clippy::match_same_arms, + clippy::wildcard_enum_match_arm, + clippy::panic, + clippy::arithmetic_side_effects, + clippy::non_ascii_literal, + clippy::unused_async, + clippy::missing_const_for_fn, + clippy::map_unwrap_or, + clippy::needless_pass_by_value, + clippy::needless_collect, + clippy::clone_on_ref_ptr, + clippy::as_conversions, + clippy::redundant_clone, + clippy::str_to_string +)] + use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollectionError, ProcessCollector, SysinfoProcessCollector, diff --git a/procmond/tests/integration_tests.rs b/procmond/tests/integration_tests.rs index 428f4d7..359c64b 100644 --- a/procmond/tests/integration_tests.rs +++ b/procmond/tests/integration_tests.rs @@ -1,8 +1,35 @@ -//! Integration tests for ProcessEventSource with collector-core runtime. +//! Integration tests for `ProcessEventSource` with collector-core runtime. //! -//! These tests verify that the ProcessEventSource properly integrates with the +//! These tests verify that the `ProcessEventSource` properly integrates with the //! collector-core framework and behaves correctly in realistic scenarios. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::use_debug, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::as_conversions, + clippy::arithmetic_side_effects, + clippy::panic, + clippy::option_if_let_else, + clippy::wildcard_enum_match_arm, + clippy::missing_const_for_fn, + clippy::match_wild_err_arm, + clippy::single_match_else, + clippy::clone_on_ref_ptr, + clippy::let_underscore_must_use, + clippy::ignored_unit_patterns +)] + use collector_core::{CollectionEvent, Collector, CollectorConfig, EventSource, SourceCaps}; use daemoneye_lib::storage::DatabaseManager; use procmond::{ProcessEventSource, ProcessSourceConfig}; diff --git a/procmond/tests/lifecycle_integration_tests.rs b/procmond/tests/lifecycle_integration_tests.rs index 2015904..e628a99 100644 --- a/procmond/tests/lifecycle_integration_tests.rs +++ b/procmond/tests/lifecycle_integration_tests.rs @@ -1,7 +1,24 @@ -//! Integration tests for process lifecycle tracking with ProcessEventSource. +//! Integration tests for process lifecycle tracking with `ProcessEventSource`. //! -//! These tests verify that the ProcessLifecycleTracker properly integrates with -//! the ProcessEventSource and can detect lifecycle events in realistic scenarios. +//! These tests verify that the `ProcessLifecycleTracker` properly integrates with +//! the `ProcessEventSource` and can detect lifecycle events in realistic scenarios. 
+ +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::arithmetic_side_effects, + clippy::needless_pass_by_value, + clippy::redundant_closure_for_method_calls, + clippy::inefficient_to_string, + clippy::shadow_unrelated, + clippy::wildcard_enum_match_arm, + clippy::pattern_type_mismatch, + clippy::indexing_slicing, + clippy::panic, + clippy::needless_collect +)] use collector_core::ProcessEvent; use procmond::lifecycle::{ diff --git a/procmond/tests/linux_integration_tests.rs b/procmond/tests/linux_integration_tests.rs index b62a868..8fdbef5 100644 --- a/procmond/tests/linux_integration_tests.rs +++ b/procmond/tests/linux_integration_tests.rs @@ -1,4 +1,4 @@ -//! Linux-specific integration tests for the LinuxProcessCollector. +//! Linux-specific integration tests for the `LinuxProcessCollector`. //! //! These tests verify the Linux-specific functionality including /proc filesystem //! access, capability detection, namespace handling, and container detection. @@ -11,6 +11,38 @@ //! - Performance and concurrency validation #![cfg(target_os = "linux")] +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::str_to_string, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::use_debug, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::as_conversions, + clippy::arithmetic_side_effects, + clippy::panic, + clippy::option_if_let_else, + clippy::wildcard_enum_match_arm, + clippy::missing_const_for_fn, + clippy::match_wild_err_arm, + clippy::single_match_else, + clippy::clone_on_ref_ptr, + clippy::let_underscore_must_use, + clippy::ignored_unit_patterns, + clippy::unreadable_literal, + clippy::separated_literal_suffix, + clippy::panic_in_result_fn, + clippy::match_same_arms, + clippy::unseparated_literal_suffix, + clippy::pattern_type_mismatch +)] use procmond::linux_collector::{LinuxCollectorConfig, LinuxProcessCollector}; use procmond::process_collector::{ diff --git a/procmond/tests/macos_enhanced_integration_tests.rs b/procmond/tests/macos_enhanced_integration_tests.rs index 9607433..d948724 100644 --- a/procmond/tests/macos_enhanced_integration_tests.rs +++ b/procmond/tests/macos_enhanced_integration_tests.rs @@ -5,6 +5,26 @@ //! libc calls. Tests cover Security framework integration, entitlements detection, //! code signing, bundle information, and SIP awareness. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::unseparated_literal_suffix, + clippy::unreadable_literal, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::arithmetic_side_effects, + clippy::print_stdout, + clippy::uninlined_format_args, + clippy::use_debug, + clippy::match_same_arms, + clippy::wildcard_enum_match_arm, + clippy::panic, + clippy::non_ascii_literal, + clippy::unused_async, + clippy::as_conversions +)] + #[cfg(target_os = "macos")] mod macos_enhanced_tests { use procmond::macos_collector::{EnhancedMacOSCollector, MacOSCollectorConfig}; diff --git a/procmond/tests/macos_integration_tests.rs b/procmond/tests/macos_integration_tests.rs index 11c7136..841d03a 100644 --- a/procmond/tests/macos_integration_tests.rs +++ b/procmond/tests/macos_integration_tests.rs @@ -4,6 +4,29 @@ //! process collector functionality, including libproc and sysctl API usage, //! entitlements detection, SIP awareness, and sandboxed process handling. 
+#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::unseparated_literal_suffix, + clippy::unreadable_literal, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::print_stdout, + clippy::uninlined_format_args, + clippy::use_debug, + clippy::match_same_arms, + clippy::wildcard_enum_match_arm, + clippy::panic, + clippy::arithmetic_side_effects, + clippy::non_ascii_literal, + clippy::unused_async, + clippy::missing_const_for_fn, + clippy::map_unwrap_or, + clippy::needless_pass_by_value, + clippy::as_conversions +)] + #[cfg(target_os = "macos")] mod macos_tests { use procmond::macos_collector::{EnhancedMacOSCollector, MacOSCollectorConfig}; diff --git a/procmond/tests/os_compatibility_comprehensive_tests.rs b/procmond/tests/os_compatibility_comprehensive_tests.rs index d180dc3..488ac93 100644 --- a/procmond/tests/os_compatibility_comprehensive_tests.rs +++ b/procmond/tests/os_compatibility_comprehensive_tests.rs @@ -1,8 +1,33 @@ -//! Comprehensive OS compatibility tests for ProcessCollector implementations. +//! Comprehensive OS compatibility tests for `ProcessCollector` implementations. //! //! This module tests compatibility across different OS versions, configurations, //! and environments to ensure robust cross-platform behavior. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::str_to_string, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::use_debug, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::as_conversions, + clippy::arithmetic_side_effects, + clippy::if_not_else, + clippy::option_if_let_else, + clippy::panic, + clippy::wildcard_enum_match_arm, + clippy::missing_const_for_fn, + clippy::bool_comparison, + clippy::pattern_type_mismatch +)] + use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollector, SysinfoProcessCollector, }; diff --git a/procmond/tests/os_compatibility_tests.rs b/procmond/tests/os_compatibility_tests.rs index 64ee48d..c728c2e 100644 --- a/procmond/tests/os_compatibility_tests.rs +++ b/procmond/tests/os_compatibility_tests.rs @@ -1,8 +1,29 @@ //! OS version and configuration compatibility tests. //! -//! This module tests ProcessCollector implementations across different OS versions +//! This module tests `ProcessCollector` implementations across different OS versions //! and system configurations to ensure broad compatibility and graceful degradation. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::str_to_string, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::use_debug, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::as_conversions, + clippy::arithmetic_side_effects, + clippy::if_not_else, + clippy::option_if_let_else, + clippy::panic +)] + use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollector, SysinfoProcessCollector, }; diff --git a/procmond/tests/privilege_management_tests.rs b/procmond/tests/privilege_management_tests.rs index 104918f..ddd9ed3 100644 --- a/procmond/tests/privilege_management_tests.rs +++ b/procmond/tests/privilege_management_tests.rs @@ -1,8 +1,29 @@ -//! Privilege management tests for ProcessCollector implementations. +//! 
Privilege management tests for `ProcessCollector` implementations. //! //! This module tests privilege escalation and dropping behavior across all platforms, //! ensuring that collectors handle privilege boundaries correctly and securely. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::map_unwrap_or, + clippy::non_ascii_literal, + clippy::unused_async, + clippy::arithmetic_side_effects, + clippy::panic, + clippy::single_char_pattern, + clippy::as_conversions, + clippy::if_not_else, + clippy::use_debug, + clippy::needless_pass_by_value, + clippy::redundant_clone, + clippy::shadow_reuse, + clippy::shadow_unrelated +)] + use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollector, SysinfoProcessCollector, }; @@ -28,7 +49,7 @@ fn is_elevated_privileges() -> bool { { // Use whoami crate to check if running as root // This is completely safe and doesn't require unsafe code - whoami::username() == "root" + whoami::username().map(|u| u == "root").unwrap_or(false) } #[cfg(windows)] @@ -57,7 +78,7 @@ fn get_current_user_info() -> String { #[cfg(windows)] { // Use whoami crate for cross-platform username retrieval - let username = whoami::username(); + let username = whoami::username().unwrap_or_else(|_| String::from("unknown")); format!("User: {}", username) } } diff --git a/procmond/tests/process_enumeration_edge_cases.rs b/procmond/tests/process_enumeration_edge_cases.rs index 016dc96..10fab8c 100644 --- a/procmond/tests/process_enumeration_edge_cases.rs +++ b/procmond/tests/process_enumeration_edge_cases.rs @@ -1,7 +1,25 @@ //! Edge case tests for process enumeration. //! //! This module tests edge cases and boundary conditions in process enumeration -//! across different ProcessCollector implementations without using property-based testing. +//! across different `ProcessCollector` implementations without using property-based testing. + +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::unseparated_literal_suffix, + clippy::unreadable_literal, + clippy::uninlined_format_args, + clippy::print_stdout, + clippy::non_ascii_literal, + clippy::arithmetic_side_effects, + clippy::shadow_reuse, + clippy::shadow_unrelated, + clippy::wildcard_enum_match_arm, + clippy::use_debug, + clippy::needless_pass_by_value, + clippy::redundant_clone +)] use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollector, SysinfoProcessCollector, diff --git a/procmond/tests/property_based_process_tests.proptest-regressions b/procmond/tests/property_based_process_tests.proptest-regressions index 03bcd2b..ed29b77 100644 --- a/procmond/tests/property_based_process_tests.proptest-regressions +++ b/procmond/tests/property_based_process_tests.proptest-regressions @@ -5,3 +5,4 @@ # It is recommended to check this file in to source control so that # everyone who runs the test benefits from these saved cases. cc 792fa891f3b044d1bfb53ff0e919fd7df451f91526da0814727ec7d8fc5c754c # shrinks to max_processes = 50 +cc 62dde4ca87a425ea4e3bb5d96b72118e3cf2cccec4701863f8689ac0a38c9ae3 # shrinks to max_processes = 39, collect_enhanced_metadata = false diff --git a/procmond/tests/property_based_process_tests.rs b/procmond/tests/property_based_process_tests.rs index 005d78a..1bb09a8 100644 --- a/procmond/tests/property_based_process_tests.rs +++ b/procmond/tests/property_based_process_tests.rs @@ -1,9 +1,20 @@ //! 
Property-based tests for process enumeration edge cases. //! //! This module uses proptest to generate test cases that explore edge cases -//! and boundary conditions in process enumeration across different ProcessCollector +//! and boundary conditions in process enumeration across different `ProcessCollector` //! implementations, ensuring robust behavior under various scenarios. +#![allow( + clippy::doc_markdown, + clippy::expect_used, + clippy::unwrap_used, + clippy::unseparated_literal_suffix, + clippy::uninlined_format_args, + clippy::arithmetic_side_effects, + clippy::needless_pass_by_value, + clippy::redundant_clone +)] + use procmond::process_collector::{ FallbackProcessCollector, ProcessCollectionConfig, ProcessCollector, SysinfoProcessCollector, }; @@ -211,8 +222,9 @@ fn test_process_data_validity_properties() { ); for arg in &event.command_line { + // Some processes (language servers, Java, etc.) have very long args assert!( - arg.len() <= 4096, + arg.len() <= 8192, "Command line argument should be reasonable length for {}: {}", name, arg.len() diff --git a/spec/procmond/index.md b/spec/procmond/index.md new file mode 100644 index 0000000..d45950f --- /dev/null +++ b/spec/procmond/index.md @@ -0,0 +1,95 @@ +# Procmond Implementation Epic - Ticket Index + +- **Epic**: Complete Procmond Implementation +- **Related Issues**: #39, #89, #40, #103, #64 + +## Ticket Completion Order + +Execute tickets in order. Each ticket's dependencies must be complete before starting. + +### Phase 1: Event Bus Integration + +- [x] **Ticket 1**: [Implement Write-Ahead Log and Event Bus Connector](./tickets/Implement_Write-Ahead_Log_and_Event_Bus_Connector.md) + - ✅ WAL component (verified existing implementation meets all criteria) + - ✅ EventBusConnector with WAL integration + - ✅ Event buffering (10MB) and replay + - ✅ Dynamic backpressure (70% threshold) + +### Phase 2: RPC and Lifecycle Management + +- [ ] **Ticket 2**: [Implement Actor Pattern and Startup Coordination](./tickets/Implement_Actor_Pattern_and_Startup_Coordination.md) + + - Actor pattern in ProcmondMonitorCollector + - Replace LocalEventBus with EventBusConnector + - Startup coordination ("begin monitoring" wait) + - Dynamic interval adjustment from backpressure + - *Requires: Ticket 1* + +- [ ] **Ticket 3**: [Implement RPC Service and Registration Manager](<./tickets/Implement_RPC_Service_and_Registration_Manager_(procmond).md>) + + - RpcServiceHandler component + - RegistrationManager component + - Lifecycle operations (HealthCheck, UpdateConfig, GracefulShutdown) + - Heartbeat publishing (30s interval) + - *Requires: Ticket 2* + +- [ ] **Ticket 4**: [Implement Agent Loading State and Heartbeat Detection](./tickets/Implement_Agent_Loading_State_and_Heartbeat_Detection.md) + + - Collector configuration format (agent.yaml) + - Loading state machine (Loading → Ready → Steady State) + - Heartbeat failure detection with escalating actions + - **Note**: This is daemoneye-agent work, not procmond + - *Requires: Tickets 2, 3* + +### Phase 3: Testing + +- [ ] **Ticket 5**: [Implement Comprehensive Test Suite](./tickets/Implement_Comprehensive_Test_Suite.md) + - Unit tests (>80% coverage) + - Integration tests (event bus, RPC, cross-platform) + - Chaos tests (connection failures, backpressure) + - Security tests (privilege escalation, injection, DoS) + - *Requires: Tickets 1, 2, 3, 4* + +### Phase 4: Hardening + +- [ ] **Ticket 6**: [Implement Security Hardening and Data 
Sanitization](./tickets/Implement_Security_Hardening_and_Data_Sanitization.md) + - Privilege detection (Linux caps, Windows tokens, macOS entitlements) + - Command-line and environment variable sanitization + - Security boundary validation + - Security test suite + - *Requires: Ticket 5* + +### Phase 5: Platform and Performance Validation + +- [ ] **Ticket 7**: [Validate FreeBSD Platform Support](./tickets/Validate_FreeBSD_Platform_Support.md) + + - Test FallbackProcessCollector on FreeBSD 13+ + - Document limitations (basic metadata only) + - Platform detection and capability reporting + - *Requires: Ticket 5* + +- [ ] **Ticket 8**: [Validate Performance and Optimize](./tickets/Validate_Performance_and_Optimize.md) + + - Benchmark process enumeration (\<100ms for 1,000 processes) + - Load test with 10,000+ processes + - Memory profiling (\<100MB sustained) + - CPU monitoring (\<5% sustained) + - Regression testing + - *Requires: Tickets 6, 7* + +--- + +## Reference Documents + +- [Epic Brief](./specs/Epic_Brief__Complete_Procmond_Implementation.md) +- [Core Flows](./specs/Core_Flows__Procmond_Process_Monitoring.md) +- [Tech Plan](./specs/Tech_Plan__Complete_Procmond_Implementation.md) + +## Success Criteria + +- [ ] Process enumeration works on Linux, macOS, Windows (full) and FreeBSD (basic) +- [ ] Event bus communication with daemoneye-agent is reliable +- [ ] Service lifecycle (start/stop/health) works via RPC +- [ ] Privilege boundaries enforced and validated +- [ ] Performance targets met (see Ticket 8) +- [ ] >80% unit test coverage, >90% critical path coverage diff --git a/spec/procmond/specs/Core_Flows__Procmond_Process_Monitoring.md b/spec/procmond/specs/Core_Flows__Procmond_Process_Monitoring.md new file mode 100644 index 0000000..a0b37e2 --- /dev/null +++ b/spec/procmond/specs/Core_Flows__Procmond_Process_Monitoring.md @@ -0,0 +1,683 @@ +# Core Flows: Procmond Process Monitoring + +## Overview + +This document describes the core user flows for procmond, the process monitoring daemon in DaemonEye. These flows capture how operators interact with procmond through daemoneye-agent and how the system behaves during normal operation and failure scenarios. + +**Key Principles:** + +- Operators interact through daemoneye-agent/CLI, not directly with procmond +- procmond runs autonomously with minimal operator intervention +- Configuration is centrally managed and pushed from daemoneye-agent +- System validates connectivity before starting and adapts to runtime conditions + +--- + +## Flow 1: Initial Deployment and First-Run Setup + +**Description:** How operators set up procmond for the first time on a new system + +**Trigger:** Operator installs DaemonEye on a new system + +**Steps:** + +01. Operator installs DaemonEye package (deb, rpm, pkg, msi, or homebrew) +02. Installation creates default configuration files in system location +03. Operator reviews and adjusts configuration via `daemoneye-cli config show procmond` +04. Operator sets collection interval, metadata options, and resource limits +05. Operator validates configuration via `daemoneye-cli config validate` +06. Operator starts daemoneye-agent service (systemd, launchd, Windows Service) +07. daemoneye-agent starts embedded event bus broker +08. daemoneye-agent spawns procmond with validated configuration +09. procmond connects to event bus and registers capabilities +10. procmond performs initial process enumeration +11. Operator runs `daemoneye-cli health procmond` to verify setup +12. 
Operator sees "procmond: healthy" status confirming successful setup + +**First-Run Validation:** + +- Configuration syntax is valid +- procmond can connect to event bus +- Platform-specific collector initializes successfully +- Initial process enumeration completes +- Health check passes + +**Common First-Run Issues:** + +- **Insufficient privileges:** Operator sees permission errors; must run daemoneye-agent with appropriate privileges +- **Invalid configuration:** Operator sees validation errors; must correct configuration file +- **Event bus connection fails:** Operator sees connection timeout; must verify daemoneye-agent is running + +--- + +## Flow 2: System Startup and Initialization + +**Description:** How procmond starts up, connects to the event bus, and begins monitoring (subsequent starts after initial deployment) + +**Trigger:** daemoneye-agent starts procmond as part of system initialization + +**Steps:** + +01. daemoneye-agent starts its embedded event bus broker +02. daemoneye-agent spawns procmond process with configuration +03. procmond initializes logging and loads configuration from daemoneye-agent +04. procmond validates configuration parameters (intervals, limits, metadata options) +05. procmond attempts to connect to daemoneye-agent's event bus broker +06. **Decision Point:** If connection fails, procmond retries with exponential backoff (up to 3 attempts) +07. **Success Path:** procmond registers with broker, publishes registration message with capabilities +08. procmond initializes platform-specific collector (Linux/macOS/Windows/FreeBSD) +09. procmond performs initial health check and reports status to daemoneye-agent +10. procmond begins continuous monitoring loop +11. Operator sees "procmond: healthy" status in daemoneye-agent health report + +**Failure Paths:** + +- **Event bus unreachable:** procmond logs error, retries, then exits if all attempts fail; daemoneye-agent shows "procmond: disconnected" status +- **Invalid configuration:** procmond logs validation errors and exits; operator sees error in daemoneye-agent logs +- **Platform collector initialization fails:** procmond falls back to basic sysinfo collector; operator sees warning in health status + +```mermaid +sequenceDiagram + participant Operator + participant Agent as daemoneye-agent + participant Broker as Event Bus Broker + participant Procmond as procmond + + Operator->>Agent: Start DaemonEye system + Agent->>Broker: Initialize embedded broker + Broker-->>Agent: Broker ready + Agent->>Procmond: Start procmond with config + Procmond->>Procmond: Load configuration + Procmond->>Procmond: Validate parameters + Procmond->>Broker: Connect to event bus + alt Connection successful + Broker-->>Procmond: Connection established + Procmond->>Broker: Publish registration (capabilities) + Procmond->>Procmond: Initialize platform collector + Procmond->>Broker: Publish health status (healthy) + Procmond->>Procmond: Start monitoring loop + Agent->>Operator: Display "procmond: healthy" + else Connection failed + Broker-->>Procmond: Connection timeout + Procmond->>Procmond: Retry with backoff (3 attempts) + Procmond->>Procmond: Exit after retries exhausted + Agent->>Operator: Display "procmond: disconnected" + end +``` + +--- + +## Flow 3: Continuous Process Monitoring + +**Description:** The ongoing cycle of collecting process data and publishing events to the event bus + +**Trigger:** procmond's monitoring loop runs on configured interval (default: 30 seconds) + +**Steps:** + +01. 
procmond waits for next collection interval tick +02. procmond enumerates all running processes using platform-specific collector +03. procmond collects basic metadata (PID, name, executable path, command line, resource usage) +04. **Decision Point:** If enhanced metadata is enabled, collect platform-specific details (network connections, file descriptors, security contexts) +05. procmond compares current process list with previous snapshot (lifecycle tracking) +06. procmond identifies lifecycle events (process starts, stops, modifications) +07. procmond publishes process events to event bus topic `events.process.batch` +08. procmond publishes lifecycle events to topic `events.process.lifecycle` +09. **Decision Point:** If backpressure detected (event bus queue full), procmond slows down event publishing +10. procmond updates internal statistics (processes collected, events published, errors) +11. procmond stores audit trail in local database +12. Cycle repeats on next interval + +**Operator Visibility:** + +- No real-time visibility during normal operation +- Statistics available through daemoneye-agent health endpoint +- Errors logged and visible in daemoneye-agent status + +**Performance Expectations:** + +- Collection completes within interval (30 seconds default) +- Enumerate 1,000 processes in \<100ms +- Memory usage stays \<100MB +- CPU usage \<5% sustained + +```mermaid +sequenceDiagram + participant Procmond as procmond + participant Collector as Platform Collector + participant Tracker as Lifecycle Tracker + participant Broker as Event Bus + + loop Every collection interval + Procmond->>Collector: Enumerate processes + Collector-->>Procmond: Process list with metadata + Procmond->>Tracker: Compare with previous snapshot + Tracker-->>Procmond: Lifecycle events (starts/stops/changes) + Procmond->>Broker: Publish process batch (events.process.batch) + Procmond->>Broker: Publish lifecycle events (events.process.lifecycle) + alt Backpressure detected + Broker-->>Procmond: Queue full signal + Procmond->>Procmond: Slow down publishing rate + end + Procmond->>Procmond: Update statistics + Procmond->>Procmond: Store audit trail + end +``` + +--- + +## Flow 4: Suspicious Process Detection and Triggering + +**Description:** How procmond identifies suspicious processes and triggers deeper analysis by other collectors + +**Trigger:** Suspicious process detected during lifecycle tracking (PID reuse, unsigned binary, anomalous behavior) + +**Steps:** + +1. procmond detects suspicious process during lifecycle analysis +2. procmond evaluates configured detection rules (operator-defined via daemoneye-agent) +3. **Decision Point:** Rule matches determine if process is suspicious +4. procmond creates trigger request with priority (Low/Normal/High/Critical) +5. procmond publishes trigger request to event bus topic `control.collector.task.{collector_type}.{id}` +6. daemoneye-agent receives trigger request and routes to appropriate collector (e.g., binary hasher) +7. Target collector performs analysis and publishes results back to event bus +8. procmond continues monitoring without waiting for analysis completion +9. Operator can review triggered analyses through daemoneye-cli query interface + +**Operator Configuration:** + +- Operators define detection rules through daemoneye-agent configuration +- Rules specify conditions (unsigned binaries, network connections, privilege escalation) +- Rules specify which collectors to trigger (binary hasher, memory analyzer, etc.) 
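+
+As a minimal sketch of how a trigger request and its topic (steps 4 and 5 above) might be assembled: the `TriggerRequest` and `Priority` types, the `trigger_topic` helper, and the `binary_hasher`/`bh-01` destination are hypothetical illustrations, not procmond's actual API; only the topic hierarchy and priority levels come from this document.
+
+```rust
+// Hypothetical types for illustration; the real procmond message format may differ.
+#[allow(dead_code)] // not every priority level is used in this sketch
+#[derive(Debug, Clone, Copy)]
+enum Priority {
+    Low,
+    Normal,
+    High,
+    Critical,
+}
+
+#[derive(Debug)]
+struct TriggerRequest {
+    pid: u32,
+    reason: &'static str,
+    priority: Priority,
+}
+
+/// Builds a topic following the `control.collector.task.{collector_type}.{id}` hierarchy.
+fn trigger_topic(collector_type: &str, id: &str) -> String {
+    format!("control.collector.task.{collector_type}.{id}")
+}
+
+fn main() {
+    // A rule matched: an unsigned binary was seen, so ask the binary hasher to verify it.
+    let request = TriggerRequest {
+        pid: 4242,
+        reason: "unsigned binary",
+        priority: Priority::High,
+    };
+    let topic = trigger_topic("binary_hasher", "bh-01");
+    // A real implementation would serialize the request and publish it on the
+    // event bus; printing stands in for that here.
+    println!("publish to {topic}: {request:?}");
+}
+```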
+ +**Example Scenarios:** + +- **Unsigned binary detected:** Trigger binary hasher for integrity verification +- **PID reuse detected:** Trigger behavioral analysis for anomaly detection +- **Privilege escalation:** Trigger memory analyzer for credential dumping detection + +```mermaid +sequenceDiagram + participant Procmond as procmond + participant Tracker as Lifecycle Tracker + participant Rules as Detection Rules + participant Broker as Event Bus + participant Agent as daemoneye-agent + participant Analyzer as Analysis Collector + + Procmond->>Tracker: Detect process changes + Tracker-->>Procmond: Suspicious event (PID reuse) + Procmond->>Rules: Evaluate detection rules + Rules-->>Procmond: Rule matched: trigger binary hasher + Procmond->>Broker: Publish trigger request (High priority) + Broker->>Agent: Route trigger request + Agent->>Analyzer: Start binary hash analysis + Analyzer->>Broker: Publish analysis results + Note over Procmond: Continues monitoring independently +``` + +--- + +## Flow 5: Configuration Update + +**Description:** How operators update procmond's configuration through daemoneye-cli + +**Trigger:** Operator modifies procmond configuration (e.g., change collection interval, enable enhanced metadata) + +**Steps:** + +01. Operator updates configuration via daemoneye-cli: `daemoneye-cli config update procmond --interval=60 --enhanced-metadata=true` +02. daemoneye-agent validates new configuration parameters +03. daemoneye-agent publishes configuration update to event bus topic `control.collector.config` +04. procmond receives configuration update message +05. procmond validates new configuration (intervals, limits, feature flags) +06. **Decision Point:** If validation fails, procmond rejects update and reports error +07. **Success Path:** procmond applies new configuration without restarting +08. procmond adjusts monitoring behavior (new interval, metadata collection level) +09. procmond publishes configuration acknowledgment to event bus +10. daemoneye-agent confirms configuration applied successfully +11. Operator sees "Configuration updated successfully" message + +**Configuration Changes Supported:** + +- Collection interval adjustment (5-3600 seconds) +- Enhanced metadata toggle (on/off) +- Executable hashing toggle (on/off) +- Maximum processes per cycle limit +- Detection rule updates + +**No Restart Required:** + +- Configuration changes apply to next collection cycle +- No service interruption or process restart needed + +--- + +## Flow 6: Health Monitoring and Status Reporting + +**Description:** How operators monitor procmond's health and diagnose issues + +**Trigger:** Operator checks system health via daemoneye-cli or daemoneye-agent reports degraded status + +**Steps:** + +1. Operator runs health check command through daemoneye-cli + +2. daemoneye-cli queries daemoneye-agent for component health + +3. daemoneye-agent requests health status from procmond via event bus + +4. procmond performs self-health check: + + - Verify event bus connectivity + - Check collection cycle success rate + - Validate resource usage (memory, CPU) + - Check for consecutive failures + +5. procmond publishes health status to topic `control.health.status` + +6. daemoneye-agent aggregates health data and returns to CLI + +7. 
Operator sees health report with status indicators: + + - **Healthy:** All checks passing, normal operation + - **Degraded:** Some issues but still functional (e.g., enhanced metadata unavailable) + - **Unhealthy:** Critical issues requiring intervention (e.g., event bus disconnected) + +**Health Indicators:** + +- Event bus connectivity status +- Collection success rate (last 10 cycles) +- Current resource usage (memory, CPU) +- Backpressure events count +- Last successful collection timestamp +- Platform collector status + +**Operator Actions Based on Status:** + +- **Healthy:** No action needed +- **Degraded:** Review warnings, consider configuration adjustments +- **Unhealthy:** Investigate errors, check logs, potentially restart daemoneye-agent + +--- + +## Flow 7: Error Handling and Recovery + +**Description:** How procmond handles failures and recovers gracefully + +### 7.1: Event Bus Connection Failure + +**Trigger:** procmond loses connection to daemoneye-agent's event bus broker + +**Steps:** + +1. procmond detects connection failure during event publishing +2. procmond logs error with connection details +3. procmond enters reconnection mode with exponential backoff +4. procmond buffers events locally (up to configured limit) +5. **Decision Point:** After 3 failed reconnection attempts, procmond reports critical failure +6. daemoneye-agent detects procmond disconnection via missing heartbeats +7. daemoneye-agent attempts to restart procmond +8. **Recovery:** When connection restored, procmond publishes buffered events +9. Operator sees "procmond: reconnected" status update + +### 7.2: Permission/Privilege Failure + +**Trigger:** procmond cannot access process information due to insufficient privileges + +**Steps:** + +1. procmond attempts to collect process metadata +2. Platform collector returns permission denied error +3. procmond logs specific process and permission error +4. **Decision Point:** If error is for single process, skip and continue; if systemic, report degraded status +5. procmond publishes partial results with error metadata +6. procmond reports degraded health status to daemoneye-agent +7. Operator sees warning in health report: "Limited process visibility due to permissions" +8. Operator can review logs to identify privilege requirements + +### 7.3: Performance Degradation and Backpressure + +**Trigger:** Collection takes too long or event bus cannot keep up with event volume + +**Steps:** + +1. procmond detects collection exceeding interval time or event bus backpressure +2. procmond activates circuit breaker after 5 consecutive backpressure events +3. procmond reduces event publishing rate (drops low-priority events) +4. procmond logs performance degradation with metrics +5. procmond publishes degraded health status +6. **Decision Point:** If degradation persists, procmond requests configuration adjustment from daemoneye-agent +7. daemoneye-agent may increase collection interval or disable enhanced metadata +8. Operator sees "procmond: degraded (performance)" in health status +9. Operator can review performance metrics and adjust configuration + +### 7.4: Platform-Specific Enumeration Failure + +**Trigger:** Platform-specific collector fails (e.g., procfs unavailable on Linux, WinAPI error on Windows) + +**Steps:** + +1. Platform-specific collector encounters error during enhanced metadata collection +2. procmond logs platform-specific error details +3. procmond falls back to basic sysinfo collector +4. 
procmond continues with reduced metadata (no network connections, file descriptors, etc.) +5. procmond publishes events with "degraded_metadata" flag +6. procmond reports degraded health status with reason +7. Operator sees "procmond: degraded (limited metadata)" in health report +8. Operator can investigate platform-specific issues (missing kernel modules, security policies) + +### 7.5: Resource Exhaustion + +**Trigger:** procmond approaches memory or CPU limits + +**Steps:** + +1. procmond monitors its own resource usage +2. procmond detects memory usage approaching limit (>90MB of 100MB budget) +3. procmond reduces buffer sizes and clears old snapshots +4. procmond disables enhanced metadata collection temporarily +5. procmond logs resource exhaustion warning +6. **Decision Point:** If resource usage continues to grow, procmond requests restart from daemoneye-agent +7. daemoneye-agent gracefully restarts procmond +8. Operator sees "procmond: restarted (resource limits)" in event log +9. Operator can review resource usage trends and adjust limits + +```mermaid +flowchart TD + Start[Collection Cycle Start] --> Enumerate[Enumerate Processes] + Enumerate --> CheckSuccess{Collection
Successful?} + + CheckSuccess -->|Yes| Lifecycle[Lifecycle Analysis] + CheckSuccess -->|No| CheckError{Error Type?} + + CheckError -->|Permission| PartialResults[Publish Partial Results] + CheckError -->|Platform| Fallback[Fall Back to Basic Collection] + CheckError -->|Timeout| Retry[Retry with Backoff] + + PartialResults --> ReportDegraded[Report Degraded Status] + Fallback --> ReportDegraded + Retry --> CheckRetries{Retries
Exhausted?} + + CheckRetries -->|No| Enumerate + CheckRetries -->|Yes| ReportUnhealthy[Report Unhealthy Status] + + Lifecycle --> Publish[Publish Events to Bus] + Publish --> CheckBackpressure{Backpressure
Detected?} + + CheckBackpressure -->|Yes| CircuitBreaker{Circuit Breaker
Threshold?} + CheckBackpressure -->|No| UpdateStats[Update Statistics] + + CircuitBreaker -->|Activated| DropEvents[Drop Low-Priority Events] + CircuitBreaker -->|Not Yet| SlowDown[Slow Publishing Rate] + + DropEvents --> UpdateStats + SlowDown --> UpdateStats + + UpdateStats --> CheckResources{Resource Usage
OK?}
+
+    CheckResources -->|Yes| WaitInterval[Wait for Next Interval]
+    CheckResources -->|No| ReduceLoad[Reduce Memory/CPU Load]
+
+    ReduceLoad --> WaitInterval
+    ReportDegraded --> WaitInterval
+    ReportUnhealthy --> RequestRestart[Request Restart from Agent]
+
+    WaitInterval --> Start
+```
+
+---
+
+## Flow 8: Graceful Shutdown
+
+**Description:** How procmond cleanly stops and releases resources
+
+**Trigger:** daemoneye-agent sends shutdown signal to procmond (system shutdown, maintenance, restart)
+
+**Steps:**
+
+01. daemoneye-agent publishes shutdown command to topic `control.collector.lifecycle`
+02. procmond receives shutdown signal
+03. procmond stops accepting new collection cycles
+04. procmond completes current collection cycle if in progress (with 30-second timeout)
+05. procmond publishes any buffered events to event bus
+06. procmond flushes audit trail to local database
+07. procmond publishes deregistration message to event bus
+08. procmond closes event bus connection
+09. procmond releases platform-specific resources (file handles, memory)
+10. procmond exits with success code
+11. daemoneye-agent confirms procmond stopped cleanly
+12. Operator sees "procmond: stopped" status
+
+**Timeout Handling:**
+
+- If shutdown takes >30 seconds, daemoneye-agent forcefully terminates procmond
+- Operator sees "procmond: force stopped" warning in logs
+
+---
+
+## Flow 9: Operator Troubleshooting
+
+**Description:** How operators diagnose and resolve procmond issues
+
+**Trigger:** Operator notices degraded or unhealthy procmond status in daemoneye-agent health report
+
+**Steps:**
+
+1. Operator runs diagnostic command via daemoneye-cli: `daemoneye-cli health procmond --detailed`
+
+2. daemoneye-cli queries daemoneye-agent for procmond diagnostics
+
+3. daemoneye-agent requests detailed health report from procmond via event bus
+
+4. procmond gathers diagnostic information:
+
+   - Recent error messages and stack traces
+   - Collection cycle statistics (success rate, latency)
+   - Resource usage trends (memory, CPU over time)
+   - Event bus connectivity status
+   - Platform collector status and capabilities
+
+5. procmond publishes diagnostic report to topic `control.health.diagnostics`
+
+6. daemoneye-agent formats and returns diagnostic data to CLI
+
+7. Operator reviews diagnostic output showing:
+
+   - **Status:** Current health state with reason
+   - **Statistics:** Collection cycles, events published, errors
+   - **Resources:** Memory usage, CPU usage, buffer sizes
+   - **Connectivity:** Event bus connection status, last successful publish
+   - **Recent Errors:** Last 10 errors with timestamps and context
+
+8. **Decision Point:** Based on diagnostics, operator takes action:
+
+   - **Permission errors:** Adjust procmond privileges or security policies
+   - **Performance issues:** Increase collection interval or disable enhanced metadata
+   - **Connectivity issues:** Check daemoneye-agent status and event bus health
+   - **Resource exhaustion:** Increase resource limits or reduce collection scope
+
+9. Operator applies configuration changes through daemoneye-agent
+10.
Operator monitors health status to confirm issue resolved + +**Common Troubleshooting Scenarios:** + +| Issue | Diagnostic Indicator | Operator Action | +| ----------------- | ----------------------------------- | ----------------------------------------------- | +| High error rate | Collection success rate \<80% | Review error logs, check permissions | +| Backpressure | Backpressure events >100/hour | Increase interval, reduce metadata | +| Memory growth | Memory usage trending upward | Reduce max processes, disable enhanced metadata | +| Missing events | Events published but not received | Check daemoneye-agent event bus health | +| Platform failures | Platform collector status: degraded | Check OS-specific requirements (procfs, WinAPI) | + +--- + +## Flow 10: Detection Rule Configuration + +**Description:** How operators configure what procmond considers "suspicious" and what analysis to trigger + +**Trigger:** Operator wants to add or modify detection rules for suspicious process behavior + +**Steps:** + +01. Operator defines detection rule via daemoneye-cli: `daemoneye-cli rules add --name="unsigned-binary" --condition="code_signed=false" --trigger="binary_hasher" --priority=high` +02. Rule specifies conditions (e.g., "unsigned binary", "network connection to suspicious IP", "privilege escalation") +03. Rule specifies which collector to trigger (binary hasher, memory analyzer, network analyzer) +04. Rule specifies priority level (Low/Normal/High/Critical) +05. Operator saves rule configuration +06. daemoneye-agent validates rule syntax and feasibility +07. daemoneye-agent publishes rule update to procmond via topic `control.collector.config` +08. procmond receives and validates rule update +09. procmond applies new rules to lifecycle tracker +10. procmond acknowledges rule update to daemoneye-agent +11. Operator sees "Detection rules updated successfully" confirmation +12. procmond begins applying new rules in next collection cycle + +**Rule Examples:** + +- "Trigger binary hasher for any unsigned executable" +- "Trigger network analyzer for processes with >10 network connections" +- "Trigger memory analyzer for processes with privilege escalation" +- "Trigger behavioral analyzer for PID reuse events" + +--- + +## Flow 11: Cross-Platform Behavior + +**Description:** How procmond adapts to different operating systems while maintaining consistent operator experience + +**Trigger:** procmond starts on different platforms (Linux, macOS, Windows, FreeBSD) + +**Platform-Specific Behaviors:** + +### Linux + +1. procmond detects Linux platform during initialization +2. procmond initializes Linux-specific collector with procfs access +3. procmond collects enhanced metadata: network connections, file descriptors, cgroups, namespaces +4. procmond respects SELinux/AppArmor restrictions +5. Operator sees full metadata in process events + +### macOS + +1. procmond detects macOS platform during initialization +2. procmond initializes macOS-specific collector with BSD sysctl +3. procmond collects enhanced metadata: code signing info, sandbox profiles +4. procmond respects System Integrity Protection (SIP) boundaries +5. Operator sees macOS-specific metadata (code signing, sandboxing) + +### Windows + +1. procmond detects Windows platform during initialization +2. procmond initializes Windows-specific collector with WinAPI +3. procmond collects enhanced metadata: session IDs, handle counts, integrity levels, WOW64 status +4. procmond respects UAC and Windows security boundaries +5. 
Operator sees Windows-specific metadata (sessions, integrity levels)
+
+### FreeBSD
+
+1. procmond detects FreeBSD platform during initialization
+2. procmond initializes basic sysinfo collector (enhanced collector not available)
+3. procmond collects basic metadata only (PID, name, paths, resource usage)
+4. procmond reports degraded status: "Enhanced metadata not available on FreeBSD"
+5. Operator sees warning about limited metadata but monitoring continues
+
+**Operator Experience:**
+
+- Consistent health status and error reporting across all platforms
+- Platform-specific metadata differences documented in health report
+- Same configuration interface regardless of platform
+- Graceful degradation on secondary platforms (FreeBSD)
+
+---
+
+## Flow 12: Performance Optimization Cycle
+
+**Description:** How operators tune procmond performance based on system load and requirements
+
+**Trigger:** Operator notices performance issues or wants to optimize resource usage
+
+**Steps:**
+
+1. Operator reviews performance metrics via daemoneye-cli: `daemoneye-cli metrics procmond`
+2. daemoneye-cli displays performance dashboard:
+
+   - Collection latency (average, p95, p99)
+   - Memory usage trend
+   - CPU usage trend
+   - Event publishing rate
+   - Backpressure frequency
+
+3. Operator identifies performance bottleneck:
+
+   - **High latency:** Collection taking too long
+   - **High memory:** Too many processes or snapshots
+   - **Backpressure:** Event bus can't keep up
+
+4. Operator adjusts configuration through daemoneye-agent:
+
+   - Increase collection interval (reduce frequency)
+   - Disable enhanced metadata (reduce per-process overhead)
+   - Reduce max processes per cycle (limit scope)
+   - Disable executable hashing (reduce CPU usage)
+
+5. daemoneye-agent pushes configuration update to procmond
+6. procmond applies changes and adjusts behavior
+7. Operator monitors metrics to confirm improvement
+8. **Decision Point:** If performance improves, keep changes; if not, try different adjustments
+9.
Operator iterates until performance meets targets + +**Performance Targets:** + +- Collection latency \<100ms for 1,000 processes +- Memory usage \<100MB sustained +- CPU usage \<5% sustained +- Zero backpressure events under normal load + +--- + +## Summary of Key Flows + +| Flow | Operator Involvement | Frequency | Criticality | +| --------------------- | ------------------------------------ | ------------------------ | ----------- | +| Initial Deployment | Direct (via daemoneye-cli) | Once per system | Critical | +| System Startup | Indirect (via daemoneye-agent) | Once per deployment | Critical | +| Continuous Monitoring | None (autonomous) | Every 30s (configurable) | Critical | +| Suspicious Detection | None (automatic) | As events occur | High | +| Configuration Update | Direct (via daemoneye-agent) | Occasional | Medium | +| Health Monitoring | Direct (via daemoneye-cli) | On-demand | High | +| Error Recovery | Automatic (with operator escalation) | As failures occur | Critical | +| Graceful Shutdown | Indirect (via daemoneye-agent) | Rare | Medium | +| Troubleshooting | Direct (via daemoneye-cli) | When issues arise | High | +| Rule Configuration | Direct (via daemoneye-agent) | Occasional | Medium | +| Cross-Platform | None (automatic adaptation) | Platform-dependent | High | +| Performance Tuning | Direct (via daemoneye-agent) | Periodic | Medium | + +--- + +## Operator Touchpoints + +**Primary Interface:** daemoneye-cli (for configuration, health checks, diagnostics, and metrics) + +**Secondary Interface:** daemoneye-agent (for lifecycle management - start/stop/restart) + +**No Direct Interface:** Operators do not interact with procmond directly; all interactions are mediated through daemoneye-agent + +**Visibility Channels:** + +- Health status reports (healthy/degraded/unhealthy) +- Error logs (structured logging via daemoneye-agent) +- Performance metrics (via daemoneye-cli) +- Diagnostic reports (detailed troubleshooting data) + +--- + +## References + +- Epic Brief: spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- Event Bus Architecture: file:docs/embedded-broker-architecture.md +- Topic Hierarchy: file:daemoneye-eventbus/docs/topic-hierarchy.md +- Process Collector Implementation: file:procmond/src/process_collector.rs +- Lifecycle Tracking: file:procmond/src/lifecycle.rs +- Monitor Collector: file:procmond/src/monitor_collector.rs diff --git a/spec/procmond/specs/Epic_Brief__Complete_Procmond_Implementation.md b/spec/procmond/specs/Epic_Brief__Complete_Procmond_Implementation.md new file mode 100644 index 0000000..6731dad --- /dev/null +++ b/spec/procmond/specs/Epic_Brief__Complete_Procmond_Implementation.md @@ -0,0 +1,281 @@ +# Epic Brief: Complete Procmond Implementation + +## Summary + +DaemonEye requires a production-ready process monitoring daemon (procmond) that serves as the foundation for security monitoring across Linux, macOS, Windows, and FreeBSD platforms. While core process enumeration functionality exists for primary platforms, the implementation needs architectural refinement, FreeBSD support, security hardening, performance validation, and integration with the daemoneye-agent service orchestrator. This Epic covers completing the procmond implementation as a Monitor Collector that continuously observes system processes, detects suspicious activity, and triggers analysis collectors through an event-driven architecture. 
The work includes finishing platform-specific features, implementing IPC communication with daemoneye-agent, hardening security boundaries, validating performance against targets, and achieving comprehensive test coverage to deliver a reliable, secure, and performant process monitoring foundation for the DaemonEye security platform. + +## Context & Problem + +### Who's Affected + +**Primary Users:** + +- **Security Operations Teams**: Need reliable, real-time process monitoring to detect threats and investigate incidents across heterogeneous infrastructure +- **System Administrators**: Require low-overhead monitoring that works consistently across Linux, macOS, Windows, and FreeBSD environments +- **Compliance Officers**: Need tamper-evident audit trails and comprehensive process metadata for regulatory requirements +- **DevOps Engineers**: Need monitoring that operates in air-gapped, containerized, and high-security environments + +**Secondary Users:** + +- **Incident Response Teams**: Depend on accurate process genealogy and metadata for forensic analysis +- **Threat Hunters**: Need rich process context (network connections, file descriptors, security contexts) for proactive threat hunting + +### Current Pain Points + +**Incomplete Platform Coverage** + +- FreeBSD support is missing or incomplete, limiting deployment options for security-conscious organizations that standardize on BSD systems +- Platform-specific metadata collection varies in depth, creating inconsistent detection capabilities across environments +- Secondary platform support (Solaris, AIX) is undefined, blocking enterprise adoption in heterogeneous data centers + +**Architectural Gaps** + +- Process collectors exist but lack full integration with the daemoneye-agent service orchestrator, preventing complete lifecycle management and health monitoring +- Event bus communication between procmond and daemoneye-agent needs refinement for the event-driven architecture +- Monitor Collector behavior (continuous operation, event generation, triggering other collectors) requires completion +- Privilege separation between privileged process enumeration and unprivileged detection logic needs validation + +**Security Concerns** + +- Privilege management is incomplete - unclear when elevated privileges are required and how to drop them safely +- Data sanitization (command-line arguments, environment variables) is not consistently applied, risking exposure of secrets in logs +- Security boundaries between procmond (privileged) and daemoneye-agent (unprivileged) are not enforced +- No validation that the implementation meets least-privilege principles + +**Performance Uncertainty** + +- Performance benchmarks haven't been validated against targets (e.g., enumerate 1,000 processes in \<100ms) +- No load testing with 10,000+ processes to validate scalability claims +- Memory usage and CPU overhead under sustained operation are unverified +- No regression testing to prevent performance degradation + +**Testing Gaps** + +- Test coverage is below target thresholds (\<80% unit, \<90% critical paths) +- Cross-platform integration tests are incomplete or missing +- Security testing (privilege escalation, injection attacks, DoS) is insufficient +- No chaos testing for resilience validation + +### Where in the Product + +This Epic affects the **core monitoring foundation** of DaemonEye: + +**Component Hierarchy:** + +``` +DaemonEye Platform +├── daemoneye-agent (service orchestrator) +│ └── procmond (process monitor) ← THIS EPIC +│ ├── Process 
Enumeration Engine +│ ├── Platform-Specific Collectors (Linux, macOS, Windows, FreeBSD) +│ ├── Event Detection & Triggering +│ └── Event Bus Integration Layer +├── daemoneye-cli (management interface) +└── Detection Engine (consumes procmond data) +``` + +**Integration Points:** + +- **Upstream**: Operating system APIs (procfs, WinAPI, BSD sysctl) +- **Downstream**: daemoneye-agent (lifecycle management, event bus), detection engine (SQL queries), alert system +- **Lateral**: Other collectors (binary hasher, memory analyzer) triggered by procmond events via event bus + +### Root Cause Analysis + +The current state reflects **incremental development without complete architectural integration**: + +1. **Phase 1 (Complete)**: Core process enumeration implemented using sysinfo crate with platform-specific enhancements +2. **Phase 2 (Incomplete)**: Integration with daemoneye-agent architecture and IPC communication +3. **Phase 3 (Not Started)**: Security hardening, performance validation, comprehensive testing +4. **Phase 4 (Not Started)**: Production readiness (observability, documentation, deployment) + +The gap exists because: + +- Initial focus was on proving cross-platform enumeration feasibility +- Architectural decisions for service orchestration (daemoneye-agent) and event bus integration evolved during development +- Security and performance requirements were deferred to avoid premature optimization +- Testing infrastructure development lagged behind feature implementation + +### Business Impact + +**Without completing this Epic:** + +- ❌ DaemonEye cannot be deployed in production environments (no service lifecycle management) +- ❌ FreeBSD users cannot adopt DaemonEye (platform gap) +- ❌ Security-conscious organizations cannot trust the implementation (unvalidated security boundaries) +- ❌ Performance claims are unverified (risk of production issues) +- ❌ Detection capabilities are inconsistent across platforms (metadata gaps) + +**With this Epic complete:** + +- ✅ Production-ready process monitoring across all target platforms +- ✅ Reliable service lifecycle management through daemoneye-agent +- ✅ Validated security boundaries and privilege separation +- ✅ Proven performance characteristics under load +- ✅ Comprehensive test coverage for confidence in reliability +- ✅ Foundation for event-driven security monitoring architecture + +## Scope + +### In Scope + +**Platform Completion** + +- Complete FreeBSD support with basic process enumeration (best-effort, documented limitations) +- Fill metadata gaps across primary platforms (Linux, macOS, Windows) + +**CLI Features (Basic)** + +- Health check commands for procmond status +- Diagnostic commands for troubleshooting +- Performance metrics display +- Configuration update commands +- Validate cross-platform consistency in data models and behavior + +**Architectural Integration** + +- Complete event bus integration between procmond and daemoneye-agent: + - Implement missing RPC patterns for lifecycle management (start/stop/restart) + - Add comprehensive error handling and reconnection logic for event bus failures + - Implement missing topic subscriptions and publishing patterns + - Performance optimization and load testing of event bus communication +- Integrate procmond with daemoneye-agent service lifecycle management +- Implement Monitor Collector behavior (continuous operation, event generation, triggering) +- Define and enforce privilege boundaries between components + +**Security Hardening** + +- Implement privilege detection and 
management (capabilities, tokens) +- Add data sanitization for command-line arguments and environment variables +- Validate security boundaries and least-privilege principles +- Create security test suite (privilege escalation, injection, DoS) + +**Performance Validation** + +- Benchmark process enumeration against targets (1,000 processes in \<100ms) +- Load test with 10,000+ processes +- Validate memory usage (\<100MB) and CPU overhead (\<5%) +- Implement performance regression testing + +**Testing & Quality** + +- Achieve >80% unit test coverage, >90% critical path coverage +- Create cross-platform integration test suite +- Implement chaos testing for resilience validation +- Add property-based testing for edge cases + +**Documentation** + +- Architecture documentation (component interactions, privilege model) +- Deployment guides (installation, configuration, troubleshooting) +- API documentation (ProcessCollector trait, data models) +- Security documentation (threat model, security controls) + +### Out of Scope + +**Deferred to Future Epics** + +- Advanced behavioral analysis and machine learning-based anomaly detection +- Real-time process monitoring with sub-second event detection +- Kernel-level monitoring (eBPF on Linux, ETW on Windows) +- Integration with external threat intelligence feeds +- Commercial features (Security Center, federated architecture) +- Support for secondary platforms beyond FreeBSD (Solaris, AIX) + +**Explicitly Not Included** + +- Detection rule authoring and management (separate Epic) +- Alert delivery and notification systems (separate Epic) +- Advanced CLI features (query interface, rule management, advanced diagnostics) (separate Epic) +- Database schema and storage layer (separate Epic) +- Enhanced FreeBSD metadata collection (deferred to future work) + +### Success Criteria + +**Functional Completeness** + +- ✅ Process enumeration works on Linux, macOS, Windows (full support) and FreeBSD (basic support) +- ✅ Platform-specific metadata collection is consistent on primary platforms (Linux, macOS, Windows) +- ✅ Event bus communication with daemoneye-agent is reliable and performant +- ✅ Service lifecycle (start, stop, restart, health checks) works correctly via RPC patterns +- ✅ Monitor Collector behavior (event generation, triggering) is functional +- ✅ Basic CLI commands for health checks and diagnostics are implemented + +**Security Validation** + +- ✅ Privilege boundaries are enforced and validated +- ✅ Data sanitization prevents secret exposure +- ✅ Security test suite passes with no critical vulnerabilities +- ✅ Least-privilege principles are documented and verified + +**Performance Targets** + +- ✅ Enumerate 1,000 processes in \<100ms (average, primary platforms) +- ✅ Support 10,000+ processes without degradation +- ✅ Memory usage \<100MB during normal operation +- ✅ CPU overhead \<5% during continuous monitoring + +**Quality Metrics** + +- ✅ >80% unit test coverage across all modules +- ✅ >90% critical path coverage: + - Process enumeration on all platforms + - Event bus communication (publish/subscribe/reconnection) + - Core monitoring loop and lifecycle detection + - All error handling and recovery paths + - Security boundaries (privilege management, data sanitization) +- ✅ Cross-platform integration tests pass on all target platforms +- ✅ Zero regressions in performance benchmarks + +**Operational Readiness** + +- ✅ Architecture documentation complete and reviewed +- ✅ Deployment guides tested on all platforms +- ✅ API documentation generated 
and published
+- ✅ Security documentation reviewed by security team
+
+## Key Assumptions
+
+1. **Platform Support**: FreeBSD 13+ is the only secondary platform in scope; other BSDs and Unix variants are deferred
+2. **Performance Targets**: Current targets (100ms for 1,000 processes) are based on typical deployment scenarios; extreme edge cases (100,000+ processes) are out of scope
+3. **Security Model**: Privilege separation between procmond (elevated) and daemoneye-agent (unprivileged) is the correct architectural approach
+4. **IPC Technology**: The daemoneye-eventbus is the sole IPC and RPC mechanism, as the previous IPC and RPC technologies are not used within procmond
+5. **Testing Infrastructure**: Existing CI/CD pipeline (GitHub Actions) is sufficient for cross-platform testing
+6. **Timeline Flexibility**: Milestone dates are flexible and will be updated based on actual progress
+
+## Constraints
+
+**Technical Constraints**
+
+- Must maintain backward compatibility with existing ProcessRecord data model
+- Must use Rust 2024 edition with MSRV 1.91+
+- Must follow workspace-level lints (unsafe_code = "forbid", warnings = "deny")
+- Must integrate with existing collector-core framework and daemoneye-eventbus
+
+**Resource Constraints**
+
+- Single developer (unclesp1d3r) as primary contributor
+- Limited access to FreeBSD testing infrastructure
+- No dedicated security audit team (self-review required)
+
+**Operational Constraints**
+
+- Must support air-gapped deployments (no external dependencies at runtime)
+- Must operate in containerized environments (Docker, Kubernetes)
+- Must respect platform security boundaries (SELinux, AppArmor, SIP)
+
+## Related Work
+
+- **GitHub Issue #39**: Cross-platform process enumeration (foundation)
+- **GitHub Issue #89**: Complete procmond implementation (parent issue)
+- **GitHub Issue #40**: Binary hashing collector (triggered by procmond)
+- **GitHub Issue #103**: daemoneye-agent service architecture (integration point)
+- **GitHub Issue #64**: Core Tier Functionality Epic (broader context)
+
+## References
+
+- Architecture: file:.kiro/steering/structure.md
+- Technical Stack: file:.kiro/steering/tech.md
+- Development Guide: file:AGENTS.md
+- Existing Implementation: file:procmond/src/
+- Data Models: file:daemoneye-lib/src/models/process.rs
diff --git a/spec/procmond/specs/Tech_Plan__Complete_Procmond_Implementation.md b/spec/procmond/specs/Tech_Plan__Complete_Procmond_Implementation.md
new file mode 100644
index 0000000..88cd7d4
--- /dev/null
+++ b/spec/procmond/specs/Tech_Plan__Complete_Procmond_Implementation.md
@@ -0,0 +1,996 @@
+# Tech Plan: Complete Procmond Implementation
+
+## Architectural Approach
+
+### 1. Core Architectural Decisions
+
+**Child Process Model**
+
+- procmond runs as a child process spawned by daemoneye-agent
+- daemoneye-agent's CollectorProcessManager handles lifecycle (start/stop/restart)
+- Configuration and broker socket path passed via environment variables
+- Single service deployment model (operators manage daemoneye-agent only)
+
+**Startup Coordination (Agent Loading State)**
+
+- Broker starts before agent spawns collectors (eliminates race condition)
+- Agent spawns all configured collectors with broker socket path via environment variable
+- Collectors connect to broker and register via RPC
+- Collectors report "ready" status after successful registration
+- Agent waits for all collectors to report "ready" before dropping privileges
+- Agent remains in "loading state" until all configured collectors are ready
+- Agent reads collector configuration from config file (defines which collectors to spawn)
+- Agent transitions to "steady state" and broadcasts "begin monitoring" to `control.collector.lifecycle`
+- All collectors subscribe to `control.collector.lifecycle` and start collection loops on receiving command
+- This ensures: (1) no race conditions, (2) agent drops privileges only when safe, (3) coordinated startup (see the sketch at the end of this section)
+
+**Event-Driven Architecture**
+
+- Replace LocalEventBus with DaemoneyeEventBus for broker communication
+- Use embedded broker pattern: daemoneye-agent runs DaemoneyeBroker, procmond connects as client
+- Topic-based pub/sub for events: `events.process.*` hierarchy
+- RPC patterns for lifecycle management: `control.collector.procmond`
+
+**Privilege Separation Model**
+
+- **daemoneye-agent**: Starts privileged, drops privileges after spawning collectors
+- **procmond**: Maintains full privileges throughout runtime (restricted attack surface, no network)
+- Rationale: procmond needs persistent elevated access for process enumeration; agent has network connectivity (larger attack surface) so drops privileges after initialization
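+
+The loading-state handshake above can be sketched as follows; the `Agent` type, its method names, and the `broadcast` helper are hypothetical illustrations under the assumptions of this plan, not the actual daemoneye-agent implementation:
+
+```rust
+use std::collections::HashSet;
+
+#[derive(PartialEq)]
+enum AgentState {
+    Loading,
+    SteadyState,
+}
+
+// Hypothetical type for illustration; not the real daemoneye-agent API.
+struct Agent {
+    state: AgentState,
+    expected: HashSet<String>, // collectors named in the config file
+    ready: HashSet<String>,    // collectors that registered and reported "ready"
+}
+
+impl Agent {
+    fn new(expected: impl IntoIterator<Item = String>) -> Self {
+        Self {
+            state: AgentState::Loading,
+            expected: expected.into_iter().collect(),
+            ready: HashSet::new(),
+        }
+    }
+
+    /// Called when a collector registers over RPC and reports "ready".
+    fn on_collector_ready(&mut self, name: &str) {
+        self.ready.insert(name.to_owned());
+        if self.state == AgentState::Loading && self.ready.is_superset(&self.expected) {
+            // All configured collectors are up: it is now safe to drop
+            // privileges, enter steady state, and start collection everywhere.
+            self.drop_privileges();
+            self.state = AgentState::SteadyState;
+            self.broadcast("control.collector.lifecycle", "begin monitoring");
+        }
+    }
+
+    fn drop_privileges(&self) {
+        println!("agent: dropping privileges");
+    }
+
+    fn broadcast(&self, topic: &str, message: &str) {
+        println!("agent: publishing {message:?} to {topic}");
+    }
+}
+
+fn main() {
+    let mut agent = Agent::new(["procmond".to_owned()]);
+    agent.on_collector_ready("procmond"); // triggers the steady-state transition
+}
+```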
Integration Strategy + +**Phase 1: Event Bus Integration (Foundation)** + +- Direct refactoring: Replace LocalEventBus with DaemoneyeEventBus in ProcmondMonitorCollector +- Implement connection management with retry logic (3 attempts at startup, then exit) +- Add event buffering (10MB limit) with replay on reconnection +- Validate connectivity before starting collection (strict validation) + +**Phase 2: RPC Service Implementation (Lifecycle Management)** + +- Create RPC service handler in procmond to receive lifecycle commands +- Implement operations: Start, Stop, Restart, HealthCheck, UpdateConfig, GracefulShutdown +- Add registration/deregistration with daemoneye-agent on startup/shutdown +- Implement heartbeat publishing to `control.health.heartbeat.procmond` + +**Phase 3: Testing (TDD Approach)** + +- Unit tests for event bus integration (>80% coverage target) +- Integration tests for RPC communication +- Cross-platform tests (Linux, macOS, Windows) +- Chaos testing for resilience (connection failures, backpressure) + +**Phase 4: Security Hardening** + +- Implement privilege detection at startup (capabilities on Linux, tokens on Windows) +- Add data sanitization for command-line arguments and environment variables +- Validate security boundaries between procmond and agent +- Security test suite (privilege escalation, injection, DoS) + +**Phase 5: FreeBSD Support** + +- Validate FallbackProcessCollector on FreeBSD 13+ +- Document limitations (basic metadata only, no enhanced features) +- Add platform detection and capability reporting +- Best-effort support (doesn't block Epic completion) + +**Phase 6: Performance Validation** + +- Benchmark process enumeration (target: 1,000 processes in \<100ms) +- Load testing with 10,000+ processes +- Memory profiling (target: \<100MB sustained) +- CPU monitoring (target: \<5% sustained) +- Regression testing to prevent performance degradation + +### 3. Key Trade-offs and Rationale + +**Trade-off 1: Direct Refactoring vs. Parallel Implementation** + +- **Decision**: Direct refactoring (replace LocalEventBus in place) +- **Rationale**: Faster development velocity, simpler codebase, LocalEventBus is internal-only (no external dependencies) +- **Risk Mitigation**: Comprehensive testing before merging, feature branch development + +**Trade-off 2: Event Buffering with Write-Ahead Log** + +- **Decision**: Write-ahead log (WAL) with 10MB buffer and replay on reconnection +- **Rationale**: Prevents data loss during crashes or non-graceful termination, ensures event durability +- **Implementation**: Events persisted to disk before buffering, replayed on restart if procmond crashes +- **Risk Mitigation**: Bounded buffer size, WAL rotation to prevent disk exhaustion, backpressure when buffer full + +**Trade-off 3: Privilege Model** + +- **Decision**: procmond maintains full privileges, agent drops after spawning +- **Rationale**: procmond needs persistent elevated access; agent has larger attack surface (network connectivity) +- **Risk Mitigation**: procmond has no network access, minimal attack surface, runs as child process (isolated) + +**Trade-off 4: FreeBSD Support Level** + +- **Decision**: Best-effort basic enumeration, documented limitations +- **Rationale**: FreeBSD is secondary platform, full feature parity would delay primary platform completion +- **Risk Mitigation**: Clear documentation of limitations, graceful degradation + +### 4. 
Technical Constraints
+
+**Platform Constraints**
+
+- Must support Linux, macOS, Windows (primary), FreeBSD (secondary)
+- Must respect platform security boundaries (SELinux, AppArmor, SIP, UAC)
+- Must use platform-native APIs for process enumeration
+
+**Performance Constraints**
+
+- CPU usage \<5% sustained during continuous monitoring
+- Memory usage \<100MB during normal operation
+- Process enumeration \<100ms for 1,000 processes (average)
+- Event publishing must handle backpressure gracefully
+
+**Security Constraints**
+
+- No unsafe code (workspace-level `unsafe_code = "forbid"`)
+- All external inputs must be validated and sanitized
+- Privilege boundaries must be enforced and tested
+- Audit trail for all security-relevant operations
+
+**Compatibility Constraints**
+
+- Must maintain backward compatibility with ProcessRecord data model
+- Must integrate with existing collector-core framework
+- Must use Rust 2024 edition with MSRV 1.91+
+- Must follow workspace-level lints (`warnings = "deny"`)
+
+### 5. Deployment Architecture
+
+```mermaid
+sequenceDiagram
+    participant Operator
+    participant Agent as daemoneye-agent
+    participant Broker as DaemoneyeBroker<br/>(embedded)
+    participant Procmond as procmond<br/>(child process)
+    participant OS as Operating System
+
+    Note over Operator,OS: System Startup
+
+    Operator->>Agent: Start daemoneye-agent (privileged)
+    Agent->>Broker: Initialize embedded broker
+    Broker-->>Agent: Broker ready (socket path)
+
+    Agent->>Procmond: Spawn procmond (privileged)<br/>ENV: DAEMONEYE_BROKER_SOCKET
+    Procmond->>Broker: Connect to broker
+    Broker-->>Procmond: Connection established
+
+    Procmond->>Broker: Register (RPC)<br/>Topic: control.collector.procmond
+    Broker->>Agent: Route registration
+    Agent->>Agent: Wait for all collectors ready
+    Agent-->>Broker: Registration accepted
+    Broker-->>Procmond: Registration response
+
+    Agent->>Agent: Drop privileges (after collectors ready)
+    Agent->>Broker: Send "begin monitoring" command
+    Broker->>Procmond: Route start command
+
+    Note over Procmond,OS: Continuous Monitoring
+
+    loop Every collection interval
+        Procmond->>OS: Enumerate processes (privileged)
+        OS-->>Procmond: Process list with metadata
+        Procmond->>Procmond: Lifecycle analysis
+        Procmond->>Broker: Publish events<br/>Topic: events.process.*
+        Broker->>Agent: Deliver events
+
+        Procmond->>Broker: Publish heartbeat<br/>Topic: control.health.heartbeat.procmond
+    end
+
+    Note over Operator,OS: Lifecycle Management
+
+    Operator->>Agent: Request health check (via CLI)
+    Agent->>Broker: Health check RPC<br/>Topic: control.collector.procmond
+    Broker->>Procmond: Route health check
+    Procmond-->>Broker: Health status
+    Broker-->>Agent: Health response
+    Agent-->>Operator: Display health status
+
+    Note over Operator,OS: Graceful Shutdown
+
+    Operator->>Agent: Stop daemoneye-agent
+    Agent->>Broker: Graceful shutdown RPC<br/>Topic: control.collector.procmond
+    Broker->>Procmond: Route shutdown
+    Procmond->>Procmond: Complete current cycle
+    Procmond->>Broker: Flush buffered events
+    Procmond->>Broker: Deregister
+    Procmond-->>Agent: Exit (success)
+    Agent->>Broker: Shutdown broker
+    Agent-->>Operator: Shutdown complete
+```
+
+---
+
+## Data Model
+
+### 1. Existing Data Models (No Changes Required)
+
+Field types below are abridged; file:daemoneye-lib/src/models/process.rs and collector-core remain the authoritative definitions.
+
+**ProcessEvent (collector-core)**
+
+```rust
+// Used for event bus communication
+pub struct ProcessEvent {
+    pub pid: u32,
+    pub ppid: Option<u32>,
+    pub name: String,
+    pub executable_path: Option<PathBuf>,
+    pub command_line: Vec<String>,
+    pub start_time: Option<SystemTime>,
+    pub cpu_usage: Option<f64>,
+    pub memory_usage: Option<u64>,
+    pub executable_hash: Option<String>,
+    pub user_id: Option<String>,
+    pub accessible: bool,
+    pub file_exists: bool,
+    pub timestamp: SystemTime,
+    pub platform_metadata: Option<PlatformMetadata>,
+}
+```
+
+**ProcessRecord (daemoneye-lib)**
+
+```rust
+// Used for database storage
+pub struct ProcessRecord {
+    pub id: ProcessId,
+    pub name: String,
+    pub executable_path: Option<PathBuf>,
+    pub command_line: Option<String>,
+    pub parent_id: Option<ProcessId>,
+    pub start_time: Option<DateTime<Utc>>,
+    pub cpu_usage: Option<f64>,
+    pub memory_usage: Option<u64>,
+    pub status: ProcessStatus,
+    pub user_id: Option<String>,
+    pub executable_hash: Option<String>,
+    // ... additional fields
+}
+```
+
+**ProcessSnapshot (procmond)**
+
+```rust
+// Used for lifecycle tracking
+pub struct ProcessSnapshot {
+    pub pid: u32,
+    pub ppid: Option<u32>,
+    pub name: String,
+    pub executable_path: Option<PathBuf>,
+    pub command_line: Vec<String>,
+    pub start_time: Option<SystemTime>,
+    pub cpu_usage: Option<f64>,
+    pub memory_usage: Option<u64>,
+    pub executable_hash: Option<String>,
+    pub user_id: Option<String>,
+    pub accessible: bool,
+    pub file_exists: bool,
+    pub snapshot_time: SystemTime,
+    pub platform_metadata: Option<PlatformMetadata>,
+}
+```
+
+**Conversion Functions (Already Exist)**
+
+- `ProcessEvent` ↔ `ProcessSnapshot`: Bidirectional conversion via `From` trait
+- `ProcessRecord` ← `ProcessEvent`: One-way conversion for database storage
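+
+To make the conversion boundary concrete, the sketch below shows the general shape of such `From` implementations on condensed stand-ins for the structs above (most fields elided for brevity); the actual implementations live in collector-core and procmond and carry the full field set.
+
+```rust
+use std::time::SystemTime;
+
+// Condensed stand-ins for the full structs above; the real fields are richer.
+pub struct ProcessEvent {
+    pub pid: u32,
+    pub name: String,
+    pub timestamp: SystemTime,
+}
+
+pub struct ProcessSnapshot {
+    pub pid: u32,
+    pub name: String,
+    pub snapshot_time: SystemTime,
+}
+
+// Event -> snapshot: the event timestamp becomes the snapshot time.
+impl From<ProcessEvent> for ProcessSnapshot {
+    fn from(event: ProcessEvent) -> Self {
+        Self {
+            pid: event.pid,
+            name: event.name,
+            snapshot_time: event.timestamp,
+        }
+    }
+}
+
+// Snapshot -> event: the inverse mapping, used when re-emitting tracked state.
+impl From<ProcessSnapshot> for ProcessEvent {
+    fn from(snapshot: ProcessSnapshot) -> Self {
+        Self {
+            pid: snapshot.pid,
+            name: snapshot.name,
+            timestamp: snapshot.snapshot_time,
+        }
+    }
+}
+```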
+### 2. New Configuration Models
+
+**EventBusConfig (New)**
+
+```rust
+// Configuration for event bus connection
+pub struct EventBusConfig {
+    pub broker_socket_path: String,            // From DAEMONEYE_BROKER_SOCKET env var
+    pub connection_timeout: Duration,          // Default: 10 seconds
+    pub event_buffer_size_bytes: usize,        // Default: 10MB
+    pub heartbeat_interval: Duration,          // Default: 30 seconds
+    pub enable_event_buffering: bool,          // Default: true
+    pub wal_directory: PathBuf,                // Write-ahead log directory
+    pub wal_max_size_bytes: usize,             // Default: 100MB (10x buffer)
+    pub wal_rotation_threshold: f64,           // Default: 0.8 (80% full)
+    pub backpressure_buffer_threshold: f64,    // Default: 0.7 (70% full triggers backpressure)
+    pub backpressure_interval_multiplier: f64, // Default: 1.5 (increase interval by 50%)
+}
+```
+
+**RpcServiceConfig (New)**
+
+```rust
+// Configuration for RPC service
+pub struct RpcServiceConfig {
+    pub collector_id: String,               // Default: "procmond"
+    pub collector_type: String,             // Default: "process-monitor"
+    pub registration_timeout: Duration,     // Default: 10 seconds
+    pub health_check_timeout: Duration,     // Default: 5 seconds
+    pub graceful_shutdown_timeout: Duration, // Default: 60 seconds
+}
+```
+
+**ActorMessage (New)**
+
+```rust
+// Messages sent to ProcmondMonitorCollector actor
+pub enum ActorMessage {
+    HealthCheck {
+        respond_to: oneshot::Sender<HealthCheckResponse>,
+    },
+    UpdateConfig {
+        config: ProcmondMonitorConfig,
+        respond_to: oneshot::Sender<Result<(), ConfigError>>,
+    },
+    GracefulShutdown {
+        respond_to: oneshot::Sender<Result<(), ShutdownError>>,
+    },
+    BeginMonitoring, // From control.collector.lifecycle broadcast
+    AdjustInterval {
+        new_interval: Duration,     // From EventBusConnector backpressure
+        reason: BackpressureReason, // BufferFull, Reconnecting, etc.
+    },
+}
+
+pub enum BackpressureReason {
+    BufferFull { level_percent: f64 },
+    Reconnecting,
+    WalRotation,
+}
+```
+
+**WriteAheadLogEntry (New)**
+
+```rust
+// Entry in the write-ahead log (bincode serialization)
+pub struct WriteAheadLogEntry {
+    pub sequence: u64,        // Monotonic sequence number
+    pub timestamp: SystemTime, // When event was written
+    pub event: ProcessEvent,  // The actual event
+    pub checksum: u32,        // CRC32 for corruption detection
+}
+```
+
+**WAL File Format:**
+
+- Binary format using bincode serialization for efficiency
+- Sequence-numbered files: `procmond-{sequence:05}.wal` (e.g., `procmond-00001.wal`)
+- Each file contains multiple WriteAheadLogEntry records
+- Rotation at 80% of max size (80MB of 100MB default)
+- Delete WAL file after all events successfully published to broker
+- Corruption handling: Skip corrupted entries (CRC32 validation), log warning, continue with next entry
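+
+As a minimal illustration of how a `WriteAheadLogEntry` could be framed on disk, the sketch below length-prefixes each bincode-serialized record and checksums the event bytes with CRC32. It assumes the `serde`, `bincode` (1.x), and `crc32fast` crates and a simplified event type; it is a sketch of the format above, not the final procmond implementation.
+
+```rust
+use std::fs::File;
+use std::io::Write;
+use std::time::SystemTime;
+
+use serde::{Deserialize, Serialize};
+
+#[derive(Serialize, Deserialize)]
+pub struct ProcessEvent { pub pid: u32, pub name: String } // simplified
+
+#[derive(Serialize, Deserialize)]
+pub struct WriteAheadLogEntry {
+    pub sequence: u64,
+    pub timestamp: SystemTime,
+    pub event: ProcessEvent,
+    pub checksum: u32, // CRC32 over the serialized event bytes
+}
+
+pub fn append_entry(file: &mut File, sequence: u64, event: ProcessEvent) -> std::io::Result<()> {
+    // Checksum the event bytes so replay can detect torn or corrupted records.
+    let event_bytes = bincode::serialize(&event).expect("event is serializable");
+    let entry = WriteAheadLogEntry {
+        sequence,
+        timestamp: SystemTime::now(),
+        event,
+        checksum: crc32fast::hash(&event_bytes),
+    };
+    let bytes = bincode::serialize(&entry).expect("entry is serializable");
+    // Length-prefix each record so a reader can frame entries and skip damage.
+    file.write_all(&(bytes.len() as u32).to_le_bytes())?;
+    file.write_all(&bytes)?;
+    file.flush()
+}
+```
+
+On replay, a reader would walk the length prefixes, recompute the CRC32, and skip any record whose checksum does not match, consistent with the corruption-handling rule above.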
+### 3. Event Bus Message Schemas
+
+**Registration Message**
+
+```rust
+// Published to: control.collector.procmond (RPC)
+pub struct RegistrationRequest {
+    pub collector_id: String,                // "procmond"
+    pub collector_type: String,              // "process-monitor"
+    pub hostname: String,                    // System hostname
+    pub version: Option<String>,             // procmond version
+    pub pid: Option<u32>,                    // procmond PID
+    pub capabilities: Vec<String>,           // ["process"]
+    pub attributes: HashMap<String, String>, // Platform-specific attributes
+    pub heartbeat_interval_ms: Option<u64>,  // Requested heartbeat interval
+}
+```
+
+**Heartbeat Message**
+
+```rust
+// Published to: control.health.heartbeat.procmond
+pub struct HeartbeatData {
+    pub collector_id: String,  // "procmond"
+    pub timestamp: SystemTime, // Current time
+    pub sequence: u64,         // Monotonic sequence number
+    pub status: HealthStatus,  // Healthy/Degraded/Unhealthy
+}
+```
+
+**Process Event Message**
+
+```rust
+// Published to: events.process.batch or events.process.lifecycle
+// Uses existing ProcessEvent struct (no changes needed)
+```
+
+### 4. Data Flow
+
+```mermaid
+flowchart TD
+    A[OS Process APIs] -->|Raw Process Data| B[ProcessCollector]
+    B -->|ProcessEvent| C[LifecycleTracker]
+    C -->|ProcessSnapshot| C
+    C -->|ProcessLifecycleEvent| D[ProcmondMonitorCollector<br/>Actor]
+    D -->|ProcessEvent| E[EventBusConnector]
+    E -->|Persist| WAL[Write-Ahead Log<br/>Disk]
+    E -->|Buffer| F[Event Buffer<br/>10MB Memory]
+    F -->|Publish| G[DaemoneyeEventBus]
+    G -->|Topic: events.process.*| H[DaemoneyeBroker]
+    H -->|Deliver| I[daemoneye-agent]
+    I -->|ProcessRecord| J[Database]
+
+    K[RPC Commands] -->|control.collector.procmond| H
+    H -->|Route| L[RpcServiceHandler]
+    L -->|Actor Messages| D
+    D -->|Oneshot Responses| L
+
+    D -->|Heartbeat| M[RegistrationManager]
+    M -->|control.health.heartbeat.procmond| H
+
+    WAL -.->|Replay on Restart| E
+    F -.->|Backpressure 70%| D
+
+    style WAL fill:#ffa,stroke:#333,stroke-width:2px
+    style F fill:#f9f,stroke:#333,stroke-width:2px
+    style H fill:#bbf,stroke:#333,stroke-width:2px
+    style D fill:#afa,stroke:#333,stroke-width:2px
+```
+
+---
+
+## Component Architecture
+
+### 1. New Components
+
+**WriteAheadLog (New)**
+
+- **Responsibility**: Durable event persistence for crash recovery
+- **Location**: procmond/src/wal.rs
+- **Key Functions**:
+  - Persist events to disk using bincode serialization (append-only log)
+  - Use sequence-numbered files: `procmond-{sequence:05}.wal`
+  - Rotate log files when size reaches 80% of max (80MB of 100MB default)
+  - Replay events from WAL on startup (crash recovery)
+  - Delete WAL files after all events successfully published to broker
+  - Handle WAL corruption (skip corrupted entries with CRC32 validation, log warning, continue)
+  - Track which events have been published (mark for deletion)
+
+**EventBusConnector (New)**
+
+- **Responsibility**: Manage connection to daemoneye-agent's embedded broker with durable event buffering
+- **Location**: procmond/src/event_bus_connector.rs
+- **Key Functions**:
+  - Connect to broker via socket path from `DAEMONEYE_BROKER_SOCKET` env var
+  - Integrate with WriteAheadLog for event persistence (write before buffering)
+  - Buffer events (10MB limit) when connection lost
+  - Replay buffered events (from WAL) on reconnection or restart
+  - Publish events to topic hierarchy (`events.process.*`)
+  - Dynamic backpressure: Monitor buffer level (70% threshold triggers backpressure)
+  - Send ActorMessage::AdjustInterval to MonitorCollector via shared channel reference
+  - Calculate new interval: current_interval * 1.5 (50% increase)
+  - Release backpressure when buffer drops below 50% (send AdjustInterval with original interval)
+
+**RpcServiceHandler (New)**
+
+- **Responsibility**: Handle incoming RPC requests and coordinate with MonitorCollector via actor pattern
+- **Location**: procmond/src/rpc_service.rs
+- **Key Functions**:
+  - Subscribe to `control.collector.procmond` topic (for RPC requests)
+  - Subscribe to `control.collector.lifecycle` topic (for "begin monitoring" broadcast)
+  - Handle lifecycle operations: Start, Stop, Restart, HealthCheck, UpdateConfig, GracefulShutdown
+  - Send messages to MonitorCollector actor via bounded mpsc channel (capacity: 100)
+  - Wait for MonitorCollector responses via oneshot channels
+  - Return RPC responses with appropriate status codes
+  - Handle channel full errors (return RPC error if actor channel full)
+  - Serialize concurrent RPC requests (process one at a time)
+
+**RegistrationManager (New)**
+
+- **Responsibility**: Handle collector registration and heartbeat publishing
+- **Location**: procmond/src/registration.rs
+- **Key Functions**:
+  - Register with daemoneye-agent on startup via RPC
+  - Report "ready" status after successful registration
+  - Publish periodic heartbeats to `control.health.heartbeat.procmond` (every 30 seconds)
+  - Include health status in heartbeat (Healthy/Degraded/Unhealthy)
+  - Deregister on graceful shutdown 
+ - Track registration state and heartbeat sequence number + +**ConfigurationManager (Enhanced)** + +- **Responsibility**: Manage configuration with hot-reload support at cycle boundaries +- **Location**: procmond/src/config.rs (enhance existing) +- **Key Functions**: + - Load configuration from environment variables and config files + - Validate configuration changes via RPC + - Apply configuration updates at next collection cycle boundary (atomic) + - Send configuration change message to MonitorCollector actor + - Document which configurations are hot-reloadable vs. require restart + +### 2. Modified Components + +**ProcmondMonitorCollector (Modified)** + +- **Changes**: + - Replace `LocalEventBus` with `DaemoneyeEventBus` (via EventBusConnector) + - Implement actor pattern: Process messages from bounded mpsc channel (capacity: 100) + - Add configuration hot-reload at cycle boundaries (atomic application) + - Enhance health check to include event bus connectivity status + - Wait for "begin monitoring" broadcast on `control.collector.lifecycle` before starting collection loop + - Respond to dynamic interval adjustments from EventBusConnector backpressure + - Provide shared channel reference to EventBusConnector for backpressure signaling +- **Location**: file:procmond/src/monitor_collector.rs + +**main.rs (Modified)** + +- **Changes**: + - Read `DAEMONEYE_BROKER_SOCKET` environment variable + - Initialize WriteAheadLog with configured directory + - Initialize EventBusConnector with WAL integration + - Create bounded mpsc channel (capacity: 100) for actor messages + - Initialize RpcServiceHandler with channel sender and topic subscriptions + - Initialize RegistrationManager for registration and heartbeat + - Pass channel sender to EventBusConnector for backpressure signaling + - Initialize ProcmondMonitorCollector as actor with channel receiver + - Add graceful shutdown coordination with RPC +- **Location**: file:procmond/src/main.rs + +### 3. 
Component Interactions + +```mermaid +sequenceDiagram + participant Main as main.rs + participant Config as ConfigurationManager + participant EventBus as EventBusConnector + participant Reg as RegistrationManager + participant RPC as RpcServiceHandler + participant Monitor as ProcmondMonitorCollector + participant Collector as ProcessCollector + participant Lifecycle as LifecycleTracker + + Note over Main,Lifecycle: Startup Sequence + + Main->>Config: Load configuration + Config-->>Main: EventBusConfig + RpcServiceConfig + + Main->>EventBus: Connect to broker + EventBus-->>Main: Connection established + + Main->>Reg: Register with agent + Reg->>EventBus: Publish registration (RPC) + EventBus-->>Reg: Registration accepted + + Main->>Main: Wait for "begin monitoring" command + EventBus->>Main: Receive start command from agent + + Main->>RPC: Start RPC service + RPC->>EventBus: Subscribe to control.collector.procmond + + Main->>Monitor: Create collector + Monitor->>Collector: Initialize platform collector + Monitor->>Lifecycle: Initialize lifecycle tracker + + Main->>Monitor: Start monitoring + + Note over Main,Lifecycle: Runtime Operation + + loop Every collection interval + Monitor->>Collector: Collect processes + Collector-->>Monitor: ProcessEvent list + Monitor->>Lifecycle: Update and detect changes + Lifecycle-->>Monitor: ProcessLifecycleEvent list + Monitor->>EventBus: Publish events + EventBus->>EventBus: Write to WAL, then buffer + EventBus->>EventBus: Check buffer level for backpressure + alt Buffer > 70% full + EventBus->>Monitor: Increase collection interval (backpressure) + end + end + + loop Every heartbeat interval + Reg->>EventBus: Publish heartbeat + end + + Note over Main,Lifecycle: RPC Request Handling + + EventBus->>RPC: Incoming RPC request + RPC->>RPC: Parse request + + alt HealthCheck + RPC->>Monitor: Send health check message (actor) + Monitor-->>RPC: Health data via oneshot + RPC->>EventBus: Publish response + else UpdateConfig + RPC->>Config: Validate config changes + Config->>Monitor: Send config update message (actor) + Note over Monitor: Config applied at next cycle boundary + Monitor-->>RPC: Update result via oneshot + RPC->>EventBus: Publish response + else GracefulShutdown + RPC->>Monitor: Send shutdown message (actor) + Monitor->>Monitor: Complete current cycle + Monitor->>EventBus: Flush buffered events + WAL + Monitor-->>RPC: Shutdown ready via oneshot + RPC->>EventBus: Publish response + RPC->>Reg: Deregister + RPC->>Main: Signal shutdown + end + + Note over Main,Lifecycle: Graceful Shutdown + + Main->>Monitor: Stop monitoring + Monitor->>Collector: Cleanup + Monitor->>Lifecycle: Cleanup + Main->>EventBus: Disconnect + Main->>Main: Exit +``` + +### 4. 
Actor Pattern Coordination
+
+**ProcmondMonitorCollector as Actor:**
+
+- Runs in its own task with message processing loop
+- Receives messages via mpsc channel from RpcServiceHandler
+- Processes messages sequentially (no concurrent state mutations)
+- Responds via oneshot channels for request/response patterns
+
+**Message Types:**
+
+```rust
+enum ActorMessage {
+    HealthCheck {
+        respond_to: oneshot::Sender<HealthCheckResponse>,
+    },
+    UpdateConfig {
+        config: Config,
+        respond_to: oneshot::Sender<Result<(), ConfigError>>,
+    },
+    GracefulShutdown {
+        respond_to: oneshot::Sender<Result<(), ShutdownError>>,
+    },
+    BeginMonitoring, // From agent after loading state
+    AdjustInterval {
+        new_interval: Duration,
+    }, // From EventBusConnector backpressure
+}
+```
+
+**Coordination Benefits:**
+
+- Eliminates race conditions (single-threaded message processing)
+- Simplifies state management (no complex locking)
+- Clear request/response semantics via oneshot channels
+- Serializes concurrent RPC requests automatically
+
+**Configuration Hot-Reload at Cycle Boundary:**
+
+- Config update message queued in actor's message channel
+- Actor processes message at start of next collection cycle
+- Ensures atomic config application (no mid-cycle changes)
+- Some configs may require restart (documented in ConfigurationManager)
+
+### 5. Integration Points
+
+**With daemoneye-agent:**
+
+- **BrokerManager**: Spawns procmond as child process, manages lifecycle
+- **CollectorProcessManager**: Monitors procmond process health, handles restarts
+- **CollectorRegistry**: Tracks procmond registration and heartbeat status
+- **RPC Clients**: Sends lifecycle commands to procmond
+- **Loading State Management**:
+  - Agent initializes broker first (before spawning collectors)
+  - Agent spawns all configured collectors with `DAEMONEYE_BROKER_SOCKET` env var
+  - Agent waits for all collectors to register and report "ready" status
+  - Agent drops privileges only after all collectors are ready
+  - Agent sends "begin monitoring" command to transition collectors to steady state
+- **Heartbeat Monitoring**: Agent detects missed heartbeats (3+ consecutive) and takes escalating actions:
+  1. Send health check RPC (timeout: 5 seconds) - verify responsiveness
+  2. Send graceful shutdown RPC (timeout: 60 seconds) - attempt clean shutdown
+  3. Kill procmond process (force termination) - last resort
+  4. Restart procmond via CollectorProcessManager - restore service
+
+**With daemoneye-eventbus:**
+
+- **DaemoneyeBroker**: Embedded broker that procmond connects to
+- **Topic Hierarchy**: `events.process.*` for events, `control.collector.procmond` for RPC
+- **RPC Patterns**: Request/response for lifecycle management
+
+**With collector-core:**
+
+- **EventSource trait**: ProcmondMonitorCollector implements this interface
+- **MonitorCollector trait**: Provides statistics and health check interface
+- **ProcessEvent**: Standard event format for process data
+
+**AgentCollectorConfig (New)**
+
+```yaml
+# Agent configuration file: /etc/daemoneye/agent.yaml
+collectors:
+  - id: procmond
+    type: process-monitor
+    binary_path: /usr/bin/procmond
+    enabled: true
+    auto_restart: true
+    startup_timeout_secs: 60
+    config:
+      collection_interval_secs: 30
+      enhanced_metadata: true
+      compute_hashes: false
+```
+
+### 6. 
daemoneye-agent Enhancements Required + +**Collector Configuration Loading (New)** + +- Load collector configuration from `/etc/daemoneye/agent.yaml` on startup +- Parse collector list with binary paths, enabled status, and auto-restart settings +- Validate collector binary paths exist and are executable +- Spawn collectors in order defined in configuration file +- Pass collector-specific configuration via environment variables or config files + +**Loading State Management (New)** + +- Add state machine: Loading → Ready → Steady State +- Track collector readiness: Wait for all collectors to report "ready" +- Privilege dropping: Drop privileges only after all collectors ready +- Transition command: Broadcast "begin monitoring" to `control.collector.lifecycle` when entering steady state +- Timeout: If collectors don't report ready within timeout (60s default), fail startup with error + +**Heartbeat Failure Detection (Enhanced)** + +- Monitor heartbeat messages from all collectors +- Track missed heartbeat count per collector (threshold: 3 consecutive) +- Implement escalating recovery actions: + 1. Health check RPC with 5-second timeout + 2. Graceful shutdown RPC with 60-second timeout + 3. Force kill via CollectorProcessManager + 4. Automatic restart via CollectorProcessManager (if auto_restart enabled in config) +- Log all recovery actions for operator visibility +- Emit alerts for repeated collector failures (e.g., 3+ restarts in 10 minutes) + +**Configuration Push (Enhanced)** + +- Validate configuration changes before pushing to collectors +- Send configuration updates via RPC to `control.collector.{collector_id}` +- Track which configurations require restart vs. hot-reload +- Handle configuration update failures (rollback or retry) +- Support configuration validation without applying (validate_only mode) + +### 7. Error Handling Strategy + +**Connection Failures:** + +- Startup: Broker ready before spawn (no retry needed at startup) +- Runtime: Buffer events (10MB limit) with write-ahead log, attempt reconnection, replay on success +- If buffer full: Dynamic interval adjustment - connector increases collection interval by 50% +- WAL persistence: Events written to disk before buffering, replayed on restart after crash +- Reconnection: Exponential backoff (1s, 2s, 4s, 8s, max 30s) with indefinite retries + +**Heartbeat Failures:** + +- Agent detects missed heartbeats (threshold: 3 consecutive misses) +- Escalating recovery actions: + 1. Health check RPC (timeout: 5s) - verify procmond is responsive + 2. Graceful shutdown RPC (timeout: 60s) - attempt clean shutdown + 3. Force kill - terminate procmond process + 4. 
Restart - spawn new procmond instance +- Heartbeat independence: Heartbeat publishing runs in separate task (not blocked by collection) + +**RPC Failures:** + +- Invalid requests: Return error response with details +- Timeout: Return timeout error after configured duration +- State conflicts: Return error with current state information +- Concurrent requests: Serialize via actor pattern (process one at a time) +- Actor message failures: Return error if actor channel closed or full + +**Collection Failures:** + +- Permission denied: Log error, skip process, continue with others +- Platform API failure: Fall back to basic sysinfo collector +- Timeout: Cancel collection, report degraded health status +- Cycle boundary: Configuration changes applied only at cycle start (atomic) + +**Resource Exhaustion:** + +- Memory approaching limit: Reduce buffer size, disable enhanced metadata, rotate WAL +- CPU usage high: Increase collection interval, reduce metadata collection +- Event buffer full: Dynamic interval adjustment (increase by 50%), WAL rotation +- WAL disk space low: Rotate and compress old WAL files, alert operator + +### 8. Testing Strategy + +**Unit Tests (>80% coverage target):** + +- WriteAheadLog: Persistence, rotation, replay, corruption recovery, compression +- EventBusConnector: Connection, WAL integration, buffering, replay, dynamic backpressure +- RpcServiceHandler: Request parsing, actor message sending, response handling, concurrent request serialization +- RegistrationManager: Registration, "ready" reporting, heartbeat, deregistration +- ConfigurationManager: Loading, validation, cycle-boundary hot-reload, restart detection +- Actor Pattern: Message processing, oneshot responses, channel handling + +**Integration Tests:** + +- Event bus communication: Publish/subscribe, reconnection, buffering +- RPC communication: Lifecycle operations, health checks, config updates +- Cross-platform: Linux, macOS, Windows process enumeration +- Lifecycle tracking: Start/stop/modification detection + +**Chaos Tests:** + +- Connection failures: Broker restart, network interruption +- Backpressure: Slow consumer, high event volume +- Resource limits: Memory constraints, CPU throttling +- Concurrent operations: Multiple RPC requests, collection during shutdown + +**Security Tests:** + +- Privilege escalation: Attempt to gain unauthorized access +- Injection attacks: Malicious process names, command lines +- DoS attacks: Excessive RPC requests, event flooding +- Data sanitization: Verify secrets are not logged or published + +--- + +## Implementation Phases + +### Phase 1: Event Bus Integration (Week 1-2) + +**Goal**: Replace LocalEventBus with DaemoneyeEventBus with durable buffering + +**Tasks:** + +1. Create WriteAheadLog component for event persistence +2. Create EventBusConnector with WAL integration and dynamic backpressure +3. Implement event buffering (10MB limit) with WAL persistence +4. Implement WAL replay on startup (crash recovery) +5. Update ProcmondMonitorCollector to use EventBusConnector and actor pattern +6. Add environment variable reading for broker socket path +7. Implement startup coordination (wait for "begin monitoring" command) +8. Unit tests for WriteAheadLog and EventBusConnector +9. 
Integration tests for event publishing, WAL replay, and backpressure + +**Success Criteria:** + +- procmond connects to daemoneye-agent's broker on startup +- Events published to `events.process.*` topics +- WAL persists events before buffering +- WAL replay works after crash (events not lost) +- Dynamic backpressure adjusts collection interval when buffer fills +- procmond waits for agent's "begin monitoring" command before starting collection + +### Phase 2: RPC Service Implementation (Week 3-4) + +**Goal**: Enable lifecycle management via RPC with actor pattern coordination + +**Tasks:** + +**procmond Changes:** + +1. Implement actor pattern in ProcmondMonitorCollector (message processing loop) +2. Create ActorMessage enum for actor communication +3. Create RpcServiceHandler with actor message sending via mpsc channel +4. Implement lifecycle operations: Start, Stop, Restart, HealthCheck, UpdateConfig, GracefulShutdown +5. Implement configuration hot-reload at cycle boundaries +6. Create RegistrationManager for registration, "ready" reporting, and heartbeat +7. Implement "begin monitoring" command handling (wait before starting collection) +8. Unit tests for RpcServiceHandler, actor coordination, and RegistrationManager + +**daemoneye-agent Changes:** + +1. Add collector configuration file format (`/etc/daemoneye/agent.yaml`) +2. Implement configuration loading and validation on agent startup +3. Implement loading state management (Loading → Ready → Steady State) +4. Add collector readiness tracking (wait for all collectors to report "ready") +5. Implement privilege dropping after all collectors ready +6. Add "begin monitoring" broadcast to `control.collector.lifecycle` topic +7. Implement heartbeat failure detection with escalating actions: + +- Track missed heartbeats per collector (threshold: 3 consecutive) + - Action 1: Health check RPC (timeout: 5s) + - Action 2: Graceful shutdown RPC (timeout: 60s) + - Action 3: Force kill via CollectorProcessManager + - Action 4: Automatic restart (if auto_restart enabled) + +8. Integration tests for RPC communication and loading state coordination + +**Success Criteria:** + +- procmond registers with daemoneye-agent on startup +- procmond reports "ready" status after registration +- Agent waits for procmond "ready" before dropping privileges +- Agent sends "begin monitoring" command after all collectors ready +- procmond waits for "begin monitoring" before starting collection loop +- Heartbeats published every 30 seconds +- Agent detects missed heartbeats and takes escalating actions (health check → graceful shutdown → kill → restart) +- Health check RPC returns accurate status via actor pattern +- Graceful shutdown RPC completes within timeout +- Configuration update RPC applies changes at next cycle boundary (atomic) + +### Phase 3: Testing (TDD Approach) (Week 5-6) + +**Goal**: Achieve >80% unit coverage, >90% critical path coverage + +**Tasks:** + +1. Expand unit test coverage for all new components +2. Create integration test suite for event bus and RPC +3. Add cross-platform tests (Linux, macOS, Windows) +4. Implement chaos tests for resilience +5. Add security tests for privilege and injection +6. 
Performance baseline tests + +**Success Criteria:** + +- Unit test coverage >80% +- Critical path coverage >90% (enumeration, event bus, RPC, security) +- All tests pass on Linux, macOS, Windows +- Chaos tests validate resilience to failures + +### Phase 4: Security Hardening (Week 7) + +**Goal**: Implement privilege management and data sanitization + +**Tasks:** + +1. Add privilege detection at startup (capabilities, tokens) +2. Implement data sanitization for command-line args and env vars +3. Validate security boundaries between procmond and agent +4. Add security test suite (privilege escalation, injection, DoS) +5. Document security model and threat analysis + +**Success Criteria:** + +- Privilege detection works on all platforms +- Sensitive data sanitized before logging/publishing +- Security tests pass with no critical vulnerabilities +- Security documentation complete + +### Phase 5: FreeBSD Support (Week 8) + +**Goal**: Validate basic process enumeration on FreeBSD + +**Tasks:** + +1. Test FallbackProcessCollector on FreeBSD 13+ +2. Document limitations (basic metadata only) +3. Add platform detection and capability reporting +4. Create FreeBSD-specific tests +5. Update documentation with FreeBSD support status + +**Success Criteria:** + +- Basic process enumeration works on FreeBSD +- Limitations documented clearly +- Platform detection reports FreeBSD correctly +- Tests pass on FreeBSD 13+ + +### Phase 6: Performance Validation (Week 9) + +**Goal**: Validate performance against targets + +**Tasks:** + +1. Benchmark process enumeration (1,000 processes target: \<100ms) +2. Load testing with 10,000+ processes +3. Memory profiling (target: \<100MB sustained) +4. CPU monitoring (target: \<5% sustained) +5. Regression testing to prevent degradation +6. Performance optimization if targets not met + +**Success Criteria:** + +- Enumerate 1,000 processes in \<100ms (average) +- Support 10,000+ processes without degradation +- Memory usage \<100MB during normal operation +- CPU usage \<5% during continuous monitoring +- No performance regressions + +--- + +## References + +- Epic Brief: spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- Core Flows: spec:54226c8a-719a-479a-863b-9c91f43717a9/f086f464-1e81-42e8-89f5-74a8638360d1 +- Event Bus Architecture: file:docs/embedded-broker-architecture.md +- Topic Hierarchy: file:daemoneye-eventbus/docs/topic-hierarchy.md +- RPC Patterns: file:daemoneye-eventbus/docs/rpc-patterns.md +- Process Collector: file:procmond/src/process_collector.rs +- Monitor Collector: file:procmond/src/monitor_collector.rs +- Lifecycle Tracker: file:procmond/src/lifecycle.rs +- Broker Manager: file:daemoneye-agent/src/broker_manager.rs +- Collector Registry: file:daemoneye-agent/src/collector_registry.rs diff --git a/spec/procmond/tickets/Implement_Actor_Pattern_and_Startup_Coordination.md b/spec/procmond/tickets/Implement_Actor_Pattern_and_Startup_Coordination.md new file mode 100644 index 0000000..841d0d9 --- /dev/null +++ b/spec/procmond/tickets/Implement_Actor_Pattern_and_Startup_Coordination.md @@ -0,0 +1,210 @@ +# Implement Actor Pattern and Startup Coordination + +## Overview + +Refactor ProcmondMonitorCollector to use actor pattern for coordinated state management and implement startup coordination with daemoneye-agent. This ticket replaces LocalEventBus with DaemoneyeEventBus (via EventBusConnector) and establishes the message-passing architecture for RPC coordination. 
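+
+For orientation before the detailed scope, a minimal sketch of the intended actor shape (tokio `mpsc` for commands, `oneshot` for replies) is shown below; only a trimmed health-check message is modeled, and all names other than `ActorMessage` are illustrative.
+
+```rust
+use tokio::sync::{mpsc, oneshot};
+
+pub enum ActorMessage {
+    HealthCheck { respond_to: oneshot::Sender<String> }, // simplified payload
+    Shutdown,
+}
+
+pub async fn run_actor(mut rx: mpsc::Receiver<ActorMessage>) {
+    // Messages are processed one at a time, so collector state needs no locking.
+    while let Some(msg) = rx.recv().await {
+        match msg {
+            ActorMessage::HealthCheck { respond_to } => {
+                // A real health check would report buffer level and connection status.
+                let _ = respond_to.send("healthy".to_owned());
+            }
+            ActorMessage::Shutdown => break,
+        }
+    }
+}
+
+#[tokio::main]
+async fn main() {
+    let (tx, rx) = mpsc::channel(100); // bounded, matching the ticket's capacity
+    let actor = tokio::spawn(run_actor(rx));
+
+    let (reply_tx, reply_rx) = oneshot::channel();
+    tx.send(ActorMessage::HealthCheck { respond_to: reply_tx }).await.unwrap();
+    println!("health: {}", reply_rx.await.unwrap());
+
+    tx.send(ActorMessage::Shutdown).await.unwrap();
+    actor.await.unwrap();
+}
+```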
+## Scope
+
+**In Scope:**
+
+- Actor pattern implementation in ProcmondMonitorCollector
+- ActorMessage enum for message-based coordination
+- Bounded mpsc channel (capacity: 100) for actor messages
+- Replace LocalEventBus with EventBusConnector
+- Startup coordination: wait for "begin monitoring" command
+- Dynamic interval adjustment from backpressure
+- Configuration hot-reload at cycle boundaries
+- Enhanced health check with event bus connectivity
+- Update main.rs for actor initialization
+
+**Out of Scope:**
+
+- RPC service handler implementation (Ticket 3)
+- Registration and heartbeat (Ticket 3)
+- Agent-side loading state (Ticket 4)
+- Comprehensive testing (Ticket 5)
+
+## Technical Details
+
+### Actor Pattern Architecture
+
+**Modified Component:** `file:procmond/src/monitor_collector.rs`
+
+**Key Changes:**
+
+- Run in dedicated task with message processing loop
+- Receive messages via bounded mpsc channel (capacity: 100)
+- Process messages sequentially (no concurrent state mutations)
+- Respond via oneshot channels for request/response patterns
+- Maintain collection state without complex locking
+
+**ActorMessage Enum:**
+
+```rust
+enum ActorMessage {
+    HealthCheck {
+        respond_to: oneshot::Sender<HealthCheckResponse>,
+    },
+    UpdateConfig {
+        config: Config,
+        respond_to: oneshot::Sender<Result<(), ConfigError>>,
+    },
+    GracefulShutdown {
+        respond_to: oneshot::Sender<Result<(), ShutdownError>>,
+    },
+    BeginMonitoring, // From agent after loading state
+    AdjustInterval {
+        new_interval: Duration,
+    }, // From EventBusConnector backpressure
+}
+```
+
+### Startup Coordination
+
+**Flow:**
+
+1. procmond starts and connects to broker
+2. procmond subscribes to `control.collector.lifecycle` topic
+3. procmond waits for "begin monitoring" broadcast from agent
+4. Upon receiving command, procmond starts collection loop
+
+**Why:** Ensures agent has completed loading state (all collectors ready, privileges dropped) before procmond begins monitoring. 
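+
+A hedged sketch of that wait is below, with the broker subscription modeled as a plain channel of lifecycle commands; the real daemoneye-eventbus subscription API may differ.
+
+```rust
+use tokio::sync::mpsc;
+
+/// Block until the agent broadcasts "begin monitoring" on
+/// `control.collector.lifecycle`; the subscription type is illustrative.
+pub async fn wait_for_begin_monitoring(subscription: &mut mpsc::Receiver<String>) {
+    while let Some(command) = subscription.recv().await {
+        if command == "begin monitoring" {
+            return; // agent finished loading; safe to start the collection loop
+        }
+        // Other lifecycle broadcasts are ignored while still starting up.
+    }
+    panic!("lifecycle subscription closed before begin-monitoring arrived");
+}
+
+#[tokio::main]
+async fn main() {
+    let (tx, mut rx) = mpsc::channel(8);
+    tx.send("begin monitoring".to_owned()).await.unwrap();
+    wait_for_begin_monitoring(&mut rx).await;
+    println!("collection loop may start");
+}
+```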
+ +### Configuration Hot-Reload + +**Strategy:** Apply configuration changes at cycle boundaries (atomic) + +**Implementation:** + +- Config update message queued in actor's channel +- Actor processes message at start of next collection cycle +- Ensures no mid-cycle configuration changes +- Some configs may require restart (documented) + +```mermaid +sequenceDiagram + participant Main as main.rs + participant Actor as ProcmondMonitorCollector (Actor) + participant EventBus as EventBusConnector + participant Collector as ProcessCollector + participant Lifecycle as LifecycleTracker + + Note over Main,Lifecycle: Initialization + Main->>Main: Create bounded mpsc channel (capacity: 100) + Main->>EventBus: Initialize with WAL + Main->>Actor: Create with channel receiver + Main->>Actor: Pass EventBusConnector + + Note over Main,Lifecycle: Startup Coordination + Main->>EventBus: Subscribe to control.collector.lifecycle + EventBus->>Main: Receive "begin monitoring" broadcast + Main->>Actor: Send BeginMonitoring message + Actor->>Actor: Start collection loop + + Note over Main,Lifecycle: Collection Loop + loop Every collection interval + Actor->>Collector: Collect processes + Collector-->>Actor: ProcessEvent list + Actor->>Lifecycle: Update and detect changes + Lifecycle-->>Actor: ProcessLifecycleEvent list + Actor->>EventBus: Publish events + EventBus->>EventBus: Write to WAL, buffer, publish + end + + Note over Main,Lifecycle: Backpressure + EventBus->>EventBus: Buffer reaches 70% + EventBus->>Actor: Send AdjustInterval message + Actor->>Actor: Increase collection interval (1.5x) + Note over Actor: Collection slows down + EventBus->>EventBus: Buffer drops to 50% + EventBus->>Actor: Send AdjustInterval message (restore) + Actor->>Actor: Restore original interval + + Note over Main,Lifecycle: Configuration Update + Main->>Actor: Send UpdateConfig message + Actor->>Actor: Queue config update + Note over Actor: Wait for cycle boundary + Actor->>Actor: Apply config at start of next cycle + Actor->>Main: Send success response via oneshot + + Note over Main,Lifecycle: Graceful Shutdown + Main->>Actor: Send GracefulShutdown message + Actor->>Actor: Complete current cycle + Actor->>EventBus: Flush buffered events + WAL + Actor->>Main: Send ready response via oneshot + Main->>Main: Exit +``` + +## Dependencies + +**Requires:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 1] - EventBusConnector and WAL must exist + +**Blocks:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 3] - RPC service needs actor pattern +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 4] - Agent needs "begin monitoring" subscription + +## Acceptance Criteria + +### Actor Pattern + +- [ ] ProcmondMonitorCollector runs in dedicated task with message loop +- [ ] Bounded mpsc channel (capacity: 100) created for actor messages +- [ ] ActorMessage enum defined with all message types +- [ ] Messages processed sequentially (no concurrent state mutations) +- [ ] Oneshot channels used for request/response patterns +- [ ] Channel full errors handled gracefully (log warning, return error) + +### Event Bus Integration + +- [ ] LocalEventBus completely replaced with EventBusConnector +- [ ] Events published via EventBusConnector to `events.process.*` topics +- [ ] EventBusConnector integrated with actor pattern +- [ ] No compilation errors or warnings + +### Startup Coordination + +- [ ] procmond subscribes to `control.collector.lifecycle` topic +- [ ] procmond waits for "begin monitoring" broadcast before starting 
collection +- [ ] BeginMonitoring message triggers collection loop start +- [ ] Startup sequence documented in code comments + +### Dynamic Interval Adjustment + +- [ ] Actor receives AdjustInterval messages from EventBusConnector +- [ ] Collection interval increases by 50% (1.5x) when backpressure triggered +- [ ] Collection interval restored to original when backpressure released +- [ ] Interval adjustment logged at INFO level + +### Configuration Hot-Reload + +- [ ] UpdateConfig message queued in actor channel +- [ ] Config applied at start of next collection cycle (atomic) +- [ ] Config validation performed before application +- [ ] Success/failure response sent via oneshot channel +- [ ] Documentation lists which configs are hot-reloadable vs. require restart + +### Health Check Enhancement + +- [ ] HealthCheck message returns event bus connectivity status +- [ ] Health data includes: collection state, buffer level, connection status +- [ ] Response sent via oneshot channel + +### main.rs Updates + +- [ ] Bounded mpsc channel created (capacity: 100) +- [ ] EventBusConnector initialized with WAL +- [ ] ProcmondMonitorCollector initialized as actor with channel receiver +- [ ] Graceful shutdown coordination implemented +- [ ] `DAEMONEYE_BROKER_SOCKET` environment variable read + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Core Flows:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f086f464-1e81-42e8-89f5-74a8638360d1 (Flow 2: System Startup) +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 1, Actor Pattern) +- **Monitor Collector:** file:procmond/src/monitor_collector.rs +- **Main Entry Point:** file:procmond/src/main.rs diff --git a/spec/procmond/tickets/Implement_Agent_Loading_State_and_Heartbeat_Detection.md b/spec/procmond/tickets/Implement_Agent_Loading_State_and_Heartbeat_Detection.md new file mode 100644 index 0000000..b536fdb --- /dev/null +++ b/spec/procmond/tickets/Implement_Agent_Loading_State_and_Heartbeat_Detection.md @@ -0,0 +1,222 @@ +# Implement Agent Loading State and Heartbeat Detection + +## Overview + +Implement loading state management and heartbeat failure detection in daemoneye-agent. This ticket ensures coordinated startup (broker → collectors → privilege drop → steady state) and robust failure detection with escalating recovery actions. 
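+
+To make the miss-counting entry point of that ladder concrete, here is a small self-contained sketch; the thresholds match this ticket, while the type and function names are illustrative.
+
+```rust
+use std::collections::HashMap;
+
+const MISSED_HEARTBEAT_THRESHOLD: u32 = 3; // consecutive misses before recovery starts
+
+#[allow(dead_code)] // later escalation steps listed for completeness
+#[derive(Debug, PartialEq)]
+enum RecoveryAction {
+    None,
+    HealthCheckRpc,      // step 1: 5s timeout
+    GracefulShutdownRpc, // step 2: 60s timeout
+    ForceKill,           // step 3
+    Restart,             // step 4 (if auto_restart enabled)
+}
+
+#[derive(Default)]
+struct HeartbeatTracker {
+    missed: HashMap<String, u32>,
+}
+
+impl HeartbeatTracker {
+    fn heartbeat_received(&mut self, collector_id: &str) {
+        self.missed.insert(collector_id.to_owned(), 0); // reset on success
+    }
+
+    /// Called once per expected heartbeat interval in which none was seen.
+    fn heartbeat_missed(&mut self, collector_id: &str) -> RecoveryAction {
+        let count = self.missed.entry(collector_id.to_owned()).or_insert(0);
+        *count += 1;
+        if *count >= MISSED_HEARTBEAT_THRESHOLD {
+            RecoveryAction::HealthCheckRpc // escalate further if this times out
+        } else {
+            RecoveryAction::None
+        }
+    }
+}
+
+fn main() {
+    let mut tracker = HeartbeatTracker::default();
+    tracker.heartbeat_missed("procmond");
+    tracker.heartbeat_missed("procmond");
+    assert_eq!(tracker.heartbeat_missed("procmond"), RecoveryAction::HealthCheckRpc);
+    tracker.heartbeat_received("procmond");
+}
+```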
+ +## Scope + +**In Scope:** + +- Collector configuration file format (`/etc/daemoneye/agent.yaml`) +- Configuration loading and validation on agent startup +- Loading state machine: Loading → Ready → Steady State +- Collector readiness tracking (wait for all collectors to report "ready") +- Privilege dropping after all collectors ready +- "Begin monitoring" broadcast to `control.collector.lifecycle` topic +- Heartbeat failure detection with escalating actions +- Integration tests for loading state and heartbeat detection + +**Out of Scope:** + +- procmond-side changes (Tickets 1-3) +- Comprehensive testing across all collectors (Ticket 5) +- Security hardening (Ticket 6) + +## Technical Details + +### Collector Configuration Format + +**Location:** `/etc/daemoneye/agent.yaml` + +**Schema:** + +```yaml +collectors: + - id: procmond + type: process-monitor + binary_path: /usr/bin/procmond + enabled: true + auto_restart: true + startup_timeout_secs: 60 + config: + collection_interval_secs: 30 + enhanced_metadata: true + compute_hashes: false +``` + +**Configuration Loading:** + +- Load on agent startup +- Validate collector binary paths exist and are executable +- Parse collector-specific configuration +- Spawn collectors in order defined in configuration file + +### Loading State Machine + +**States:** + +1. **Loading**: Agent starting, broker initializing, spawning collectors +2. **Ready**: All collectors registered and reported "ready", privileges dropped +3. **Steady State**: Normal operation, collectors monitoring + +**Transitions:** + +- Loading → Ready: All collectors report "ready" within timeout (60s default) +- Ready → Steady State: Agent broadcasts "begin monitoring" command +- Any → Loading: Agent restart + +**Timeout Handling:** + +- If collectors don't report "ready" within timeout, fail startup with error +- Log which collectors failed to report ready +- Exit with non-zero status code + +### Heartbeat Failure Detection + +**Strategy:** Escalating recovery actions + +**Detection:** + +- Track missed heartbeat count per collector (threshold: 3 consecutive) +- Heartbeat expected every 30 seconds (allow 90 seconds before action) + +**Escalating Actions:** + +1. **Health Check RPC** (timeout: 5 seconds) + + - Send health check RPC to collector + - If response received, reset missed heartbeat count + - If timeout, proceed to action 2 + +2. **Graceful Shutdown RPC** (timeout: 60 seconds) + + - Send graceful shutdown RPC to collector + - Wait for completion or timeout + - If successful, proceed to action 4 (restart) + - If timeout, proceed to action 3 + +3. **Force Kill** (via CollectorProcessManager) + + - Kill collector process (SIGKILL on Unix, TerminateProcess on Windows) + - Log forced termination + - Proceed to action 4 + +4. 
**Automatic Restart** (if auto_restart enabled) + + - Restart collector via CollectorProcessManager + - Reset missed heartbeat count + - Log restart event + +```mermaid +stateDiagram-v2 + [*] --> Loading: Agent starts + Loading --> Loading: Spawn collectors + Loading --> Ready: All collectors ready + Loading --> [*]: Timeout (fail startup) + Ready --> SteadyState: Broadcast "begin monitoring" + SteadyState --> SteadyState: Normal operation + + state SteadyState { + [*] --> Monitoring + Monitoring --> HealthCheck: 3 missed heartbeats + HealthCheck --> Monitoring: Response received + HealthCheck --> GracefulShutdown: Timeout (5s) + GracefulShutdown --> Restart: Success + GracefulShutdown --> ForceKill: Timeout (60s) + ForceKill --> Restart: Process killed + Restart --> Monitoring: Collector restarted + } + + SteadyState --> [*]: Agent shutdown +``` + +### Component Changes + +**Modified:** `file:daemoneye-agent/src/broker_manager.rs` + +- Add loading state management +- Track collector readiness +- Implement privilege dropping after all collectors ready +- Broadcast "begin monitoring" command + +**Modified:** `file:daemoneye-agent/src/collector_registry.rs` + +- Track heartbeat timestamps per collector +- Detect missed heartbeats (3+ consecutive) +- Implement escalating recovery actions +- Log all recovery actions + +**New:** `file:daemoneye-agent/src/config.rs` + +- Load collector configuration from YAML file +- Validate configuration +- Provide configuration to BrokerManager + +## Dependencies + +**Requires:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 2] - procmond must wait for "begin monitoring" +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 3] - procmond must publish registration and heartbeat + +**Blocks:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 5] - Integration tests need complete startup flow + +## Acceptance Criteria + +### Configuration Loading + +- [ ] Agent loads collector configuration from `/etc/daemoneye/agent.yaml` +- [ ] Configuration validation checks binary paths exist and are executable +- [ ] Collector-specific configuration parsed correctly +- [ ] Invalid configuration causes agent startup failure with clear error message + +### Loading State Management + +- [ ] Agent implements state machine: Loading → Ready → Steady State +- [ ] Agent spawns collectors in order defined in configuration +- [ ] Agent tracks collector readiness (waits for "ready" status from all collectors) +- [ ] Agent drops privileges only after all collectors report "ready" +- [ ] Agent broadcasts "begin monitoring" to `control.collector.lifecycle` when entering steady state +- [ ] Timeout (60s default) causes startup failure if collectors don't report ready +- [ ] Startup failure logs which collectors failed to report ready + +### Heartbeat Failure Detection + +- [ ] Agent tracks heartbeat timestamps per collector +- [ ] Agent detects 3 consecutive missed heartbeats (90 seconds without heartbeat) +- [ ] Escalating actions implemented: + - [ ] Action 1: Health check RPC with 5-second timeout + - [ ] Action 2: Graceful shutdown RPC with 60-second timeout + - [ ] Action 3: Force kill via CollectorProcessManager + - [ ] Action 4: Automatic restart (if auto_restart enabled) +- [ ] All recovery actions logged at WARN or ERROR level +- [ ] Missed heartbeat count reset on successful health check or restart + +### Integration Tests + +- [ ] Test: Agent waits for collector "ready" before dropping privileges +- [ ] Test: Agent broadcasts "begin monitoring" after all 
collectors ready +- [ ] Test: Agent fails startup if collector doesn't report ready within timeout +- [ ] Test: Agent detects missed heartbeats and takes escalating actions +- [ ] Test: Health check RPC resets missed heartbeat count +- [ ] Test: Graceful shutdown RPC completes successfully +- [ ] Test: Force kill terminates unresponsive collector +- [ ] Test: Automatic restart restores collector after failure + +### Documentation + +- [ ] Configuration file format documented +- [ ] Loading state machine documented with state diagram +- [ ] Heartbeat failure detection documented with escalating actions +- [ ] Timeout values documented and configurable + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Core Flows:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f086f464-1e81-42e8-89f5-74a8638360d1 (Flow 2: System Startup, Flow 6: Error Handling) +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 2, Agent Enhancements) +- **Broker Manager:** file:daemoneye-agent/src/broker_manager.rs +- **Collector Registry:** file:daemoneye-agent/src/collector_registry.rs diff --git a/spec/procmond/tickets/Implement_Comprehensive_Test_Suite.md b/spec/procmond/tickets/Implement_Comprehensive_Test_Suite.md new file mode 100644 index 0000000..7ba672e --- /dev/null +++ b/spec/procmond/tickets/Implement_Comprehensive_Test_Suite.md @@ -0,0 +1,290 @@ +# Implement Comprehensive Test Suite + +## Overview + +Achieve comprehensive test coverage across all procmond components and integration points. This ticket implements unit tests, integration tests, chaos tests, and security tests to meet the >80% unit coverage and >90% critical path coverage targets. + +## Scope + +**In Scope:** + +- Unit tests for all new components (WAL, EventBusConnector, RpcServiceHandler, RegistrationManager, ConfigurationManager) +- Integration tests for event bus and RPC communication +- Cross-platform tests (Linux, macOS, Windows) +- Chaos tests for resilience validation +- Security tests for privilege and injection attacks +- Performance baseline tests +- Test documentation and coverage reporting + +**Out of Scope:** + +- Security hardening implementation (Ticket 6) +- FreeBSD-specific tests (Ticket 7) +- Performance optimization (Ticket 8) + +## Technical Details + +### Unit Tests (>80% Coverage Target) + +**WriteAheadLog Tests:** + +- Persistence: Events written to disk correctly +- Rotation: Files rotate at 80% capacity +- Replay: Events replayed on startup +- Corruption recovery: Corrupted entries skipped with CRC32 validation +- Deletion: WAL files deleted after successful publish + +**EventBusConnector Tests:** + +- Connection: Connects to broker via socket path +- WAL integration: Events written to WAL before buffering +- Buffering: Events buffered when connection lost (10MB limit) +- Replay: Buffered events replayed on reconnection +- Dynamic backpressure: Triggered at 70%, released at 50% + +**RpcServiceHandler Tests:** + +- Request parsing: RPC requests parsed correctly +- Actor message sending: Messages sent to actor via mpsc channel +- Response handling: Responses published with correct status codes +- Concurrent request serialization: Requests processed one at a time +- Channel full errors: Handled gracefully + +**RegistrationManager Tests:** + +- Registration: Registers with agent on startup +- "Ready" reporting: Reports ready status after registration +- Heartbeat: Publishes heartbeats every 30 seconds +- 
Deregistration: Deregisters on graceful shutdown + +**ConfigurationManager Tests:** + +- Loading: Configuration loaded from files and env vars +- Validation: Invalid configuration rejected +- Cycle-boundary hot-reload: Config applied at cycle boundary +- Restart detection: Configs requiring restart identified + +**Actor Pattern Tests:** + +- Message processing: Messages processed sequentially +- Oneshot responses: Responses sent via oneshot channels +- Channel handling: Bounded channel (capacity: 100) respected + +### Integration Tests + +**Event Bus Communication:** + +- Publish/subscribe: Events published and received correctly +- Reconnection: Connection restored after broker restart +- Buffering: Events buffered and replayed on reconnection +- Topic hierarchy: Events published to correct topics + +**RPC Communication:** + +- Lifecycle operations: Start, Stop, Restart, HealthCheck, UpdateConfig, GracefulShutdown +- Health checks: Accurate health data returned +- Config updates: Configuration applied at cycle boundary +- Graceful shutdown: Completes within timeout + +**Cross-Platform:** + +- Linux: Process enumeration works correctly +- macOS: Process enumeration works correctly +- Windows: Process enumeration works correctly +- Platform-specific metadata: Enhanced metadata collected on each platform + +**Lifecycle Tracking:** + +- Start detection: New processes detected +- Stop detection: Terminated processes detected +- Modification detection: Process changes detected + +### Chaos Tests + +**Connection Failures:** + +- Broker restart: procmond reconnects and replays events +- Network interruption: Events buffered and replayed +- Socket unavailable: procmond retries connection + +**Backpressure:** + +- Slow consumer: Collection interval increases +- High event volume: Backpressure prevents buffer overflow +- Buffer full: Events written to WAL, no data loss + +**Resource Limits:** + +- Memory constraints: procmond operates within 100MB limit +- CPU throttling: procmond maintains \<5% CPU usage +- Disk space: WAL rotation prevents disk exhaustion + +**Concurrent Operations:** + +- Multiple RPC requests: Serialized correctly +- Collection during shutdown: Completes gracefully +- Config update during collection: Applied at cycle boundary + +### Security Tests + +**Privilege Escalation:** + +- Attempt to gain unauthorized access: Fails with error +- Privilege dropping: Agent drops privileges after collectors ready + +**Injection Attacks:** + +- Malicious process names: Sanitized before logging/publishing +- Malicious command lines: Sanitized before logging/publishing +- SQL injection: Not applicable (no SQL in procmond) + +**DoS Attacks:** + +- Excessive RPC requests: Rate-limited or rejected +- Event flooding: Backpressure prevents resource exhaustion + +**Data Sanitization:** + +- Secrets not logged: Environment variables with secrets sanitized +- Secrets not published: Command-line args with secrets sanitized + +### Test Infrastructure + +**Tools:** + +- cargo-nextest: Parallel test execution +- insta: Snapshot testing for CLI output +- criterion: Performance baseline tests +- llvm-cov: Coverage reporting + +**CI Matrix:** + +- Platforms: Linux, macOS, Windows +- Rust: stable, beta, MSRV (1.91+) +- Architectures: x86_64, ARM64 + +```mermaid +graph TD + subgraph "Unit Tests >80%" + U1[WriteAheadLog] + U2[EventBusConnector] + U3[RpcServiceHandler] + U4[RegistrationManager] + U5[ConfigurationManager] + U6[Actor Pattern] + end + + subgraph "Integration Tests" + I1[Event Bus Communication] + 
I2[RPC Communication]
+    I3[Cross-Platform]
+    I4[Lifecycle Tracking]
+  end
+
+  subgraph "Chaos Tests"
+    C1[Connection Failures]
+    C2[Backpressure]
+    C3[Resource Limits]
+    C4[Concurrent Operations]
+  end
+
+  subgraph "Security Tests"
+    S1[Privilege Escalation]
+    S2[Injection Attacks]
+    S3[DoS Attacks]
+    S4[Data Sanitization]
+  end
+
+  U1 --> I1
+  U2 --> I1
+  U3 --> I2
+  U4 --> I2
+  U5 --> I2
+  U6 --> I2
+
+  I1 --> C1
+  I1 --> C2
+  I2 --> C4
+
+  I1 --> S2
+  I2 --> S3
+  I4 --> S4
+```
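+
+To make one of these concrete, the following is a minimal sketch of the CRC32 check at the heart of the WAL corruption-recovery tests; `crc32fast` is assumed as the checksum crate, and the record contents are illustrative only:
+
+```rust
+#[cfg(test)]
+mod wal_crc_tests {
+    // Helper mirroring the WAL's checksum step.
+    fn crc32(bytes: &[u8]) -> u32 {
+        let mut hasher = crc32fast::Hasher::new();
+        hasher.update(bytes);
+        hasher.finalize()
+    }
+
+    #[test]
+    fn flipped_byte_is_detected() {
+        let mut record = b"serialized process event".to_vec();
+        let stored_crc = crc32(&record);
+        record[0] ^= 0xFF; // simulate on-disk corruption
+        assert_ne!(crc32(&record), stored_crc, "CRC32 must catch the corruption");
+    }
+}
+```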
+
+## Dependencies
+
+**Requires:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 1] - WAL and EventBusConnector must exist
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 2] - Actor pattern must exist
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 3] - RPC service must exist
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 4] - Agent loading state must exist
+
+**Blocks:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 6] - Security hardening needs test baseline
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 7] - FreeBSD support needs test framework
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 8] - Performance validation needs baseline tests
+
+## Acceptance Criteria
+
+### Unit Test Coverage
+
+- [ ] WriteAheadLog: >80% coverage
+- [ ] EventBusConnector: >80% coverage
+- [ ] RpcServiceHandler: >80% coverage
+- [ ] RegistrationManager: >80% coverage
+- [ ] ConfigurationManager: >80% coverage
+- [ ] Actor Pattern: >80% coverage
+- [ ] Overall unit test coverage: >80%
+
+### Critical Path Coverage (>90%)
+
+- [ ] Process enumeration on all platforms: >90% coverage
+- [ ] Event bus communication (publish/subscribe/reconnection): >90% coverage
+- [ ] Core monitoring loop and lifecycle detection: >90% coverage
+- [ ] All error handling and recovery paths: >90% coverage
+- [ ] Security boundaries (privilege management, data sanitization): >90% coverage
+
+### Integration Tests
+
+- [ ] Event bus communication tests pass on Linux, macOS, Windows
+- [ ] RPC communication tests pass on Linux, macOS, Windows
+- [ ] Cross-platform tests pass on Linux, macOS, Windows
+- [ ] Lifecycle tracking tests pass on all platforms
+
+### Chaos Tests
+
+- [ ] Connection failure tests validate resilience
+- [ ] Backpressure tests validate adaptive behavior
+- [ ] Resource limit tests validate constraints
+- [ ] Concurrent operation tests validate correctness
+
+### Security Tests
+
+- [ ] Privilege escalation tests pass (no unauthorized access)
+- [ ] Injection attack tests pass (sanitization works)
+- [ ] DoS attack tests pass (rate limiting/backpressure works)
+- [ ] Data sanitization tests pass (secrets not leaked)
+
+### Test Infrastructure
+
+- [ ] cargo-nextest configured for parallel execution
+- [ ] insta configured for snapshot testing
+- [ ] criterion configured for performance baselines
+- [ ] llvm-cov configured for coverage reporting
+- [ ] CI matrix configured for Linux, macOS, Windows
+
+### Documentation
+
+- [ ] Test strategy documented
+- [ ] Coverage targets documented
+- [ ] Test execution instructions documented
+- [ ] CI/CD integration documented
+
+## References
+
+- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3
+- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 3, Testing Strategy)
+- **Testing Standards:** file:.cursor/rules/testing/testing-standards.mdc
+- **Existing Tests:** file:procmond/tests/
diff --git a/spec/procmond/tickets/Implement_RPC_Service_and_Registration_Manager_(procmond).md b/spec/procmond/tickets/Implement_RPC_Service_and_Registration_Manager_(procmond).md
new file mode 100644
index 0000000..5f08dd4
--- /dev/null
+++ b/spec/procmond/tickets/Implement_RPC_Service_and_Registration_Manager_(procmond).md
@@ -0,0 +1,216 @@
+# Implement RPC Service and Registration Manager (procmond)
+
+## Overview
+
+Implement RPC service handling and collector registration for procmond. This ticket enables lifecycle management via RPC (health checks, config updates, graceful shutdown) and establishes registration/heartbeat communication with daemoneye-agent.
+
+## Scope
+
+**In Scope:**
+
+- RpcServiceHandler component with actor coordination
+- RegistrationManager component for registration and heartbeat
+- RPC operation handling: HealthCheck, UpdateConfig, GracefulShutdown
+- Subscription to `control.collector.procmond` topic
+- Registration via RPC on startup
+- "Ready" status reporting after registration
+- Periodic heartbeat publishing (every 30 seconds)
+- Deregistration on graceful shutdown
+- Unit tests for RPC and registration
+
+**Out of Scope:**
+
+- Agent-side loading state management (Ticket 4)
+- Agent-side heartbeat detection (Ticket 4)
+- Comprehensive integration testing (Ticket 5)
+- Security hardening (Ticket 6)
+
+## Technical Details
+
+### RpcServiceHandler Component
+
+**Location:** `file:procmond/src/rpc_service.rs`
+
+**Key Responsibilities:**
+
+- Subscribe to `control.collector.procmond` topic for RPC requests
+- Parse incoming RPC requests
+- Send ActorMessage to ProcmondMonitorCollector via mpsc channel
+- Wait for responses via oneshot channels
+- Publish RPC responses with appropriate status codes
+- Handle channel full errors gracefully
+- Serialize concurrent RPC requests (process one at a time)
+
+**Supported Operations:**
+
+- **HealthCheck**: Query collector health and status
+- **UpdateConfig**: Apply configuration changes at cycle boundary
+- **GracefulShutdown**: Initiate clean shutdown with event flush
+
+### RegistrationManager Component
+
+**Location:** `file:procmond/src/registration.rs`
+
+**Key Responsibilities:**
+
+- Register with daemoneye-agent on startup via RPC
+- Report "ready" status after successful registration
+- Publish periodic heartbeats to `control.health.heartbeat.procmond` (every 30 seconds)
+- Include health status in heartbeat: Healthy/Degraded/Unhealthy
+- Track registration state and heartbeat sequence number
+- Deregister on graceful shutdown
+
+**Registration Message Schema:**
+
+```rust
+struct RegistrationRequest {
+    collector_id: String,   // "procmond"
+    collector_type: String, // "process-monitor"
+    version: String,
+    capabilities: Vec<String>,
+    pid: u32,
+}
+
+struct RegistrationResponse {
+    status: RegistrationStatus, // Accepted/Rejected
+    message: Option<String>,
+}
+```
+
+**Heartbeat Message Schema:**
+
+```rust
+struct HeartbeatMessage {
+    collector_id: String,
+    sequence: u64,
+    timestamp: DateTime<Utc>,
+    health_status: HealthStatus, // Healthy/Degraded/Unhealthy
+    metrics: HeartbeatMetrics,
+}
+
+struct HeartbeatMetrics {
+    processes_collected: u64,
+    events_published: u64,
+    buffer_level_percent: f64,
+    connection_status: ConnectionStatus,
+}
+```
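+
+As a sketch of the 30-second heartbeat task, where the `publish_heartbeat` callback stands in for the EventBusConnector publish call (an assumption, not the final API):
+
+```rust
+use std::time::Duration;
+use tokio::time::{interval, MissedTickBehavior};
+
+// Publish a heartbeat every 30 seconds with a monotonically increasing
+// sequence number; missed ticks are delayed rather than bursted.
+async fn heartbeat_loop(mut publish_heartbeat: impl FnMut(u64)) {
+    let mut ticker = interval(Duration::from_secs(30));
+    ticker.set_missed_tick_behavior(MissedTickBehavior::Delay);
+    let mut sequence: u64 = 0;
+    loop {
+        ticker.tick().await;
+        sequence += 1;
+        publish_heartbeat(sequence);
+    }
+}
+```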
+
+```mermaid
+sequenceDiagram
+    participant Main as main.rs
+    participant Reg as RegistrationManager
+    participant RPC as RpcServiceHandler
+    participant Actor as ProcmondMonitorCollector
+    participant EventBus as EventBusConnector
+    participant Agent as daemoneye-agent
+
+    Note over Main,Agent: Startup Registration
+    Main->>Reg: Initialize
+    Reg->>EventBus: Publish registration request (RPC)
+    EventBus->>Agent: Forward registration
+    Agent-->>EventBus: Registration accepted
+    EventBus->>Reg: Registration response
+    Reg->>EventBus: Publish "ready" status
+    Reg->>Reg: Start heartbeat task
+
+    Note over Main,Agent: Heartbeat Loop
+    loop Every 30 seconds
+        Reg->>Actor: Query health metrics
+        Actor-->>Reg: Health data
+        Reg->>EventBus: Publish heartbeat
+    end
+
+    Note over Main,Agent: RPC Request Handling
+    Agent->>EventBus: Send health check RPC
+    EventBus->>RPC: Receive request
+    RPC->>RPC: Parse request
+    RPC->>Actor: Send HealthCheck message (actor)
+    Actor-->>RPC: Health data via oneshot
+    RPC->>EventBus: Publish RPC response
+    EventBus->>Agent: Forward response
+
+    Note over Main,Agent: Configuration Update
+    Agent->>EventBus: Send config update RPC
+    EventBus->>RPC: Receive request
+    RPC->>RPC: Validate config
+    RPC->>Actor: Send UpdateConfig message (actor)
+    Note over Actor: Config applied at cycle boundary
+    Actor-->>RPC: Update result via oneshot
+    RPC->>EventBus: Publish RPC response
+
+    Note over Main,Agent: Graceful Shutdown
+    Agent->>EventBus: Send graceful shutdown RPC
+    EventBus->>RPC: Receive request
+    RPC->>Actor: Send GracefulShutdown message (actor)
+    Actor->>Actor: Complete current cycle
+    Actor->>EventBus: Flush buffered events + WAL
+    Actor-->>RPC: Shutdown ready via oneshot
+    RPC->>EventBus: Publish RPC response
+    RPC->>Reg: Deregister
+    Reg->>EventBus: Publish deregistration
+    RPC->>Main: Signal shutdown
+    Main->>Main: Exit
+```
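+
+A minimal sketch of the mpsc + oneshot coordination shown above; `ActorMessage` and its payload types are placeholders for the real definitions from Ticket 2:
+
+```rust
+use tokio::sync::{mpsc, oneshot};
+
+// Placeholder message type for RPC -> actor coordination.
+enum ActorMessage {
+    HealthCheck { respond_to: oneshot::Sender<String> },
+}
+
+// RPC handler side: send a message, then await the oneshot reply.
+// try_send surfaces a full channel instead of blocking the RPC task.
+async fn handle_health_check(actor_tx: &mpsc::Sender<ActorMessage>) -> Result<String, String> {
+    let (tx, rx) = oneshot::channel();
+    actor_tx
+        .try_send(ActorMessage::HealthCheck { respond_to: tx })
+        .map_err(|e| format!("actor channel unavailable: {e}"))?;
+    rx.await.map_err(|e| format!("actor dropped the response: {e}"))
+}
+```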
+
+## Dependencies
+
+**Requires:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 2] - Actor pattern must exist for message coordination
+
+**Blocks:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 4] - Agent needs registration/heartbeat handling
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 5] - Integration tests need RPC functionality
+
+## Acceptance Criteria
+
+### RpcServiceHandler
+
+- [ ] Subscribes to `control.collector.procmond` topic on startup
+- [ ] Parses incoming RPC requests correctly
+- [ ] Sends ActorMessage to ProcmondMonitorCollector via mpsc channel
+- [ ] Waits for responses via oneshot channels
+- [ ] Publishes RPC responses with correct status codes
+- [ ] Handles channel full errors gracefully (logs warning, returns error)
+- [ ] Serializes concurrent RPC requests (processes one at a time)
+- [ ] Unit tests cover: request parsing, actor coordination, response handling, error cases
+
+### RegistrationManager
+
+- [ ] Registers with daemoneye-agent on startup via RPC
+- [ ] Reports "ready" status after successful registration
+- [ ] Publishes heartbeats every 30 seconds to `control.health.heartbeat.procmond`
+- [ ] Includes health status in heartbeat: Healthy/Degraded/Unhealthy
+- [ ] Includes metrics in heartbeat: processes collected, events published, buffer level, connection status
+- [ ] Tracks registration state and heartbeat sequence number
+- [ ] Deregisters on graceful shutdown
+- [ ] Unit tests cover: registration, heartbeat publishing, deregistration, state tracking
+
+### RPC Operations
+
+- [ ] **HealthCheck**: Returns accurate health data including event bus connectivity
+- [ ] **UpdateConfig**: Validates config, sends to actor, returns success/failure
+- [ ] **GracefulShutdown**: Coordinates with actor, waits for completion, signals main
+
+### Integration with Actor
+
+- [ ] RPC operations correctly coordinate with actor via messages
+- [ ] Oneshot channels used for request/response patterns
+- [ ] No race conditions or deadlocks
+- [ ] Graceful handling of actor channel full errors
+
+### main.rs Updates
+
+- [ ] RpcServiceHandler initialized and started
+- [ ] RegistrationManager initialized and started
+- [ ] Graceful shutdown coordination includes RPC and registration cleanup
+
+## References
+
+- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3
+- **Core Flows:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f086f464-1e81-42e8-89f5-74a8638360d1 (Flow 5: Configuration Update, Flow 7: Graceful Shutdown)
+- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 2, RPC Service)
+- **RPC Patterns:** file:daemoneye-eventbus/docs/rpc-patterns.md
+- **Topic Hierarchy:** file:daemoneye-eventbus/docs/topic-hierarchy.md
diff --git a/spec/procmond/tickets/Implement_Security_Hardening_and_Data_Sanitization.md b/spec/procmond/tickets/Implement_Security_Hardening_and_Data_Sanitization.md
new file mode 100644
index 0000000..c5874df
--- /dev/null
+++ b/spec/procmond/tickets/Implement_Security_Hardening_and_Data_Sanitization.md
@@ -0,0 +1,251 @@
+# Implement Security Hardening and Data Sanitization
+
+## Overview
+
+Implement privilege management and data sanitization for procmond. This ticket ensures procmond operates with appropriate privileges, detects privilege requirements at startup, and sanitizes sensitive data before logging or publishing.
+
+## Scope
+
+**In Scope:**
+
+- Privilege detection at startup (capabilities on Linux, tokens on Windows, entitlements on macOS)
+- Data sanitization for command-line arguments and environment variables
+- Security boundary validation between procmond and agent
+- Security test suite (privilege escalation, injection, DoS)
+- Security documentation and threat analysis
+
+**Out of Scope:**
+
+- FreeBSD privilege management (Ticket 7)
+- Performance optimization (Ticket 8)
+- Advanced security features (kernel monitoring, sandboxing)
+
+## Technical Details
+
+### Privilege Detection
+
+**Linux:**
+
+- Detect CAP_SYS_PTRACE capability for full process access
+- Detect CAP_DAC_READ_SEARCH for reading /proc
+- Log detected capabilities at startup
+- Gracefully degrade if capabilities insufficient (basic enumeration only)
+
+**macOS:**
+
+- Detect task_for_pid() entitlements
+- Check for root privileges
+- Log detected privileges at startup
+- Gracefully degrade if privileges insufficient
+
+**Windows:**
+
+- Detect SeDebugPrivilege token
+- Check for Administrator privileges
+- Log detected privileges at startup
+- Gracefully degrade if privileges insufficient
+
+**Implementation:**
+
+```rust
+struct PrivilegeStatus {
+    platform: Platform,
+    has_full_access: bool,
+    capabilities: Vec<String>,
+    degraded_mode: bool,
+}
+
+// Platform-specific helpers are implemented alongside this function.
+fn detect_privileges() -> Result<PrivilegeStatus> {
+    #[cfg(target_os = "linux")]
+    return detect_linux_capabilities();
+
+    #[cfg(target_os = "macos")]
+    return detect_macos_privileges();
+
+    #[cfg(target_os = "windows")]
+    return detect_windows_privileges();
+}
+```
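+
+On Linux, the effective capability set can be read from `/proc/self/status` with no extra dependencies; a minimal sketch (bit positions taken from linux/capability.h):
+
+```rust
+use std::fs;
+
+const CAP_DAC_READ_SEARCH: u32 = 2;
+const CAP_SYS_PTRACE: u32 = 19;
+
+// Returns true if the effective capability set contains the given capability.
+fn has_effective_cap(cap: u32) -> bool {
+    let status = match fs::read_to_string("/proc/self/status") {
+        Ok(s) => s,
+        Err(_) => return false,
+    };
+    status
+        .lines()
+        .find_map(|line| line.strip_prefix("CapEff:"))
+        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
+        .map(|mask| mask & (1u64 << cap) != 0)
+        .unwrap_or(false)
+}
+
+fn main() {
+    println!("CAP_SYS_PTRACE: {}", has_effective_cap(CAP_SYS_PTRACE));
+    println!("CAP_DAC_READ_SEARCH: {}", has_effective_cap(CAP_DAC_READ_SEARCH));
+}
+```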
+
+### Data Sanitization
+
+**Sensitive Data Patterns:**
+
+- Environment variables: `PASSWORD`, `SECRET`, `TOKEN`, `KEY`, `API_KEY`, `AUTH`
+- Command-line arguments: `--password`, `--secret`, `--token`, `--api-key`
+- File paths: `/home/*/.ssh/`, `/home/*/.aws/`, `C:\Users\*\.ssh\`
+
+**Sanitization Strategy:**
+
+- Replace sensitive values with `[REDACTED]`
+- Log sanitization events at DEBUG level
+- Apply sanitization before logging and before publishing to event bus
+
+**Implementation:**
+
+```rust
+use std::collections::HashMap;
+
+/// Replace the value following a sensitive flag with [REDACTED].
+/// Example: "--password secret123" -> "--password [REDACTED]"
+fn sanitize_command_line(cmd: &str) -> String {
+    let sensitive_patterns = [
+        "--password", "--secret", "--token", "--api-key", "-p", "-s", "-t", "-k",
+    ];
+    let mut out: Vec<String> = Vec::new();
+    let mut redact_next = false;
+    for token in cmd.split_whitespace() {
+        if redact_next {
+            out.push("[REDACTED]".to_string());
+            redact_next = false;
+        } else if let Some((flag, _value)) = token.split_once('=') {
+            // Handle "--password=secret123" style arguments.
+            if sensitive_patterns.contains(&flag) {
+                out.push(format!("{flag}=[REDACTED]"));
+            } else {
+                out.push(token.to_string());
+            }
+        } else {
+            redact_next = sensitive_patterns.contains(&token);
+            out.push(token.to_string());
+        }
+    }
+    out.join(" ")
+}
+
+/// Replace values of sensitive environment variables with [REDACTED].
+/// Example: {"API_KEY": "abc123"} -> {"API_KEY": "[REDACTED]"}
+fn sanitize_env_vars(env: &HashMap<String, String>) -> HashMap<String, String> {
+    let sensitive_keys = ["PASSWORD", "SECRET", "TOKEN", "KEY", "API_KEY", "AUTH"];
+    env.iter()
+        .map(|(key, value)| {
+            let upper = key.to_uppercase();
+            if sensitive_keys.iter().any(|k| upper.contains(*k)) {
+                (key.clone(), "[REDACTED]".to_string())
+            } else {
+                (key.clone(), value.clone())
+            }
+        })
+        .collect()
+}
+```
+
+### Security Boundaries
+
+**Validation:**
+
+- procmond runs with elevated privileges (full process access)
+- daemoneye-agent runs with minimal privileges (dropped after spawning collectors)
+- Event bus communication uses Unix domain sockets (Linux/macOS) or named pipes (Windows)
+- No network communication from procmond (only local IPC)
+- WAL files protected with appropriate permissions (0600)
+
+**Threat Model:**
+
+- **Threat 1**: Attacker gains access to procmond process → Limited impact (no network, read-only process data)
+- **Threat 2**: Attacker gains access to agent process → Cannot access privileged process data (privilege separation)
+- **Threat 3**: Attacker intercepts event bus communication → Mitigated by local IPC (no network exposure)
+- **Threat 4**: Attacker reads WAL files → Mitigated by file permissions (0600)
+
+### Security Test Suite
+
+**Privilege Escalation Tests:**
+
+- Attempt to gain unauthorized access to processes
+- Verify privilege detection works correctly
+- Verify graceful degradation when privileges insufficient
+
+**Injection Attack Tests:**
+
+- Malicious process names with special characters
+- Malicious command lines with SQL injection attempts
+- Malicious environment variables with code injection attempts
+
+**DoS Attack Tests:**
+
+- Excessive RPC requests (rate limiting)
+- Event flooding (backpressure)
+- Resource exhaustion (memory/CPU limits)
+
+**Data Sanitization Tests:**
+
+- Verify sensitive data sanitized in logs
+- Verify sensitive data sanitized in published events
+- Verify sanitization patterns cover common secrets
+
+```mermaid
+graph TD
+  subgraph "Privilege Detection"
+    P1[Linux: CAP_SYS_PTRACE]
+    P2[macOS: task_for_pid]
+    P3[Windows: SeDebugPrivilege]
+    P1 --> P4[Log Capabilities]
+    P2 --> P4
+    P3 --> P4
+    P4 --> P5{Full Access?}
+    P5 -->|Yes| P6[Full Enumeration]
+    P5 -->|No| P7[Degraded Mode]
+  end
+
+  subgraph "Data Sanitization"
+    S1[Command-Line Args]
+    S2[Environment Variables]
+    S3[File Paths]
+    S1 --> S4[Detect Sensitive Patterns]
+    S2 --> S4
+    S3 --> S4
+    S4 --> S5[Replace with REDACTED]
+    S5 --> S6[Log Sanitization]
+    S5 --> S7[Publish Sanitized Data]
+  end
+
+  subgraph "Security Boundaries"
+    B1[procmond: Elevated]
+    B2[agent: Minimal]
+    B3[Event Bus: Local IPC]
+    B4[WAL: 0600 Permissions]
+    B1 --> B3
+    B2 --> B3
+    B3 --> B5[No Network Exposure]
+    B4 --> B5
+  end
+```
+
+## Dependencies
+
+**Requires:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 5] - Test framework must exist
+
+**Blocks:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 8] - Performance validation needs security baseline
+
+## Acceptance Criteria
+
+### Privilege Detection
+
+- [ ] Linux: CAP_SYS_PTRACE and CAP_DAC_READ_SEARCH detected correctly
+- [ ] macOS: task_for_pid() entitlements and root privileges detected correctly
+- [ ] Windows: SeDebugPrivilege and Administrator privileges detected correctly
+- [ ] Detected privileges logged
at startup (INFO level) +- [ ] Graceful degradation when privileges insufficient (basic enumeration only) +- [ ] Degraded mode logged at WARN level + +### Data Sanitization + +- [ ] Command-line arguments sanitized before logging +- [ ] Command-line arguments sanitized before publishing to event bus +- [ ] Environment variables sanitized before logging +- [ ] Environment variables sanitized before publishing to event bus +- [ ] Sensitive patterns detected: PASSWORD, SECRET, TOKEN, KEY, API_KEY, AUTH +- [ ] Sanitization events logged at DEBUG level +- [ ] Sanitized values replaced with `[REDACTED]` + +### Security Boundaries + +- [ ] procmond runs with elevated privileges (full process access) +- [ ] daemoneye-agent runs with minimal privileges (dropped after spawning) +- [ ] Event bus communication uses local IPC (no network) +- [ ] WAL files protected with 0600 permissions +- [ ] No network communication from procmond + +### Security Test Suite + +- [ ] Privilege escalation tests pass (no unauthorized access) +- [ ] Injection attack tests pass (malicious data sanitized) +- [ ] DoS attack tests pass (rate limiting/backpressure works) +- [ ] Data sanitization tests pass (secrets not leaked in logs or events) + +### Documentation + +- [ ] Privilege detection documented for all platforms +- [ ] Data sanitization patterns documented +- [ ] Security boundaries documented with threat model +- [ ] Security test suite documented +- [ ] Threat analysis documented + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 4, Security Hardening) +- **Security Standards:** file:.cursor/rules/security/security-standards.mdc +- **Security Design:** file:docs/src/technical/security_design_overview.md diff --git a/spec/procmond/tickets/Implement_Write-Ahead_Log_and_Event_Bus_Connector.md b/spec/procmond/tickets/Implement_Write-Ahead_Log_and_Event_Bus_Connector.md new file mode 100644 index 0000000..16c942c --- /dev/null +++ b/spec/procmond/tickets/Implement_Write-Ahead_Log_and_Event_Bus_Connector.md @@ -0,0 +1,154 @@ +# Implement Write-Ahead Log and Event Bus Connector + +## Overview + +Implement durable event persistence and broker connectivity for procmond. This ticket establishes the foundation for reliable event delivery with crash recovery by creating the Write-Ahead Log (WAL) component and EventBusConnector that integrates with daemoneye-eventbus. 
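+
+As context for the Technical Details below, a minimal sketch of the append path this ticket calls for; the record layout is illustrative, `crc32fast` is assumed for checksums, and the real implementation would serialize the full `WalEntry` shown under Technical Details with bincode:
+
+```rust
+use std::{fs::OpenOptions, io::Write};
+
+// Append one length-prefixed, checksummed record to a WAL file.
+fn append_record(path: &str, sequence: u64, event_bytes: &[u8]) -> std::io::Result<()> {
+    let mut hasher = crc32fast::Hasher::new();
+    hasher.update(event_bytes);
+    let crc32 = hasher.finalize();
+
+    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
+    file.write_all(&sequence.to_le_bytes())?;
+    file.write_all(&(event_bytes.len() as u32).to_le_bytes())?;
+    file.write_all(event_bytes)?;
+    file.write_all(&crc32.to_le_bytes())?;
+    file.sync_data()?; // durability before the event is buffered in memory
+    Ok(())
+}
+```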
+
+## Scope
+
+**In Scope:**
+
+- WriteAheadLog component with bincode serialization
+- Sequence-numbered WAL files with rotation at 80% capacity
+- CRC32 corruption detection and recovery
+- EventBusConnector with WAL integration
+- Event buffering (10MB limit) with replay capability
+- Connection to broker via `DAEMONEYE_BROKER_SOCKET` environment variable
+- Dynamic backpressure monitoring (70% threshold)
+- Unit tests for WAL and EventBusConnector
+
+**Out of Scope:**
+
+- Actor pattern implementation (Ticket 2)
+- RPC service handling (Ticket 3)
+- Agent-side changes (Ticket 4)
+- Integration testing (Ticket 5)
+
+## Technical Details
+
+### WriteAheadLog Component
+
+**Location:** `file:procmond/src/wal.rs`
+
+**Key Responsibilities:**
+
+- Persist events to disk before buffering (durability guarantee)
+- Use sequence-numbered files: `procmond-{sequence:05}.wal`
+- Rotate when file reaches 80MB (80% of 100MB max)
+- Replay events on startup for crash recovery
+- Delete WAL files after successful publish
+- Handle corruption with CRC32 validation
+
+**File Format:**
+
+```rust
+// Bincode-serialized records with CRC32 checksums
+struct WalEntry {
+    sequence: u64,
+    timestamp: DateTime<Utc>,
+    event: ProcessEvent,
+    crc32: u32,
+}
+```
+
+### EventBusConnector Component
+
+**Location:** `file:procmond/src/event_bus_connector.rs`
+
+**Key Responsibilities:**
+
+- Connect to daemoneye-agent's embedded broker
+- Integrate with WriteAheadLog for event persistence
+- Buffer events (10MB limit) when connection lost
+- Replay buffered events from WAL on reconnection
+- Publish to topic hierarchy: `events.process.*`
+- Monitor buffer level for backpressure (70% threshold)
+- Provide shared channel reference for backpressure signaling
+
+**Backpressure Strategy:**
+
+- Trigger at 70% buffer capacity
+- Release at 50% buffer capacity
+- Signal via shared mpsc channel (to be used by Ticket 2)
+
+```mermaid
+sequenceDiagram
+    participant WAL as WriteAheadLog
+    participant Connector as EventBusConnector
+    participant Broker as DaemoneyeBroker
+
+    Note over WAL,Broker: Normal Operation
+    Connector->>WAL: Write event to WAL
+    WAL-->>Connector: Persisted (sequence number)
+    Connector->>Broker: Publish event
+    Broker-->>Connector: Acknowledged
+    Connector->>WAL: Mark for deletion
+
+    Note over WAL,Broker: Connection Lost
+    Connector->>WAL: Write event to WAL
+    WAL-->>Connector: Persisted
+    Connector->>Connector: Buffer event (in-memory)
+    Note over Connector: Buffer reaches 70%
+    Connector->>Connector: Trigger backpressure
+
+    Note over WAL,Broker: Reconnection
+    Connector->>Broker: Reconnect
+    Connector->>WAL: Read unpublished events
+    WAL-->>Connector: Event list
+    Connector->>Broker: Replay events
+    Broker-->>Connector: Acknowledged
+    Connector->>WAL: Delete WAL files
+    Note over Connector: Buffer drops below 50%
+    Connector->>Connector: Release backpressure
+```
+
+## Dependencies
+
+**Requires:**
+
+- daemoneye-eventbus client library
+- Existing ProcessEvent data model from `file:daemoneye-lib/src/models/process.rs`
+
+**Blocks:**
+
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 2] - Actor pattern needs EventBusConnector
+- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 3] - RPC service needs event bus connectivity
+
+## Acceptance Criteria
+
+### WriteAheadLog
+
+- [ ] Events persisted to disk using bincode serialization
+- [ ] Sequence-numbered files created: `procmond-00001.wal`, `procmond-00002.wal`, etc.
+- [ ] File rotation occurs at 80MB (80% of 100MB max) +- [ ] WAL replay works correctly on startup (all unpublished events recovered) +- [ ] Corrupted entries detected via CRC32 and skipped with warning log +- [ ] WAL files deleted after all events successfully published +- [ ] Unit tests cover: persistence, rotation, replay, corruption recovery + +### EventBusConnector + +- [ ] Connects to broker using `DAEMONEYE_BROKER_SOCKET` environment variable +- [ ] Events written to WAL before buffering +- [ ] Events buffered (10MB limit) when connection lost +- [ ] Buffered events replayed on reconnection +- [ ] Events published to correct topics: `events.process.start`, `events.process.stop`, `events.process.modify` +- [ ] Backpressure triggered at 70% buffer capacity +- [ ] Backpressure released at 50% buffer capacity +- [ ] Shared channel reference provided for backpressure signaling +- [ ] Unit tests cover: connection, WAL integration, buffering, replay, backpressure + +### Integration + +- [ ] EventBusConnector successfully integrates with WriteAheadLog +- [ ] Events survive procmond crash and are replayed on restart +- [ ] No data loss during connection failures +- [ ] Backpressure mechanism ready for actor integration (Ticket 2) + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 1, Component Architecture) +- **Event Bus Architecture:** file:docs/embedded-broker-architecture.md +- **Topic Hierarchy:** file:daemoneye-eventbus/docs/topic-hierarchy.md +- **Process Models:** file:daemoneye-lib/src/models/process.rs diff --git a/spec/procmond/tickets/Validate_FreeBSD_Platform_Support.md b/spec/procmond/tickets/Validate_FreeBSD_Platform_Support.md new file mode 100644 index 0000000..465fc11 --- /dev/null +++ b/spec/procmond/tickets/Validate_FreeBSD_Platform_Support.md @@ -0,0 +1,187 @@ +# Validate FreeBSD Platform Support + +## Overview + +Validate basic process enumeration on FreeBSD and document platform limitations. This ticket ensures procmond works on FreeBSD 13+ with basic metadata collection using the FallbackProcessCollector. 
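+
+For orientation, the FallbackProcessCollector path described below builds on the sysinfo crate; a minimal enumeration sketch (sysinfo 0.30-era API assumed; method signatures vary across sysinfo versions):
+
+```rust
+use sysinfo::System;
+
+fn main() {
+    // Basic, portable enumeration: this is the level of metadata
+    // FreeBSD gets today (no enhanced platform-specific collection).
+    let mut sys = System::new_all();
+    sys.refresh_all();
+    for (pid, process) in sys.processes() {
+        println!("{pid}: {:?}", process.name());
+    }
+}
+```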
+ +## Scope + +**In Scope:** + +- Test FallbackProcessCollector on FreeBSD 13+ +- Document FreeBSD limitations (basic metadata only, no enhanced features) +- Add platform detection and capability reporting +- Create FreeBSD-specific tests +- Update documentation with FreeBSD support status + +**Out of Scope:** + +- Enhanced metadata collection for FreeBSD (deferred to future work) +- FreeBSD-specific privilege management (basic only) +- Performance optimization for FreeBSD + +## Technical Details + +### FreeBSD Support Status + +**Current State:** + +- FallbackProcessCollector uses sysinfo crate for basic enumeration +- Basic metadata: PID, PPID, name, executable path, CPU usage, memory usage +- No enhanced metadata: network connections, file descriptors, security contexts + +**Limitations:** + +- No platform-specific collector (unlike Linux, macOS, Windows) +- Limited metadata compared to primary platforms +- Performance may be lower than platform-specific collectors + +**Acceptance:** + +- FreeBSD support is "best-effort" with documented limitations +- Basic enumeration is sufficient for FreeBSD use cases +- Enhanced features deferred to future work + +### Platform Detection + +**Implementation:** + +```rust +#[cfg(target_os = "freebsd")] +fn detect_platform_capabilities() -> PlatformCapabilities { + PlatformCapabilities { + platform: Platform::FreeBSD, + collector_type: CollectorType::Fallback, + enhanced_metadata: false, + network_connections: false, + file_descriptors: false, + security_contexts: false, + } +} +``` + +**Capability Reporting:** + +- Report platform capabilities at startup +- Log degraded status for FreeBSD (INFO level) +- Include capabilities in registration message to agent + +### FreeBSD-Specific Tests + +**Test Coverage:** + +- Basic process enumeration works +- PID, PPID, name, executable path collected correctly +- CPU usage and memory usage collected correctly +- Process lifecycle detection works (start/stop/modify) +- Event publishing works correctly +- No crashes or panics on FreeBSD + +**Test Environment:** + +- FreeBSD 13.0+ (latest stable) +- x86_64 and ARM64 architectures +- CI/CD integration (if FreeBSD runner available) + +```mermaid +graph TD + subgraph "Platform Detection" + D1[Detect OS: FreeBSD] + D1 --> D2[Use FallbackProcessCollector] + D2 --> D3[Report Capabilities] + D3 --> D4{Enhanced Metadata?} + D4 -->|No| D5[Log Degraded Status] + D4 -->|Yes| D6[Full Features] + end + + subgraph "FreeBSD Testing" + T1[Basic Enumeration] + T2[Lifecycle Detection] + T3[Event Publishing] + T1 --> T4[Validate Metadata] + T2 --> T4 + T3 --> T4 + T4 --> T5{Tests Pass?} + T5 -->|Yes| T6[FreeBSD Supported] + T5 -->|No| T7[Document Issues] + end + + subgraph "Documentation" + DOC1[Support Status] + DOC2[Limitations] + DOC3[Capabilities] + DOC1 --> DOC4[Update Docs] + DOC2 --> DOC4 + DOC3 --> DOC4 + end +``` + +## Dependencies + +**Requires:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 5] - Test framework must exist + +**Blocks:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 8] - Performance validation includes FreeBSD + +## Acceptance Criteria + +### Platform Detection + +- [ ] FreeBSD detected correctly at runtime +- [ ] FallbackProcessCollector used on FreeBSD +- [ ] Platform capabilities reported at startup +- [ ] Degraded status logged for FreeBSD (INFO level) +- [ ] Capabilities included in registration message + +### Basic Enumeration + +- [ ] Process enumeration works on FreeBSD 13+ +- [ ] PID collected correctly +- [ ] PPID collected 
correctly +- [ ] Process name collected correctly +- [ ] Executable path collected correctly +- [ ] CPU usage collected correctly +- [ ] Memory usage collected correctly + +### Lifecycle Detection + +- [ ] Process start events detected +- [ ] Process stop events detected +- [ ] Process modification events detected +- [ ] Events published to event bus correctly + +### FreeBSD-Specific Tests + +- [ ] Basic enumeration tests pass on FreeBSD +- [ ] Lifecycle detection tests pass on FreeBSD +- [ ] Event publishing tests pass on FreeBSD +- [ ] No crashes or panics on FreeBSD +- [ ] Tests run on x86_64 and ARM64 (if available) + +### Documentation + +- [ ] FreeBSD support status documented (best-effort, basic metadata only) +- [ ] Limitations documented clearly: + - No enhanced metadata + - No network connections + - No file descriptors + - No security contexts +- [ ] Platform capabilities documented +- [ ] Future work documented (enhanced FreeBSD support) + +### CI/CD Integration + +- [ ] FreeBSD tests added to CI/CD pipeline (if runner available) +- [ ] FreeBSD tests run on pull requests +- [ ] FreeBSD test failures reported clearly + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Core Flows:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f086f464-1e81-42e8-89f5-74a8638360d1 (Flow 10: Cross-Platform Behavior) +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 5, FreeBSD Support) +- **Process Collector:** file:procmond/src/process_collector.rs +- **Existing Tests:** file:procmond/tests/os_compatibility_tests.rs diff --git a/spec/procmond/tickets/Validate_Performance_and_Optimize.md b/spec/procmond/tickets/Validate_Performance_and_Optimize.md new file mode 100644 index 0000000..47e53f7 --- /dev/null +++ b/spec/procmond/tickets/Validate_Performance_and_Optimize.md @@ -0,0 +1,226 @@ +# Validate Performance and Optimize + +## Overview + +Validate procmond performance against targets and optimize if needed. This ticket ensures procmond meets performance requirements: enumerate 1,000 processes in \<100ms, support 10,000+ processes, use \<100MB memory, and maintain \<5% CPU usage. + +## Scope + +**In Scope:** + +- Benchmark process enumeration (1,000 processes target: \<100ms) +- Load testing with 10,000+ processes +- Memory profiling (target: \<100MB sustained) +- CPU monitoring (target: \<5% sustained) +- Regression testing to prevent degradation +- Performance optimization if targets not met +- Performance documentation + +**Out of Scope:** + +- Advanced performance features (kernel monitoring, eBPF) +- Performance tuning for specific workloads +- Distributed performance testing + +## Technical Details + +### Performance Targets + +| Metric | Target | Measurement Method | +| ----------------------- | ------------------------------- | --------------------------------- | +| Process Enumeration | \<100ms for 1,000 processes | Criterion benchmark | +| Large-Scale Support | 10,000+ processes without issue | Load testing with synthetic procs | +| Memory Usage | \<100MB sustained | Memory profiler (heaptrack, etc.) 
| +| CPU Usage | \<5% sustained | System monitoring (top, htop) | +| Event Publishing | >1,000 events/sec | Throughput benchmark | +| WAL Write Performance | >500 writes/sec | WAL-specific benchmark | +| Backpressure Activation | \<1s to adjust interval | Chaos test measurement | + +### Benchmarking Strategy + +**Criterion Benchmarks:** + +- Process enumeration on all platforms (Linux, macOS, Windows, FreeBSD) +- Event publishing throughput +- WAL write performance +- Configuration hot-reload latency +- RPC request/response latency + +**Load Testing:** + +- Spawn 10,000+ synthetic processes +- Monitor procmond behavior under load +- Validate no degradation or crashes +- Measure memory and CPU usage + +**Memory Profiling:** + +- Use heaptrack (Linux), Instruments (macOS), or similar tools +- Identify memory leaks or excessive allocations +- Validate \<100MB sustained usage +- Profile WAL and event buffer memory usage + +**CPU Monitoring:** + +- Monitor CPU usage during continuous operation +- Validate \<5% sustained usage +- Identify CPU hotspots with profiler +- Optimize hot paths if needed + +### Optimization Strategies + +**If Targets Not Met:** + +1. **Process Enumeration Optimization:** + + - Reduce syscall overhead + - Batch process queries + - Cache frequently accessed data + - Use platform-specific optimizations + +2. **Memory Optimization:** + + - Reduce event buffer size if excessive + - Optimize WAL file rotation + - Use more efficient data structures + - Profile and eliminate memory leaks + +3. **CPU Optimization:** + + - Reduce collection frequency if needed + - Optimize hot paths (profiler-guided) + - Use more efficient algorithms + - Reduce logging overhead + +4. **Event Publishing Optimization:** + + - Batch event publishing + - Optimize serialization (bincode) + - Reduce event size if possible + - Optimize topic matching + +```mermaid +graph TD + subgraph "Benchmarking" + B1[Process Enumeration] + B2[Event Publishing] + B3[WAL Performance] + B4[RPC Latency] + B1 --> B5[Criterion Results] + B2 --> B5 + B3 --> B5 + B4 --> B5 + end + + subgraph "Load Testing" + L1[Spawn 10k Processes] + L2[Monitor Memory] + L3[Monitor CPU] + L4[Monitor Throughput] + L1 --> L5[Load Test Results] + L2 --> L5 + L3 --> L5 + L4 --> L5 + end + + subgraph "Profiling" + P1[Memory Profiler] + P2[CPU Profiler] + P3[Identify Hotspots] + P1 --> P3 + P2 --> P3 + P3 --> P4[Optimization Targets] + end + + B5 --> O1{Targets Met?} + L5 --> O1 + O1 -->|Yes| O2[Document Performance] + O1 -->|No| P4 + P4 --> O3[Optimize] + O3 --> B1 +``` + +### Regression Testing + +**Strategy:** + +- Establish performance baselines with criterion +- Run benchmarks on every pull request +- Fail CI if performance regresses >10% +- Document acceptable performance ranges + +**Baseline Storage:** + +- Store criterion baselines in repository +- Update baselines when intentional changes made +- Track performance trends over time + +## Dependencies + +**Requires:** + +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 6] - Security hardening must be complete +- ticket:54226c8a-719a-479a-863b-9c91f43717a9/[Ticket 7] - FreeBSD support must be validated + +**Blocks:** + +- None (final ticket in Epic) + +## Acceptance Criteria + +### Process Enumeration Performance + +- [ ] Enumerate 1,000 processes in \<100ms (average) on Linux +- [ ] Enumerate 1,000 processes in \<100ms (average) on macOS +- [ ] Enumerate 1,000 processes in \<100ms (average) on Windows +- [ ] Enumerate 1,000 processes in \<200ms (average) on FreeBSD (degraded 
acceptable) + +### Large-Scale Support + +- [ ] Support 10,000+ processes without crashes +- [ ] Support 10,000+ processes without memory leaks +- [ ] Support 10,000+ processes without performance degradation +- [ ] Load testing validates stability under high process count + +### Memory Usage + +- [ ] Memory usage \<100MB during normal operation (1,000 processes) +- [ ] Memory usage \<200MB during high load (10,000 processes) +- [ ] No memory leaks detected by profiler +- [ ] WAL and event buffer memory usage within limits + +### CPU Usage + +- [ ] CPU usage \<5% during continuous monitoring (1,000 processes) +- [ ] CPU usage \<10% during high load (10,000 processes) +- [ ] No CPU hotspots identified by profiler +- [ ] Collection interval adjustment reduces CPU usage under backpressure + +### Event Publishing Performance + +- [ ] Publish >1,000 events/sec to event bus +- [ ] WAL write performance >500 writes/sec +- [ ] Backpressure activation \<1s to adjust interval +- [ ] Event publishing throughput validated by benchmark + +### Regression Testing + +- [ ] Criterion baselines established for all benchmarks +- [ ] Benchmarks run on every pull request +- [ ] CI fails if performance regresses >10% +- [ ] Performance trends tracked over time + +### Documentation + +- [ ] Performance targets documented +- [ ] Benchmarking methodology documented +- [ ] Optimization strategies documented +- [ ] Performance baselines documented +- [ ] Regression testing documented + +## References + +- **Epic Brief:** spec:54226c8a-719a-479a-863b-9c91f43717a9/0fc3298b-37df-4722-a761-66a5a0da16b3 +- **Tech Plan:** spec:54226c8a-719a-479a-863b-9c91f43717a9/f70103e2-e7ef-494f-8638-5a7324565f28 (Phase 6, Performance Validation) +- **Performance Standards:** file:.cursor/rules/rust/performance-optimization.mdc +- **Existing Benchmarks:** file:procmond/benches/process_collector_benchmarks.rs
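+
+For illustration, the enumeration benchmark might take the following shape; the procmond collector entry point is not final, so sysinfo stands in for it here as an assumption:
+
+```rust
+use criterion::{criterion_group, criterion_main, Criterion};
+use sysinfo::System;
+
+// Benchmark a full enumeration pass; compare results against the
+// <100ms target for 1,000 processes.
+fn enumeration_benchmark(c: &mut Criterion) {
+    c.bench_function("enumerate_processes", |b| {
+        b.iter(|| {
+            let mut sys = System::new_all();
+            sys.refresh_all();
+            sys.processes().len()
+        });
+    });
+}
+
+criterion_group!(benches, enumeration_benchmark);
+criterion_main!(benches);
+```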