
Conversation

@cadenkoj (Owner) commented May 22, 2025

TL;DR

Added PostgreSQL database integration with Diesel ORM for persistent storage of threat indicators.

What changed?

  • Integrated Diesel ORM with PostgreSQL for persistent storage
  • Added database migrations for encrypted indicators table
  • Modified the Node struct to store indicators in the database instead of in memory
  • Updated the symmetric key manager to store keys on disk
  • Added UUID v5 support for deterministic indicator IDs based on content
  • Refactored the EncryptedThreatIndicator to use raw bytes instead of base64 strings
  • Updated environment variables to support Diesel CLI

How to test?

  1. Copy the new environment variables from .env.example to your .env file
  2. Ensure PostgreSQL is running and accessible with the configured credentials
  3. Run diesel setup to create the database
  4. Run diesel migration run to apply the migrations
  5. Start the application and verify indicators are persisted between restarts

Why make this change?

This change moves the application from an in-memory storage model to a persistent database, allowing threat indicators to survive application restarts. Using Diesel ORM provides type-safe database interactions and migrations for schema management. The deterministic UUID generation ensures consistent IDs for the same indicator content, which helps with deduplication and referential integrity.
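
For illustration, the deterministic ID can be derived along these lines (a minimal sketch assuming the uuid crate with its v5 feature; the exact fields hashed by the real compute_id live in src/models.rs):

use uuid::Uuid;

// The same indicator content always yields the same UUID, so re-submissions
// collapse onto one primary key instead of creating duplicate rows.
fn deterministic_indicator_id(indicator_type: &str, value: &str) -> Uuid {
    // Hypothetical canonical form; any stable serialization of the identifying
    // fields works, as long as it is consistent across nodes.
    let name = format!("{indicator_type}:{value}");
    Uuid::new_v5(&Uuid::NAMESPACE_OID, name.as_bytes())
}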

Summary by CodeRabbit

  • New Features

    • Introduced persistent encrypted storage of threat indicators in a PostgreSQL database, with automatic encryption and decryption.
    • Added support for deterministic UUID generation for indicators based on their content.
    • Implemented file-based symmetric key management with key rotation and persistence.
    • Added new API endpoints and improved error handling for database operations.
  • Refactor

    • Migrated from in-memory to database-backed storage for indicators.
    • Unified encryption/decryption logic and data models.
    • Updated UUID generation methods and indicator creation flow.
  • Bug Fixes

    • Improved error propagation and handling in API and encryption routines.
  • Chores

    • Updated environment variable configuration and added database migration scripts.
    • Added and reorganized database, schema, and model modules for better maintainability.

@coderabbitai bot commented May 22, 2025

Walkthrough

This update introduces persistent encrypted storage for threat indicators using a PostgreSQL database via Diesel ORM. It adds database schema migrations, connection pooling, and new data models. The symmetric key manager now supports on-disk key persistence and raw byte encryption. The API, node logic, and indicator models are refactored for database-backed operations with improved error handling and deterministic UUIDs.

Changes

| File(s) | Change Summary |
| --- | --- |
| .env.example | Added DATABASE_URL for Diesel CLI; changed DTIM__DEFAULT__STORAGE__DATABASE_URL to reference it. |
| Cargo.toml | Added/updated dependencies: enabled uuid v5 feature, added diesel with PostgreSQL, r2d2, and chrono features. |
| diesel.toml | New Diesel CLI config: schema output path, custom derives, migrations directory. |
| migrations/00000000000000_diesel_initial_setup/up.sql, down.sql | Added initial Diesel migration: creates/drops functions for automatic updated_at management via triggers. |
| migrations/2025-05-21-234457_create_encrypted_indicators/up.sql, down.sql | Migration to create/drop encrypted_indicators table with columns for encrypted data, nonce, MAC, TLP level, and primary key. |
| src/api.rs | Refactored handlers: now use EncryptedIndicator, propagate DB errors, update imports, and standardize error handling for indicator endpoints. |
| src/crypto/symmetric_key_manager.rs | Key manager now supports disk persistence (load/save), key rotation saves to disk, encryption/decryption use raw bytes, error handling improved. |
| src/db/connection.rs | New: provides function to create a Diesel PostgreSQL connection pool. |
| src/db/mod.rs | New: declares and re-exports DB submodules and connection pool function. |
| src/db/models.rs | New: defines EncryptedIndicator struct for ORM and JSON (id, ciphertext, nonce, mac, tlp_level). |
| src/db/schema.rs | New: Diesel schema for encrypted_indicators table. |
| src/logging.rs | Log encryption components are now Base64-encoded for storage; decoding is performed on read. |
| src/main.rs | Main initializes DB pool, loads or generates keys, updates node construction, encrypts and stores indicators, and adapts to new error handling and method signatures. |
| src/models.rs | ThreatIndicator uses deterministic UUIDv5, new/updated methods for ID computation, encryption returns EncryptedIndicator, decryption updated, and uses unified encrypted type. |
| src/node.rs | Node now uses DB pool and key manager, all indicator operations are DB-backed, encryption/decryption integrated, error handling updated, and in-memory storage removed. |
| src/uuid.rs | Refactored and expanded UUID generation: added v5 from hash, explicit v7 with timestamp context, clarified method names and locking. |

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant KeyManager
    participant DB as Database (Diesel/Postgres)
    participant Node
    participant API

    Main->>KeyManager: load_or_generate(rotation_days)
    Main->>DB: get_connection_pool(DATABASE_URL)
    Main->>Node: new(db_pool, key_mgr, logger, privacy)
    API->>Node: add_or_increment_indicator(indicator)
    Node->>KeyManager: encrypt(indicator)
    Node->>DB: insert or update EncryptedIndicator
    Node->>API: return EncryptedIndicator
    API->>Node: get_indicator_by_id(id)
    Node->>DB: fetch EncryptedIndicator
    Node->>KeyManager: decrypt(ciphertext, nonce, mac)
    Node->>API: return ThreatIndicator

Possibly related PRs

  • chore(lint): fix linter warnings #18: Adds the initial .env.example file with database URL and API key placeholders, which is built upon in this PR by introducing a new DATABASE_URL variable and referencing it for Diesel CLI and application configuration.

Poem

In the warren, bytes now sleep,
Encrypted safe, in Postgres deep.
Keys on disk, not lost in night,
Indicators stored out of sight.
Diesel runs, migrations grow,
Rabbits cheer as data flows!
🐇🔑📦



cadenkoj (Owner, Author) commented May 22, 2025

This stack of pull requests is managed by Graphite.

@cadenkoj marked this pull request as ready for review May 22, 2025 02:40
@coderabbitai bot left a comment

Actionable comments posted: 10

🔭 Outside diff range comments (2)
src/main.rs (1)

80-99: ⚠️ Potential issue

Key manager is cloned – resulting instances rotate independently

load_or_generate returns a value that you subsequently clone() for the Logger, Node, and API layers.
Because SymmetricKeyManager owns its key material, each clone maintains its own current_key, rotation_time, etc.
After the first call to key_mgr.rotate_key() the original instance diverges from the clones, breaking decryption for requests processed through the API.

Recommended fix:

  1. Wrap the key manager in Arc<Mutex<…>> (or Arc<RwLock<…>>) and share the single instance:
- let mut key_mgr = crypto::SymmetricKeyManager::load_or_generate(...)?
+ let key_mgr = Arc::new(Mutex::new(
+     crypto::SymmetricKeyManager::load_or_generate(settings.tls.key_rotation_days)?
+ ));
  2. Pass Arc<…> everywhere (Node, Logger, API state) and acquire a mutable guard only when encrypt/rotate_key are needed.

This guarantees consistent key material across all threads & tasks.
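
A rough sketch of the resulting wiring (illustrative only; Node::new and the settings path follow the shapes already used in this PR, so the exact signatures are assumptions):

use std::sync::{Arc, Mutex};

// One key manager instance, shared by handle rather than by value.
let key_mgr = Arc::new(Mutex::new(
    crypto::SymmetricKeyManager::load_or_generate(settings.tls.key_rotation_days)?,
));

// Consumers clone the Arc (a cheap pointer copy), not the key material.
let node = Node::new(db_pool.clone(), Arc::clone(&key_mgr), logger, privacy_level);

// Mutations lock the one shared instance, so every holder of the handle
// observes the same current/previous keys after rotation.
key_mgr.lock().expect("key manager mutex poisoned").rotate_key()?;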

src/models.rs (1)

100-114: ⚠️ Potential issue

decrypt returns String errors – this breaks callers using ?.
Node::{add_or_increment_indicator,get_indicator_by_id,list_indicators_by_tlp} propagate the error with ? into a Box<dyn Error> context. Because String does not implement std::error::Error, compilation fails.

Change the signature to return a proper error type, e.g.:

-pub fn decrypt( … ) -> Result<Self, String> {
+pub fn decrypt( … ) -> Result<Self, std::io::Error> {
     let decrypted = key_mgr
         .decrypt( … )
-        .map_err(|e| format!("Decryption failed: {}", e))?;
+        .map_err(|e| std::io::Error::other(format!("Decryption failed: {e}")))?;
 
-    serde_json::from_slice(&decrypted)
-        .map_err(|e| format!("Failed to deserialize ThreatIndicator: {}", e))
+    serde_json::from_slice(&decrypted)
+        .map_err(|e| std::io::Error::other(format!("Failed to deserialize ThreatIndicator: {e}")))
 }

This makes the method composable with ?, fixes the compile error in src/node.rs, and keeps error types uniform across encrypt/decrypt.

♻️ Duplicate comments (2)
src/api.rs (1)

248-251: Same concern as the public-indicator endpoint: internal database errors are discarded, making troubleshooting harder.
Refer to previous comment for suggested handling.

src/node.rs (1)

167-184: Same compile-error cascade for decrypt as above.
Once decrypt returns an io::Error, these ? uses will compile.

🧹 Nitpick comments (14)
migrations/2025-05-21-234457_create_encrypted_indicators/up.sql (1)

1-7: Consider adding constraints on tlp_level values

The migration creates an appropriate table structure with NOT NULL constraints. The cryptographic data is correctly stored as BYTEA.

Consider adding a CHECK constraint to ensure only valid TLP levels are stored:

CREATE TABLE encrypted_indicators (
    id CHAR(64) PRIMARY KEY,
    ciphertext BYTEA NOT NULL,
    nonce BYTEA NOT NULL,
    mac BYTEA NOT NULL,
-   tlp_level TEXT NOT NULL
+   tlp_level TEXT NOT NULL CHECK (tlp_level IN ('WHITE', 'GREEN', 'AMBER', 'RED'))
);
src/db/connection.rs (2)

7-13: Good implementation, consider additional pool configuration

The connection pool implementation is correct and follows Rust idioms with proper error handling. However, the function uses default pool settings which might not be optimal for production use.

Consider adding configuration parameters for the connection pool:

 pub fn get_connection_pool(
     conn_str: &str,
+    max_size: u32,
+    min_idle: Option<u32>,
+    connection_timeout: std::time::Duration,
 ) -> Result<Pool<ConnectionManager<PgConnection>>, Box<dyn Error + Send + Sync>> {
     let manager = ConnectionManager::<PgConnection>::new(conn_str);
-    let pool = Pool::builder().build(manager)?;
+    let pool = Pool::builder()
+        .max_size(max_size)
+        .min_idle(min_idle)
+        .connection_timeout(connection_timeout)
+        .build(manager)?;
     Ok(pool)
 }

If you prefer to keep the function signature simple, consider adding a separate function that accepts a configuration struct.
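
One way such a configuration struct could look (a sketch only; the names and the extra function are assumptions, not part of this PR):

use std::time::Duration;

use diesel::r2d2::{ConnectionManager, Pool};
use diesel::PgConnection;

pub struct PoolConfig {
    pub max_size: u32,
    pub min_idle: Option<u32>,
    pub connection_timeout: Duration,
}

pub fn get_connection_pool_with(
    conn_str: &str,
    cfg: &PoolConfig,
) -> Result<Pool<ConnectionManager<PgConnection>>, Box<dyn std::error::Error + Send + Sync>> {
    let manager = ConnectionManager::<PgConnection>::new(conn_str);
    // Same builder as the existing helper, with the tunables surfaced to callers.
    let pool = Pool::builder()
        .max_size(cfg.max_size)
        .min_idle(cfg.min_idle)
        .connection_timeout(cfg.connection_timeout)
        .build(manager)?;
    Ok(pool)
}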


10-12: Consider adding connection health checks

The current implementation doesn't verify if connections in the pool are healthy before returning them to callers.

Add connection health checks to ensure the pool returns valid connections:

 let manager = ConnectionManager::<PgConnection>::new(conn_str);
-let pool = Pool::builder().build(manager)?;
+let pool = Pool::builder()
+    .test_on_check_out(true)
+    .build(manager)?;
 Ok(pool)
src/db/models.rs (1)

9-13: Consider adding timestamp fields for better auditing

The model currently lacks created_at/updated_at timestamp fields which would be useful for audit trails and data lifecycle management.

Add timestamp fields to the model:

 pub struct EncryptedIndicator {
     pub id: String,
     pub ciphertext: Vec<u8>,
     pub nonce: Vec<u8>,
     pub mac: Vec<u8>,
     pub tlp_level: String,
+    pub created_at: chrono::NaiveDateTime,
+    pub updated_at: chrono::NaiveDateTime,
 }

You'll also need to update the corresponding database schema to include these fields. The timestamps can leverage the diesel_manage_updated_at trigger you've already set up in the migrations.

migrations/00000000000000_diesel_initial_setup/up.sql (1)

1-37: Consider documenting the purpose of the migration file

While this is a standard Diesel setup file, it would be helpful to include comments explaining why this migration is needed and how it relates to the overall database design.

Add a comment at the top of the file explaining the purpose of these functions in the context of your application:

 -- This file was automatically created by Diesel to setup helper functions
 -- and other internal bookkeeping. This file is safe to edit, any future
 -- changes will be added to existing projects as new migrations.
+--
+-- In our application, these functions support automatic timestamp management
+-- for our encrypted_indicators table, ensuring we can track when records were
+-- last modified without manual intervention.
src/api.rs (2)

231-234: Expose underlying error instead of collapsing to 500

list_indicators_by_tlp can fail for various reasons (connection-pool exhaustion, deserialisation, etc.).
All failures are currently mapped to INTERNAL_SERVER_ERROR, hiding actionable information from both client and logs.

Consider:

  1. Logging the concrete error (error!("{:?}", err)) before mapping.
  2. Returning SERVICE_UNAVAILABLE for pool errors, BAD_REQUEST for illegal TLP, etc., to aid clients.

This pattern recurs in other handlers; a small helper like map_db_err(err)? would de-duplicate the logic.
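
One possible shape for that helper (a sketch only; it reuses the ApiError constant and log crate already present in these handlers, and finer-grained status mapping can be layered on later):

fn map_db_err<E: std::fmt::Debug>(err: E) -> ApiError {
    // Keep the actionable detail in the logs even though the client only sees
    // a generic status code.
    log::error!("database operation failed: {err:?}");
    ApiError::INTERNAL_SERVER_ERROR
}

Handlers can then write, for example, node.list_indicators_by_tlp(&tlp).map_err(map_db_err)? instead of repeating the closure at every call site.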


326-331: Avoid printing to stdout inside a request handler

The println! call will write to stdout synchronously and bypass your structured logger.
Replace with:

log::error!("Error while listing objects by TLP: {}", error);

or propagate the error to the caller if it should surface as an HTTP failure.

src/main.rs (1)

155-156: Handle potential DB errors when inserting the indicator

add_or_increment_indicator returns a Result, yet the error is dropped with let _ = …;.
Silently ignoring insertion failures defeats the purpose of the earlier encryption-test logic.

Propagate or log the error as done in the API layer.

src/crypto/symmetric_key_manager.rs (2)

65-77: save_keys omits errors when writing the previous key

If writing symm_prev.key fails the manager continues silently, leaving the system without a recovery key.
Bubble the error up:

-            let mut prev_file = File::create(SYMM_PREV_KEY_PATH)?;
-            prev_file.write_all(prev.as_slice())?;
+            let mut prev_file = File::create(SYMM_PREV_KEY_PATH)?;
+            prev_file.write_all(prev.as_slice())?;

→ Already returns std::io::Result<()>, so just propagate with ?.

Also, consider deleting both key files on failure to avoid partial state.


79-91: rotate_key is not concurrency-safe

The method mutates internal state (current_key, previous_key, key_rotation_time) without external synchronisation.
Cloning SymmetricKeyManager (which you currently do) introduces multiple independent timers and possible write races when each clone destroys/rewrites the same symm*.key files.

Couple this with the earlier advice: wrap a single instance in Arc<Mutex<…>> and remove Clone derivation to enforce exclusivity.

src/models.rs (2)

39-43: new() almost never returns an error – simplify the signature.
Self::compute_id is the only fallible call yet it currently cannot fail (compute_id wraps a pure, in-memory hash → UUID conversion in an Ok). Returning Result<Self, io::Error> from new therefore complicates the API and forces callers to handle an error path that will never be taken.

-pub fn new( … ) -> Result<Self, std::io::Error> {
-    let id = Self::compute_id(…)
-        .map_err(|e| std::io::Error::other(format!("Failed to compute ID: {}", e)))?;
+pub fn new( … ) -> Self {
+    let id = Self::compute_id(…);

If you anticipate future failure paths, add them explicitly in compute_id; otherwise prefer the simpler, infallible API.


65-80: compute_id can’t fail – drop the Result wrapper.
Nothing inside this function returns an error. Eliminating the Result:

-pub fn compute_id( … ) -> Result<Uuid, std::io::Error> {
+pub fn compute_id( … ) -> Uuid {
-    Ok(Uuid::new_v5_from_hash(&hash))
+    Uuid::new_v5_from_hash(&hash)
 }

This cascades into a cleaner new() implementation and removes the need for the map_err dance shown above.

src/node.rs (2)

152-165: Accept &str instead of &String for flexibility.

-pub fn get_indicator_by_id(
-    &self,
-    indicator_id: &String,
+pub fn get_indicator_by_id(
+    &self,
+    indicator_id: &str,
 ) -> …

Taking a &str lets callers pass either a String or string literal without extra allocations.


192-193: Remove stray println! used for debugging.

Leaving debugging prints in library code clutters stdout for consumers (and breaks invariants in daemons without terminals).

-println!("Indicators: {:?}", indicators);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88db34 and 76eb858.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • .env.example (1 hunks)
  • Cargo.toml (2 hunks)
  • diesel.toml (1 hunks)
  • migrations/00000000000000_diesel_initial_setup/down.sql (1 hunks)
  • migrations/00000000000000_diesel_initial_setup/up.sql (1 hunks)
  • migrations/2025-05-21-234457_create_encrypted_indicators/down.sql (1 hunks)
  • migrations/2025-05-21-234457_create_encrypted_indicators/up.sql (1 hunks)
  • src/api.rs (7 hunks)
  • src/crypto/symmetric_key_manager.rs (4 hunks)
  • src/db/connection.rs (1 hunks)
  • src/db/mod.rs (1 hunks)
  • src/db/models.rs (1 hunks)
  • src/db/schema.rs (1 hunks)
  • src/logging.rs (3 hunks)
  • src/main.rs (4 hunks)
  • src/models.rs (4 hunks)
  • src/node.rs (3 hunks)
  • src/uuid.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/db/mod.rs (1)
src/db/connection.rs (1)
  • get_connection_pool (7-13)
src/db/schema.rs (1)
src/node.rs (3)
  • encrypted_indicators (109-111)
  • encrypted_indicators (159-161)
  • encrypted_indicators (174-176)
src/db/models.rs (1)
src/node.rs (3)
  • encrypted_indicators (109-111)
  • encrypted_indicators (159-161)
  • encrypted_indicators (174-176)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-test-lint
🔇 Additional comments (14)
Cargo.toml (1)

11-11: Validate added Diesel dependency and UUID feature.

The new entries look correct, but please verify that:

  1. The diesel = { version = "2.2.0", features = ["postgres", "r2d2", "chrono"] } line matches your target database version and doesn’t introduce breaking changes.
  2. The uuid crate’s "v5" feature is required alongside "v7" and "serde".

Run cargo update -p diesel and your test suite to confirm compatibility.

src/db/mod.rs (1)

1-6: Module re-exports look good.

The db module cleanly re-exports get_connection_pool and organizes connection, models, and schema.
No issues detected here.

migrations/00000000000000_diesel_initial_setup/down.sql (1)

5-6: Make function drops idempotent.

Currently:

DROP FUNCTION IF EXISTS diesel_manage_updated_at(_tbl regclass);
DROP FUNCTION IF EXISTS diesel_set_updated_at();

Both use IF EXISTS, which is correct.
Consider adding CASCADE if any triggers depend on these functions, or document that no dependencies exist.

src/db/schema.rs (1)

3-12: Schema design follows best practices for encrypted data storage

The table structure appropriately stores:

  • Fixed-length ID as primary key
  • Binary data (BYTEA) for all cryptographic components
  • Text field for the TLP level which enables filtering without decryption

The schema aligns well with the Diesel ORM queries seen in the Node implementation.

src/logging.rs (3)

1-2: Add Base64 encoding support for binary data

Adding Base64 encoding support is appropriate for storing binary cryptographic data in text-based logs.


42-47: Properly encode binary data for log storage

The change correctly implements Base64 encoding of binary cryptographic data before writing to logs, which prevents corruption and improves log readability.


76-93: ❓ Verification inconclusive

Robust Base64 decoding with appropriate error handling

The implementation properly decodes Base64-encoded binary data with comprehensive error handling. The chained and_then/ok_or_else pattern is an elegant Rust solution.

Consider how key rotation might affect log file readability. If encryption keys change, older logs encrypted with previous keys might become unreadable. Is this an intentional design decision or should there be support for multiple keys?


🏁 Script executed:

#!/bin/bash
# Check if there's any key rotation functionality in the codebase
rg -A 5 -B 5 "key.*rotation|rotate.*key" --type rust

Length of output: 5144


🏁 Script executed:

#!/bin/bash
# Locate the decrypt implementation in the codebase and check for previous-key fallback
rg -A15 -B5 "pub fn decrypt" --type rust

Length of output: 2392


Key rotation includes a single-previous-key fallback

The SymmetricKeyManager::decrypt implementation first tries decryption with the current_key and then falls back to the previous_key on failure, ensuring that logs encrypted in the last rotation interval remain readable. Only one past key is retained, so logs older than two rotation intervals will fail decryption.

Points for review:

  • src/crypto/symmetric_key_manager.rs
    pub fn rotate_key (maintains previous_key)
    pub fn decrypt (tries current, then previous)

Question: If you need to read logs across more than one rotation window, should we extend this to support multiple historical keys (e.g., an archived key list)?
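
If that route is taken, a hypothetical fallback chain could look like this (not part of this PR; try_decrypt stands in for the existing single-key AES-GCM decryption and the archived-key list is an assumed new field):

fn decrypt_with_fallback<K>(
    keys_newest_first: &[K],
    try_decrypt: impl Fn(&K, &[u8], &[u8], &[u8]) -> Option<Vec<u8>>,
    ciphertext: &[u8],
    nonce: &[u8],
    mac: &[u8],
) -> Option<Vec<u8>> {
    // Walk the current key first, then archived keys from newest to oldest,
    // returning the first successful plaintext.
    keys_newest_first
        .iter()
        .find_map(|key| try_decrypt(key, ciphertext, nonce, mac))
}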

src/db/models.rs (1)

6-14: Data model structure looks good

The EncryptedIndicator struct is correctly annotated with Diesel ORM traits and has appropriate fields for storing encrypted threat indicators.

src/uuid.rs (3)

63-65: Good refactoring for better clarity

Renaming the method to new_v7() and delegating to now_v7() improves code clarity by explicitly indicating the UUID version being used.


67-70: Proper mutex handling for thread safety

The implementation correctly locks the timestamp context mutex, ensuring thread-safe UUID generation.


71-74: Good addition of UUIDv5 support

The new method for generating deterministic UUIDs from hashes is a valuable addition. Using the OID namespace is appropriate for this use case.

migrations/00000000000000_diesel_initial_setup/up.sql (2)

19-24: Good implementation of timestamp management function

The diesel_manage_updated_at function correctly sets up a trigger that will automatically update the updated_at column when rows are modified.


26-36: Properly implemented updated_at trigger function

The diesel_set_updated_at trigger function correctly checks if the row has been modified and if the updated_at column hasn't been explicitly set before updating it.

src/node.rs (1)

32-38: Constructor consumes SymmetricKeyManager by value – verify reuse semantics.
key_mgr is moved into Node. If the same manager instance must be shared elsewhere (e.g. other nodes or services) this move will cause ownership issues. If that’s unintended, switch to Arc<Mutex<SymmetricKeyManager>> (or similar) and store a cloneable handle instead.

Comment on migrations/2025-05-21-234457_create_encrypted_indicators/down.sql, line 1

@@ -0,0 +1 @@
DROP TABLE encrypted_indicators;

🛠️ Refactor suggestion

Ensure safe, idempotent rollback of the table.

Using a plain DROP TABLE will error if the table doesn’t exist.
Consider making this idempotent and handling dependent objects by switching to:

-DROP TABLE encrypted_indicators;
+DROP TABLE IF EXISTS encrypted_indicators CASCADE;
🤖 Prompt for AI Agents
In migrations/2025-05-21-234457_create_encrypted_indicators/down.sql at line 1,
the current DROP TABLE statement will cause an error if the table does not
exist. To make the rollback safe and idempotent, modify the statement to include
IF EXISTS so it only attempts to drop the table if it exists. Also, add CASCADE
to handle dependent objects automatically during the drop.

Comment on diesel.toml, lines +8 to +9
[migrations_directory]
dir = "/Users/caden/Developer/dtim/migrations"

🛠️ Refactor suggestion

Avoid absolute paths for migrations directory.

The migrations_directory.dir is hardcoded to a local filesystem path. This will break for other developers and CI environments.
Use a relative path, for example:

[migrations_directory]
-dir = "/Users/caden/Developer/dtim/migrations"
+dir = "migrations"
🤖 Prompt for AI Agents
In diesel.toml around lines 8 to 9, the migrations_directory.dir is set to an
absolute path which is not portable. Change the value from the absolute path
"/Users/caden/Developer/dtim/migrations" to a relative path "migrations" to
ensure it works across different environments and for other developers.

Comment on .env.example, lines +7 to +9
DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI

DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL}

🛠️ Refactor suggestion

Review database connection string for security

The PostgreSQL connection string uses default credentials without a password. While acceptable for a local development example, this poses security risks:

  1. Uses default "postgres" superuser account
  2. Has no password protection
  3. Uses default "postgres" database

Add clarifying comments to guide users on securing their database:

-DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI
+DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI - Add password after 'postgres:' for production use
 
-DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL}
+DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL} # Uses same connection as Diesel CLI
🤖 Prompt for AI Agents
In the .env.example file around lines 7 to 9, the PostgreSQL connection string
uses default credentials with no password and the superuser account, which is
insecure. Add comments above or beside the DATABASE_URL line to clarify that
this is for local development only, recommend changing the username, setting a
strong password, and using a dedicated database for production environments to
guide users on securing their database connection.

Comment on src/api.rs, lines 199 to 211
async fn gossip_indicators_handler(
State(state): State<Arc<AppState>>,
Json(indicators): Json<Vec<EncryptedThreatIndicator>>,
Json(indicators): Json<Vec<EncryptedIndicator>>,
) -> ApiResponse<GossipIndicatorsResponse> {
let mut node = state.node.lock().await;
let mut count = 0;
for encrypted in indicators {
if let Ok(indicator) = ThreatIndicator::decrypt(&encrypted, &state.key_mgr) {
node.add_or_increment_indicator(indicator);
node.add_or_increment_indicator(indicator)
.map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
count += 1;
}
}

🛠️ Refactor suggestion

Minimise lock contention by decrypting before acquiring the Node mutex

state.node.lock().await is taken before the loop that performs CPU-bound decryption and signature checks.
While the lock is held, every other request that needs the node (including reads) will block, lowering throughput and increasing latency under load.

-    let mut node = state.node.lock().await;
-    let mut count = 0;
-    for encrypted in indicators {
-        if let Ok(indicator) = ThreatIndicator::decrypt(&encrypted, &state.key_mgr) {
-            node.add_or_increment_indicator(indicator)
-                .map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
-            count += 1;
-        }
-    }
+    // 1.  Decrypt outside the critical section
+    let mut decrypted: Vec<ThreatIndicator> = indicators
+        .into_iter()
+        .filter_map(|enc| ThreatIndicator::decrypt(&enc, &state.key_mgr).ok())
+        .collect();
+
+    // 2. Acquire the lock *once* for all DB mutations
+    let mut node = state.node.lock().await;
+    let mut count = 0;
+    for indicator in decrypted.drain(..) {
+        node.add_or_increment_indicator(indicator)
+            .map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
+        count += 1;
+    }

Benefits
• Shorter critical section → better concurrency
• Avoids holding the lock while performing CPU work or I/O on the key manager
• Slightly cleaner code by separating “validation/decryption” from “state mutation”.

🤖 Prompt for AI Agents
In src/api.rs around lines 199 to 211, the Node mutex lock is acquired before
decrypting indicators, causing unnecessary lock contention. To fix this, move
the decryption and validation of each indicator outside the locked section by
first decrypting all indicators, collecting the valid ones, and then acquiring
the lock once to update the node state. This reduces the critical section
duration, improves concurrency, and separates CPU-bound decryption from state
mutation.

Comment on src/main.rs, lines +140 to 147
let encrypted = indicator.encrypt(&mut key_mgr).unwrap();
println!(
"Encrypted: {:?}",
serde_json::to_string(&encrypted).unwrap()
);

let encrypted = indicator.encrypt(&mut key_mgr);
println!("Encrypted: {:?}", encrypted);
let _ = key_mgr.rotate_key();


🛠️ Refactor suggestion

Don’t discard errors from rotate_key()

rotate_key() persists the new key to disk – I/O failures (e.g. permission denied, full disk) are silently ignored here:

let _ = key_mgr.rotate_key(); // error lost

Either propagate (rotate_key()?;) or at minimum log the failure:

if let Err(e) = key_mgr.rotate_key() {
    log::error!("Key rotation failed: {}", e);
}
🤖 Prompt for AI Agents
In src/main.rs around lines 140 to 147, the call to key_mgr.rotate_key()
currently ignores any errors, which can hide I/O failures during key
persistence. Modify the code to handle the Result returned by rotate_key() by
either propagating the error using the ? operator or by explicitly matching on
the Result and logging any errors with log::error! to ensure failures are not
silently discarded.

Comment on src/crypto/symmetric_key_manager.rs, lines +23 to 57
pub fn load_or_generate(rotation_days: u64) -> std::io::Result<Self> {
fs::create_dir_all("data/keys")?;

let mut current_bytes = [0u8; 32];
let mut prev_bytes = [0u8; 32];

let current_key = if Path::new(SYMM_KEY_PATH).exists() {
File::open(SYMM_KEY_PATH)?.read_exact(&mut current_bytes)?;
Key::<Aes256Gcm>::from_slice(&current_bytes).to_owned()
} else {
let key = Self::generate_key();
let mut file = File::create(SYMM_KEY_PATH)?;
file.write_all(key.as_slice())?;
key
};

let previous_key = if Path::new(SYMM_PREV_KEY_PATH).exists() {
File::open(SYMM_PREV_KEY_PATH)?.read_exact(&mut prev_bytes)?;
Some(Key::<Aes256Gcm>::from_slice(&prev_bytes).to_owned())
} else {
None
};

let key_rotation_time = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs();
Self {

Ok(Self {
current_key,
previous_key: None,
previous_key,
key_rotation_time,
rotation_interval: rotation_days * 24 * 60 * 60,
}
})
}

⚠️ Potential issue

Keys are written to disk unencrypted with default permissions

File::create() will honour the process umask; on many systems this yields 0o644, exposing secrets to other users.
Since these are long-term AES-256 keys, apply restrictive permissions and/or encrypt the key files.

Quick fix (Unix-only; needs std::os::unix::fs::OpenOptionsExt):

-use std::fs::{self, File};
+use std::fs::{self, OpenOptions};
+use std::os::unix::fs::OpenOptionsExt; // cfg-gate on Unix

...
-let mut file = File::create(SYMM_KEY_PATH)?;
+let mut file = OpenOptions::new()
+    .write(true)
+    .create(true)
+    .truncate(true)
+    .mode(0o600)
+    .open(SYMM_KEY_PATH)?;

Long-term, consider wrapping the key with a passphrase (e.g. using ring::pbkdf2) or leveraging the OS key store.

🤖 Prompt for AI Agents
In src/crypto/symmetric_key_manager.rs between lines 23 and 57, the key files
are created with default permissions that may expose sensitive AES-256 keys to
other users. To fix this, modify the file creation to use OpenOptions with
restrictive permissions (e.g., 0o600) by importing and using
std::os::unix::fs::OpenOptionsExt on Unix systems. This ensures the key files
are only accessible by the owner. For a longer-term solution, consider
encrypting the keys on disk or using the OS key store.

Comment on src/models.rs, lines 82 to 99
pub fn encrypt(
&self,
key_mgr: &mut SymmetricKeyManager,
) -> Result<EncryptedThreatIndicator, std::io::Error> {
) -> Result<EncryptedIndicator, std::io::Error> {
let serialized = serde_json::to_vec(self).expect("Failed to serialize ThreatIndicator");
let (ciphertext, nonce, mac) = key_mgr
.encrypt(&serialized)
.map_err(|e| std::io::Error::other(format!("Encryption failed: {}", e)))?;

Ok(EncryptedThreatIndicator {
Ok(EncryptedIndicator {
id: self.id.to_string(),
ciphertext,
nonce,
mac,
tlp_level: self.tlp.to_string(),
})
}


⚠️ Potential issue

expect will panic in production – propagate the serialization error instead.
Using expect here breaks the invariant that encrypt returns a Result. Convert the panic into an error:

-let serialized = serde_json::to_vec(self).expect("Failed to serialize ThreatIndicator");
+let serialized = serde_json::to_vec(self)
+    .map_err(|e| std::io::Error::other(format!("Failed to serialize ThreatIndicator: {e}")))?;

This keeps the error-handling story consistent and prevents a hard crash when serialisation fails (for example, when a custom field contains a non-serialisable value).

🤖 Prompt for AI Agents
In src/models.rs around lines 82 to 99, the encrypt function uses expect on
serialization which can cause a panic in production. Replace the expect call
with proper error propagation by converting the serialization error into a
std::io::Error and returning it as a Result error variant, maintaining
consistent error handling without panics.

Comment on src/node.rs, lines +100 to 146
pub fn add_or_increment_indicator(
&mut self,
new_indicator: ThreatIndicator,
) -> Result<EncryptedIndicator, Box<dyn std::error::Error + Send + Sync>> {
use self::db::schema::encrypted_indicators::dsl::*;

let indicator_id = new_indicator.get_id();

let mut conn = self.db_pool.get()?;
let existing: Option<EncryptedIndicator> = encrypted_indicators
.find(&indicator_id)
.first::<EncryptedIndicator>(&mut conn)
.optional()?;

if let Some(encrypted) = existing {
let mut indicator = ThreatIndicator::decrypt(&encrypted, &self.key_mgr)?;
indicator.sightings += 1;
indicator.updated_at = Utc::now();

let new_encrypted = indicator.encrypt(&mut self.key_mgr)?;
let res = diesel::update(encrypted_indicators.find(&indicator_id))
.set((
ciphertext.eq(new_encrypted.ciphertext),
nonce.eq(new_encrypted.nonce),
mac.eq(new_encrypted.mac),
))
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self.logger.write_log(
log::Level::Info,
&format!("Incrementing indicator: {:?}", existing),
&format!("Incrementing indicator: {:?}", res),
);
Ok(res)
} else {
self.add_indicator(new_indicator);
let encrypted = new_indicator.encrypt(&mut self.key_mgr)?;
let res = diesel::insert_into(encrypted_indicators)
.values(&encrypted)
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self
.logger
.write_log(log::Level::Info, &format!("Adding indicator: {:?}", res));
Ok(res)
}
id
}

⚠️ Potential issue

Compilation failure: propagating String via ? is invalid.
ThreatIndicator::decrypt currently returns Result<_, String>. The ? operator here requires a type that implements std::error::Error to convert into Box<dyn Error>, which String does not. Fixing decrypt (see previous file) resolves this block.

🤖 Prompt for AI Agents
In src/node.rs lines 100 to 146, the method add_or_increment_indicator uses the
? operator on ThreatIndicator::decrypt which returns Result with a String error
type. Since ? requires the error to implement std::error::Error to convert into
Box<dyn Error>, you need to change ThreatIndicator::decrypt to return a Result
with an error type that implements std::error::Error instead of String. This fix
in decrypt will allow proper error propagation here without compilation errors.

Comment on src/node.rs, lines +24 to 29
db_pool: Pool<ConnectionManager<PgConnection>>,
key_mgr: SymmetricKeyManager,
logger: EncryptedLogger,
privacy_level: PrivacyLevel,
allow_custom_fields: bool,
}

🛠️ Refactor suggestion

Sensitive material may leak through debug-level logs.
Node now stores encrypted data, yet several log messages (Added indicator: {:?}, etc.) still output the full EncryptedIndicator, revealing ciphertext, nonce, and MAC. Even though the payload is encrypted, this meta-data may assist traffic analysis or debugging tools with unrestricted access. Consider logging only the indicator ID and TLP level, or scrub/obfuscate byte arrays before logging.
[security]
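
A sketch of what redacted logging could look like (illustrative; the field names match src/db/models.rs, and if Debug is currently derived on EncryptedIndicator the derive would have to be dropped or a separate helper used instead):

use std::fmt;

impl fmt::Debug for EncryptedIndicator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Log the identifier and TLP level, but replace byte buffers with
        // their lengths so ciphertext, nonce, and MAC never reach the logs.
        f.debug_struct("EncryptedIndicator")
            .field("id", &self.id)
            .field("tlp_level", &self.tlp_level)
            .field("ciphertext_bytes", &self.ciphertext.len())
            .field("nonce_bytes", &self.nonce.len())
            .field("mac_bytes", &self.mac.len())
            .finish()
    }
}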

🤖 Prompt for AI Agents
In src/node.rs around lines 24 to 29, the current debug-level logs output the
full EncryptedIndicator, exposing sensitive encrypted data such as ciphertext,
nonce, and MAC. To fix this, modify the logging statements to only include
non-sensitive information like the indicator ID and TLP level, or sanitize the
byte arrays by obfuscating or removing them before logging, ensuring no
sensitive encrypted material is exposed in logs.

Comment on src/node.rs, lines +80 to 98
pub fn add_indicator(
&mut self,
indicator: ThreatIndicator,
) -> Result<EncryptedIndicator, Box<dyn std::error::Error + Send + Sync>> {
use self::db::schema::encrypted_indicators;

let encrypted_indicator = indicator.encrypt(&mut self.key_mgr)?;

let mut conn = self.db_pool.get()?;
let res = diesel::insert_into(encrypted_indicators::table)
.values(&encrypted_indicator)
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self
.logger
.write_log(log::Level::Info, &format!("Added indicator: {:?}", res));
Ok(res)
}

🛠️ Refactor suggestion

add_indicator inserts unconditionally – handle primary-key clashes.
If an indicator with the same deterministic UUID already exists, this insertion will violate the primary-key constraint and bubble an error. Either:

  1. Call add_or_increment_indicator internally, or
  2. Use ON CONFLICT (id) DO UPDATE (diesel::upsert) to increment sightings.

Failing to do so exposes the API to 500s on legitimate duplicate submissions.
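
A hedged sketch of option 2 using Diesel's PostgreSQL upsert support (column paths follow the existing use self::db::schema::encrypted_indicators; import; note that because sightings lives inside the encrypted payload, a plain SQL upsert can only replace the stored ciphertext, so delegating to add_or_increment_indicator may end up simpler):

use diesel::prelude::*;
use diesel::upsert::excluded;

let res = diesel::insert_into(encrypted_indicators::table)
    .values(&encrypted_indicator)
    .on_conflict(encrypted_indicators::id)
    .do_update()
    .set((
        // On a duplicate id, overwrite the row with the freshly encrypted values.
        encrypted_indicators::ciphertext.eq(excluded(encrypted_indicators::ciphertext)),
        encrypted_indicators::nonce.eq(excluded(encrypted_indicators::nonce)),
        encrypted_indicators::mac.eq(excluded(encrypted_indicators::mac)),
    ))
    .returning(EncryptedIndicator::as_returning())
    .get_result(&mut conn)?;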

🤖 Prompt for AI Agents
In src/node.rs around lines 80 to 98, the add_indicator function inserts a new
indicator without handling primary key conflicts, which causes errors on
duplicates. Modify the function to either call add_or_increment_indicator
internally or implement an upsert using diesel's ON CONFLICT (id) DO UPDATE
clause to increment the sightings count on conflict, preventing 500 errors from
duplicate submissions.

cadenkoj (Owner, Author) commented May 22, 2025

Merge activity

  • May 22, 4:38 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 22, 4:38 AM UTC: @cadenkoj merged this pull request with Graphite.

@cadenkoj merged commit f670038 into main May 22, 2025
2 checks passed
@cadenkoj (Owner, Author) commented:
Closes: #8, #9
