
Conversation

@cadenkoj (Owner) commented May 22, 2025

TL;DR

Added PostgreSQL database integration with Diesel ORM for persistent storage of threat indicators.

What changed?

  • Integrated Diesel ORM with PostgreSQL for persistent storage
  • Added database migrations for encrypted indicators table
  • Modified the Node struct to store indicators in the database instead of in memory
  • Updated the symmetric key manager to store keys on disk
  • Added UUID v5 support for deterministic indicator IDs based on content
  • Refactored the EncryptedThreatIndicator to use raw bytes instead of base64 strings
  • Updated environment variables to support Diesel CLI

How to test?

  1. Copy the new environment variables from .env.example to your .env file
  2. Ensure PostgreSQL is running and accessible with the configured credentials
  3. Run diesel setup to create the database
  4. Run diesel migration run to apply the migrations
  5. Start the application and verify indicators are persisted between restarts

Why make this change?

This change moves the application from an in-memory storage model to a persistent database, allowing threat indicators to survive application restarts. Using Diesel ORM provides type-safe database interactions and migrations for schema management. The deterministic UUID generation ensures consistent IDs for the same indicator content, which helps with deduplication and referential integrity.
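
For illustration, the deterministic ID can be derived along these lines (a minimal sketch assuming the uuid crate with its v5 feature; the exact fields hashed by the real compute_id live in src/models.rs):

use uuid::Uuid;

// The same indicator content always yields the same UUID, so re-submissions
// collapse onto one primary key instead of creating duplicate rows.
fn deterministic_indicator_id(indicator_type: &str, value: &str) -> Uuid {
    // Hypothetical canonical form; any stable serialization of the identifying
    // fields works, as long as it is consistent across nodes.
    let name = format!("{indicator_type}:{value}");
    Uuid::new_v5(&Uuid::NAMESPACE_OID, name.as_bytes())
}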

Summary by CodeRabbit

  • New Features

    • Introduced persistent encrypted storage of threat indicators in a PostgreSQL database, with automatic encryption and decryption.
    • Added support for deterministic UUID generation for indicators based on their content.
    • Implemented file-based symmetric key management with key rotation and persistence.
    • Added new API endpoints and improved error handling for database operations.
  • Refactor

    • Migrated from in-memory to database-backed storage for indicators.
    • Unified encryption/decryption logic and data models.
    • Updated UUID generation methods and indicator creation flow.
  • Bug Fixes

    • Improved error propagation and handling in API and encryption routines.
  • Chores

    • Updated environment variable configuration and added database migration scripts.
    • Added and reorganized database, schema, and model modules for better maintainability.

@coderabbitai bot commented May 22, 2025

Walkthrough

This update introduces persistent encrypted storage for threat indicators using a PostgreSQL database via Diesel ORM. It adds database schema migrations, connection pooling, and new data models. The symmetric key manager now supports on-disk key persistence and raw byte encryption. The API, node logic, and indicator models are refactored for database-backed operations with improved error handling and deterministic UUIDs.

Changes

| File(s) | Change Summary |
| --- | --- |
| .env.example | Added DATABASE_URL for Diesel CLI; changed DTIM__DEFAULT__STORAGE__DATABASE_URL to reference it. |
| Cargo.toml | Added/updated dependencies: enabled uuid v5 feature, added diesel with PostgreSQL, r2d2, and chrono features. |
| diesel.toml | New Diesel CLI config: schema output path, custom derives, migrations directory. |
| migrations/00000000000000_diesel_initial_setup/up.sql, down.sql | Added initial Diesel migration: creates/drops functions for automatic updated_at management via triggers. |
| migrations/2025-05-21-234457_create_encrypted_indicators/up.sql, down.sql | Migration to create/drop encrypted_indicators table with columns for encrypted data, nonce, MAC, TLP level, and primary key. |
| src/api.rs | Refactored handlers: now use EncryptedIndicator, propagate DB errors, update imports, and standardize error handling for indicator endpoints. |
| src/crypto/symmetric_key_manager.rs | Key manager now supports disk persistence (load/save), key rotation saves to disk, encryption/decryption use raw bytes, error handling improved. |
| src/db/connection.rs | New: provides function to create a Diesel PostgreSQL connection pool. |
| src/db/mod.rs | New: declares and re-exports DB submodules and connection pool function. |
| src/db/models.rs | New: defines EncryptedIndicator struct for ORM and JSON (id, ciphertext, nonce, mac, tlp_level). |
| src/db/schema.rs | New: Diesel schema for encrypted_indicators table. |
| src/logging.rs | Log encryption components are now Base64-encoded for storage; decoding is performed on read. |
| src/main.rs | Main initializes DB pool, loads or generates keys, updates node construction, encrypts and stores indicators, and adapts to new error handling and method signatures. |
| src/models.rs | ThreatIndicator uses deterministic UUIDv5, new/updated methods for ID computation, encryption returns EncryptedIndicator, decryption updated, and uses unified encrypted type. |
| src/node.rs | Node now uses DB pool and key manager, all indicator operations are DB-backed, encryption/decryption integrated, error handling updated, and in-memory storage removed. |
| src/uuid.rs | Refactored and expanded UUID generation: added v5 from hash, explicit v7 with timestamp context, clarified method names and locking. |

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant KeyManager
    participant DB as Database (Diesel/Postgres)
    participant Node
    participant API

    Main->>KeyManager: load_or_generate(rotation_days)
    Main->>DB: get_connection_pool(DATABASE_URL)
    Main->>Node: new(db_pool, key_mgr, logger, privacy)
    API->>Node: add_or_increment_indicator(indicator)
    Node->>KeyManager: encrypt(indicator)
    Node->>DB: insert or update EncryptedIndicator
    Node->>API: return EncryptedIndicator
    API->>Node: get_indicator_by_id(id)
    Node->>DB: fetch EncryptedIndicator
    Node->>KeyManager: decrypt(ciphertext, nonce, mac)
    Node->>API: return ThreatIndicator

Possibly related PRs

  • chore(lint): fix linter warnings #18: Adds the initial .env.example file with database URL and API key placeholders, which is built upon in this PR by introducing a new DATABASE_URL variable and referencing it for Diesel CLI and application configuration.

Poem

In the warren, bytes now sleep,
Encrypted safe, in Postgres deep.
Keys on disk, not lost in night,
Indicators stored out of sight.
Diesel runs, migrations grow,
Rabbits cheer as data flows!
🐇🔑📦



cadenkoj (Owner, Author) commented May 22, 2025

This stack of pull requests is managed by Graphite.

@cadenkoj marked this pull request as ready for review May 22, 2025 02:40
@coderabbitai bot left a comment

Actionable comments posted: 10

🔭 Outside diff range comments (2)
src/main.rs (1)

80-99: ⚠️ Potential issue

Key manager is cloned – resulting instances rotate independently

load_or_generate returns a value that you subsequently clone() for the Logger, Node, and API layers.
Because SymmetricKeyManager owns its key material, each clone maintains its own current_key, rotation_time, etc.
After the first call to key_mgr.rotate_key() the original instance diverges from the clones, breaking decryption for requests processed through the API.

Recommended fix:

  1. Wrap the key manager in Arc<Mutex<…>> (or Arc<RwLock<…>>) and share the single instance:
- let mut key_mgr = crypto::SymmetricKeyManager::load_or_generate(...)?
+ let key_mgr = Arc::new(Mutex::new(
+     crypto::SymmetricKeyManager::load_or_generate(settings.tls.key_rotation_days)?
+ ));
  2. Pass Arc<…> everywhere (Node, Logger, API state) and acquire a mutable guard only when encrypt/rotate_key are needed.

This guarantees consistent key material across all threads & tasks.
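
A rough sketch of the resulting wiring (illustrative only; Node::new and the settings path follow the shapes already used in this PR, so the exact signatures are assumptions):

use std::sync::{Arc, Mutex};

// One key manager instance, shared by handle rather than by value.
let key_mgr = Arc::new(Mutex::new(
    crypto::SymmetricKeyManager::load_or_generate(settings.tls.key_rotation_days)?,
));

// Consumers clone the Arc (a cheap pointer copy), not the key material.
let node = Node::new(db_pool.clone(), Arc::clone(&key_mgr), logger, privacy_level);

// Mutations lock the one shared instance, so every holder of the handle
// observes the same current/previous keys after rotation.
key_mgr.lock().expect("key manager mutex poisoned").rotate_key()?;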

src/models.rs (1)

100-114: ⚠️ Potential issue

decrypt returns String errors – this breaks callers using ?.
Node::{add_or_increment_indicator,get_indicator_by_id,list_indicators_by_tlp} propagate the error with ? into a Box<dyn Error> context. Because String does not implement std::error::Error, compilation fails.

Change the signature to return a proper error type, e.g.:

-pub fn decrypt( … ) -> Result<Self, String> {
+pub fn decrypt( … ) -> Result<Self, std::io::Error> {
     let decrypted = key_mgr
         .decrypt( … )
-        .map_err(|e| format!("Decryption failed: {}", e))?;
+        .map_err(|e| std::io::Error::other(format!("Decryption failed: {e}")))?;
 
-    serde_json::from_slice(&decrypted)
-        .map_err(|e| format!("Failed to deserialize ThreatIndicator: {}", e))
+    serde_json::from_slice(&decrypted)
+        .map_err(|e| std::io::Error::other(format!("Failed to deserialize ThreatIndicator: {e}")))
 }

This makes the method composable with ?, fixes the compile error in src/node.rs, and keeps error types uniform across encrypt/decrypt.

♻️ Duplicate comments (2)
src/api.rs (1)

248-251: Same concern as the public-indicator endpoint: internal database errors are discarded, making troubleshooting harder.
Refer to previous comment for suggested handling.

src/node.rs (1)

167-184: Same compile-error cascade for decrypt as above.
Once decrypt returns an io::Error, these ? uses will compile.

🧹 Nitpick comments (14)
migrations/2025-05-21-234457_create_encrypted_indicators/up.sql (1)

1-7: Consider adding constraints on tlp_level values

The migration creates an appropriate table structure with NOT NULL constraints. The cryptographic data is correctly stored as BYTEA.

Consider adding a CHECK constraint to ensure only valid TLP levels are stored:

CREATE TABLE encrypted_indicators (
    id CHAR(64) PRIMARY KEY,
    ciphertext BYTEA NOT NULL,
    nonce BYTEA NOT NULL,
    mac BYTEA NOT NULL,
-   tlp_level TEXT NOT NULL
+   tlp_level TEXT NOT NULL CHECK (tlp_level IN ('WHITE', 'GREEN', 'AMBER', 'RED'))
);
src/db/connection.rs (2)

7-13: Good implementation, consider additional pool configuration

The connection pool implementation is correct and follows Rust idioms with proper error handling. However, the function uses default pool settings which might not be optimal for production use.

Consider adding configuration parameters for the connection pool:

 pub fn get_connection_pool(
     conn_str: &str,
+    max_size: u32,
+    min_idle: Option<u32>,
+    connection_timeout: std::time::Duration,
 ) -> Result<Pool<ConnectionManager<PgConnection>>, Box<dyn Error + Send + Sync>> {
     let manager = ConnectionManager::<PgConnection>::new(conn_str);
-    let pool = Pool::builder().build(manager)?;
+    let pool = Pool::builder()
+        .max_size(max_size)
+        .min_idle(min_idle)
+        .connection_timeout(connection_timeout)
+        .build(manager)?;
     Ok(pool)
 }

If you prefer to keep the function signature simple, consider adding a separate function that accepts a configuration struct.
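
One way such a configuration struct could look (a sketch only; the names and the extra function are assumptions, not part of this PR):

use std::time::Duration;

use diesel::r2d2::{ConnectionManager, Pool};
use diesel::PgConnection;

pub struct PoolConfig {
    pub max_size: u32,
    pub min_idle: Option<u32>,
    pub connection_timeout: Duration,
}

pub fn get_connection_pool_with(
    conn_str: &str,
    cfg: &PoolConfig,
) -> Result<Pool<ConnectionManager<PgConnection>>, Box<dyn std::error::Error + Send + Sync>> {
    let manager = ConnectionManager::<PgConnection>::new(conn_str);
    // Same builder as the existing helper, with the tunables surfaced to callers.
    let pool = Pool::builder()
        .max_size(cfg.max_size)
        .min_idle(cfg.min_idle)
        .connection_timeout(cfg.connection_timeout)
        .build(manager)?;
    Ok(pool)
}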


10-12: Consider adding connection health checks

The current implementation doesn't verify if connections in the pool are healthy before returning them to callers.

Add connection health checks to ensure the pool returns valid connections:

 let manager = ConnectionManager::<PgConnection>::new(conn_str);
-let pool = Pool::builder().build(manager)?;
+let pool = Pool::builder()
+    .test_on_check_out(true)
+    .build(manager)?;
 Ok(pool)
src/db/models.rs (1)

9-13: Consider adding timestamp fields for better auditing

The model currently lacks created_at/updated_at timestamp fields which would be useful for audit trails and data lifecycle management.

Add timestamp fields to the model:

 pub struct EncryptedIndicator {
     pub id: String,
     pub ciphertext: Vec<u8>,
     pub nonce: Vec<u8>,
     pub mac: Vec<u8>,
     pub tlp_level: String,
+    pub created_at: chrono::NaiveDateTime,
+    pub updated_at: chrono::NaiveDateTime,
 }

You'll also need to update the corresponding database schema to include these fields. The timestamps can leverage the diesel_manage_updated_at trigger you've already set up in the migrations.

migrations/00000000000000_diesel_initial_setup/up.sql (1)

1-37: Consider documenting the purpose of the migration file

While this is a standard Diesel setup file, it would be helpful to include comments explaining why this migration is needed and how it relates to the overall database design.

Add a comment at the top of the file explaining the purpose of these functions in the context of your application:

 -- This file was automatically created by Diesel to setup helper functions
 -- and other internal bookkeeping. This file is safe to edit, any future
 -- changes will be added to existing projects as new migrations.
+--
+-- In our application, these functions support automatic timestamp management
+-- for our encrypted_indicators table, ensuring we can track when records were
+-- last modified without manual intervention.
src/api.rs (2)

231-234: Expose underlying error instead of collapsing to 500

list_indicators_by_tlp can fail for various reasons (connection-pool exhaustion, deserialisation, etc.).
All failures are currently mapped to INTERNAL_SERVER_ERROR, hiding actionable information from both client and logs.

Consider:

  1. Logging the concrete error (error!("{:?}", err)) before mapping.
  2. Returning SERVICE_UNAVAILABLE for pool errors, BAD_REQUEST for illegal TLP, etc., to aid clients.

This pattern recurs in other handlers; a small helper like map_db_err(err)? would de-duplicate the logic.
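
One possible shape for that helper (a sketch only; it reuses the ApiError constant and log crate already present in these handlers, and finer-grained status mapping can be layered on later):

fn map_db_err<E: std::fmt::Debug>(err: E) -> ApiError {
    // Keep the actionable detail in the logs even though the client only sees
    // a generic status code.
    log::error!("database operation failed: {err:?}");
    ApiError::INTERNAL_SERVER_ERROR
}

Handlers can then write, for example, node.list_indicators_by_tlp(&tlp).map_err(map_db_err)? instead of repeating the closure at every call site.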


326-331: Avoid printing to stdout inside a request handler

The println! call will write to stdout synchronously and bypass your structured logger.
Replace with:

log::error!("Error while listing objects by TLP: {}", error);

or propagate the error to the caller if it should surface as an HTTP failure.

src/main.rs (1)

155-156: Handle potential DB errors when inserting the indicator

add_or_increment_indicator returns a Result, yet the error is dropped with let _ = …;.
Silently ignoring insertion failures defeats the purpose of the earlier encryption-test logic.

Propagate or log the error as done in the API layer.

src/crypto/symmetric_key_manager.rs (2)

65-77: save_keys omits errors when writing the previous key

If writing symm_prev.key fails the manager continues silently, leaving the system without a recovery key.
Bubble the error up:

-            let mut prev_file = File::create(SYMM_PREV_KEY_PATH)?;
-            prev_file.write_all(prev.as_slice())?;
+            let mut prev_file = File::create(SYMM_PREV_KEY_PATH)?;
+            prev_file.write_all(prev.as_slice())?;

→ Already returns std::io::Result<()>, so just propagate with ?.

Also, consider deleting both key files on failure to avoid partial state.


79-91: rotate_key is not concurrency-safe

The method mutates internal state (current_key, previous_key, key_rotation_time) without external synchronisation.
Cloning SymmetricKeyManager (which you currently do) introduces multiple independent timers and possible write races when each clone destroys/rewrites the same symm*.key files.

Couple this with the earlier advice: wrap a single instance in Arc<Mutex<…>> and remove Clone derivation to enforce exclusivity.

src/models.rs (2)

39-43: new() almost never returns an error – simplify the signature.
Self::compute_id is the only fallible call yet it currently cannot fail (compute_id wraps a pure, in-memory hash → UUID conversion in an Ok). Returning Result<Self, io::Error> from new therefore complicates the API and forces callers to handle an error path that will never be taken.

-pub fn new( … ) -> Result<Self, std::io::Error> {
-    let id = Self::compute_id(…)
-        .map_err(|e| std::io::Error::other(format!("Failed to compute ID: {}", e)))?;
+pub fn new( … ) -> Self {
+    let id = Self::compute_id(…);

If you anticipate future failure paths, add them explicitly in compute_id; otherwise prefer the simpler, infallible API.


65-80: compute_id can’t fail – drop the Result wrapper.
Nothing inside this function returns an error. Eliminating the Result:

-pub fn compute_id( … ) -> Result<Uuid, std::io::Error> {
+pub fn compute_id( … ) -> Uuid {
-    Ok(Uuid::new_v5_from_hash(&hash))
+    Uuid::new_v5_from_hash(&hash)
 }

This cascades into a cleaner new() implementation and removes the need for the map_err dance shown above.

src/node.rs (2)

152-165: Accept &str instead of &String for flexibility.

-pub fn get_indicator_by_id(
-    &self,
-    indicator_id: &String,
+pub fn get_indicator_by_id(
+    &self,
+    indicator_id: &str,
 ) -> …

Taking a &str lets callers pass either a String or string literal without extra allocations.


192-193: Remove stray println! used for debugging.

Leaving debugging prints in library code clutters stdout for consumers (and breaks invariants in daemons without terminals).

-println!("Indicators: {:?}", indicators);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88db34 and 76eb858.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • .env.example (1 hunks)
  • Cargo.toml (2 hunks)
  • diesel.toml (1 hunks)
  • migrations/00000000000000_diesel_initial_setup/down.sql (1 hunks)
  • migrations/00000000000000_diesel_initial_setup/up.sql (1 hunks)
  • migrations/2025-05-21-234457_create_encrypted_indicators/down.sql (1 hunks)
  • migrations/2025-05-21-234457_create_encrypted_indicators/up.sql (1 hunks)
  • src/api.rs (7 hunks)
  • src/crypto/symmetric_key_manager.rs (4 hunks)
  • src/db/connection.rs (1 hunks)
  • src/db/mod.rs (1 hunks)
  • src/db/models.rs (1 hunks)
  • src/db/schema.rs (1 hunks)
  • src/logging.rs (3 hunks)
  • src/main.rs (4 hunks)
  • src/models.rs (4 hunks)
  • src/node.rs (3 hunks)
  • src/uuid.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/db/mod.rs (1)
src/db/connection.rs (1)
  • get_connection_pool (7-13)
src/db/schema.rs (1)
src/node.rs (3)
  • encrypted_indicators (109-111)
  • encrypted_indicators (159-161)
  • encrypted_indicators (174-176)
src/db/models.rs (1)
src/node.rs (3)
  • encrypted_indicators (109-111)
  • encrypted_indicators (159-161)
  • encrypted_indicators (174-176)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-test-lint
🔇 Additional comments (14)
Cargo.toml (1)

11-11: Validate added Diesel dependency and UUID feature.

The new entries look correct, but please verify that:

  1. The diesel = { version = "2.2.0", features = ["postgres", "r2d2", "chrono"] } line matches your target database version and doesn’t introduce breaking changes.
  2. The uuid crate’s "v5" feature is required alongside "v7" and "serde".

Run cargo update -p diesel and your test suite to confirm compatibility.

src/db/mod.rs (1)

1-6: Module re-exports look good.

The db module cleanly re-exports get_connection_pool and organizes connection, models, and schema.
No issues detected here.

migrations/00000000000000_diesel_initial_setup/down.sql (1)

5-6: Make function drops idempotent.

Currently:

DROP FUNCTION IF EXISTS diesel_manage_updated_at(_tbl regclass);
DROP FUNCTION IF EXISTS diesel_set_updated_at();

Both use IF EXISTS, which is correct.
Consider adding CASCADE if any triggers depend on these functions, or document that no dependencies exist.

src/db/schema.rs (1)

3-12: Schema design follows best practices for encrypted data storage

The table structure appropriately stores:

  • Fixed-length ID as primary key
  • Binary data (BYTEA) for all cryptographic components
  • Text field for the TLP level which enables filtering without decryption

The schema aligns well with the Diesel ORM queries seen in the Node implementation.

src/logging.rs (3)

1-2: Add Base64 encoding support for binary data

Adding Base64 encoding support is appropriate for storing binary cryptographic data in text-based logs.


42-47: Properly encode binary data for log storage

The change correctly implements Base64 encoding of binary cryptographic data before writing to logs, which prevents corruption and improves log readability.


76-93: ❓ Verification inconclusive

Robust Base64 decoding with appropriate error handling

The implementation properly decodes Base64-encoded binary data with comprehensive error handling. The chained and_then/ok_or_else pattern is an elegant Rust solution.

Consider how key rotation might affect log file readability. If encryption keys change, older logs encrypted with previous keys might become unreadable. Is this an intentional design decision or should there be support for multiple keys?


🏁 Script executed:

#!/bin/bash
# Check if there's any key rotation functionality in the codebase
rg -A 5 -B 5 "key.*rotation|rotate.*key" --type rust

Length of output: 5144


🏁 Script executed:

#!/bin/bash
# Locate the decrypt implementation in the codebase and check for previous-key fallback
rg -A15 -B5 "pub fn decrypt" --type rust

Length of output: 2392


Key rotation includes a single-previous-key fallback

The SymmetricKeyManager::decrypt implementation first tries decryption with the current_key and then falls back to the previous_key on failure, ensuring that logs encrypted in the last rotation interval remain readable. Only one past key is retained, so logs older than two rotation intervals will fail decryption.

Points for review:

  • src/crypto/symmetric_key_manager.rs
    pub fn rotate_key (maintains previous_key)
    pub fn decrypt (tries current, then previous)

Question: If you need to read logs across more than one rotation window, should we extend this to support multiple historical keys (e.g., an archived key list)?
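
If that route is taken, a hypothetical fallback chain could look like this (not part of this PR; try_decrypt stands in for the existing single-key AES-GCM decryption and the archived-key list is an assumed new field):

fn decrypt_with_fallback<K>(
    keys_newest_first: &[K],
    try_decrypt: impl Fn(&K, &[u8], &[u8], &[u8]) -> Option<Vec<u8>>,
    ciphertext: &[u8],
    nonce: &[u8],
    mac: &[u8],
) -> Option<Vec<u8>> {
    // Walk the current key first, then archived keys from newest to oldest,
    // returning the first successful plaintext.
    keys_newest_first
        .iter()
        .find_map(|key| try_decrypt(key, ciphertext, nonce, mac))
}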

src/db/models.rs (1)

6-14: Data model structure looks good

The EncryptedIndicator struct is correctly annotated with Diesel ORM traits and has appropriate fields for storing encrypted threat indicators.

src/uuid.rs (3)

63-65: Good refactoring for better clarity

Renaming the method to new_v7() and delegating to now_v7() improves code clarity by explicitly indicating the UUID version being used.


67-70: Proper mutex handling for thread safety

The implementation correctly locks the timestamp context mutex, ensuring thread-safe UUID generation.


71-74: Good addition of UUIDv5 support

The new method for generating deterministic UUIDs from hashes is a valuable addition. Using the OID namespace is appropriate for this use case.

migrations/00000000000000_diesel_initial_setup/up.sql (2)

19-24: Good implementation of timestamp management function

The diesel_manage_updated_at function correctly sets up a trigger that will automatically update the updated_at column when rows are modified.


26-36: Properly implemented updated_at trigger function

The diesel_set_updated_at trigger function correctly checks if the row has been modified and if the updated_at column hasn't been explicitly set before updating it.

src/node.rs (1)

32-38: Constructor consumes SymmetricKeyManager by value – verify reuse semantics.
key_mgr is moved into Node. If the same manager instance must be shared elsewhere (e.g. other nodes or services) this move will cause ownership issues. If that’s unintended, switch to Arc<Mutex<SymmetricKeyManager>> (or similar) and store a cloneable handle instead.

Comment on migrations/2025-05-21-234457_create_encrypted_indicators/down.sql, line 1

@@ -0,0 +1 @@
DROP TABLE encrypted_indicators;

🛠️ Refactor suggestion

Ensure safe, idempotent rollback of the table.

Using a plain DROP TABLE will error if the table doesn’t exist.
Consider making this idempotent and handling dependent objects by switching to:

-DROP TABLE encrypted_indicators;
+DROP TABLE IF EXISTS encrypted_indicators CASCADE;
🤖 Prompt for AI Agents
In migrations/2025-05-21-234457_create_encrypted_indicators/down.sql at line 1,
the current DROP TABLE statement will cause an error if the table does not
exist. To make the rollback safe and idempotent, modify the statement to include
IF EXISTS so it only attempts to drop the table if it exists. Also, add CASCADE
to handle dependent objects automatically during the drop.

Comment on diesel.toml, lines +8 to +9
[migrations_directory]
dir = "/Users/caden/Developer/dtim/migrations"

🛠️ Refactor suggestion

Avoid absolute paths for migrations directory.

The migrations_directory.dir is hardcoded to a local filesystem path. This will break for other developers and CI environments.
Use a relative path, for example:

[migrations_directory]
-dir = "/Users/caden/Developer/dtim/migrations"
+dir = "migrations"
🤖 Prompt for AI Agents
In diesel.toml around lines 8 to 9, the migrations_directory.dir is set to an
absolute path which is not portable. Change the value from the absolute path
"/Users/caden/Developer/dtim/migrations" to a relative path "migrations" to
ensure it works across different environments and for other developers.

Comment on .env.example, lines +7 to +9
DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI

DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL}

🛠️ Refactor suggestion

Review database connection string for security

The PostgreSQL connection string uses default credentials without a password. While acceptable for a local development example, this poses security risks:

  1. Uses default "postgres" superuser account
  2. Has no password protection
  3. Uses default "postgres" database

Add clarifying comments to guide users on securing their database:

-DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI
+DATABASE_URL="postgres://postgres:@localhost:5432/postgres" # For Diesel CLI - Add password after 'postgres:' for production use
 
-DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL}
+DTIM__DEFAULT__STORAGE__DATABASE_URL=${DATABASE_URL} # Uses same connection as Diesel CLI
🤖 Prompt for AI Agents
In the .env.example file around lines 7 to 9, the PostgreSQL connection string
uses default credentials with no password and the superuser account, which is
insecure. Add comments above or beside the DATABASE_URL line to clarify that
this is for local development only, recommend changing the username, setting a
strong password, and using a dedicated database for production environments to
guide users on securing their database connection.

Comment on src/api.rs, lines 199 to 211
async fn gossip_indicators_handler(
State(state): State<Arc<AppState>>,
Json(indicators): Json<Vec<EncryptedThreatIndicator>>,
Json(indicators): Json<Vec<EncryptedIndicator>>,
) -> ApiResponse<GossipIndicatorsResponse> {
let mut node = state.node.lock().await;
let mut count = 0;
for encrypted in indicators {
if let Ok(indicator) = ThreatIndicator::decrypt(&encrypted, &state.key_mgr) {
node.add_or_increment_indicator(indicator);
node.add_or_increment_indicator(indicator)
.map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
count += 1;
}
}

🛠️ Refactor suggestion

Minimise lock contention by decrypting before acquiring the Node mutex

state.node.lock().await is taken before the loop that performs CPU-bound decryption and signature checks.
While the lock is held, every other request that needs the node (including reads) will block, lowering throughput and increasing latency under load.

-    let mut node = state.node.lock().await;
-    let mut count = 0;
-    for encrypted in indicators {
-        if let Ok(indicator) = ThreatIndicator::decrypt(&encrypted, &state.key_mgr) {
-            node.add_or_increment_indicator(indicator)
-                .map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
-            count += 1;
-        }
-    }
+    // 1.  Decrypt outside the critical section
+    let mut decrypted: Vec<ThreatIndicator> = indicators
+        .into_iter()
+        .filter_map(|enc| ThreatIndicator::decrypt(&enc, &state.key_mgr).ok())
+        .collect();
+
+    // 2. Acquire the lock *once* for all DB mutations
+    let mut node = state.node.lock().await;
+    let mut count = 0;
+    for indicator in decrypted.drain(..) {
+        node.add_or_increment_indicator(indicator)
+            .map_err(|_| ApiError::INTERNAL_SERVER_ERROR)?;
+        count += 1;
+    }

Benefits
• Shorter critical section → better concurrency
• Avoids holding the lock while performing CPU work or I/O on the key manager
• Slightly cleaner code by separating “validation/decryption” from “state mutation”.

🤖 Prompt for AI Agents
In src/api.rs around lines 199 to 211, the Node mutex lock is acquired before
decrypting indicators, causing unnecessary lock contention. To fix this, move
the decryption and validation of each indicator outside the locked section by
first decrypting all indicators, collecting the valid ones, and then acquiring
the lock once to update the node state. This reduces the critical section
duration, improves concurrency, and separates CPU-bound decryption from state
mutation.

Comment on src/main.rs, lines +140 to 147
let encrypted = indicator.encrypt(&mut key_mgr).unwrap();
println!(
"Encrypted: {:?}",
serde_json::to_string(&encrypted).unwrap()
);

let encrypted = indicator.encrypt(&mut key_mgr);
println!("Encrypted: {:?}", encrypted);
let _ = key_mgr.rotate_key();


🛠️ Refactor suggestion

Don’t discard errors from rotate_key()

rotate_key() persists the new key to disk – I/O failures (e.g. permission denied, full disk) are silently ignored here:

let _ = key_mgr.rotate_key(); // error lost

Either propagate (rotate_key()?;) or at minimum log the failure:

if let Err(e) = key_mgr.rotate_key() {
    log::error!("Key rotation failed: {}", e);
}
🤖 Prompt for AI Agents
In src/main.rs around lines 140 to 147, the call to key_mgr.rotate_key()
currently ignores any errors, which can hide I/O failures during key
persistence. Modify the code to handle the Result returned by rotate_key() by
either propagating the error using the ? operator or by explicitly matching on
the Result and logging any errors with log::error! to ensure failures are not
silently discarded.

Comment on src/crypto/symmetric_key_manager.rs, lines +23 to 57
pub fn load_or_generate(rotation_days: u64) -> std::io::Result<Self> {
fs::create_dir_all("data/keys")?;

let mut current_bytes = [0u8; 32];
let mut prev_bytes = [0u8; 32];

let current_key = if Path::new(SYMM_KEY_PATH).exists() {
File::open(SYMM_KEY_PATH)?.read_exact(&mut current_bytes)?;
Key::<Aes256Gcm>::from_slice(&current_bytes).to_owned()
} else {
let key = Self::generate_key();
let mut file = File::create(SYMM_KEY_PATH)?;
file.write_all(key.as_slice())?;
key
};

let previous_key = if Path::new(SYMM_PREV_KEY_PATH).exists() {
File::open(SYMM_PREV_KEY_PATH)?.read_exact(&mut prev_bytes)?;
Some(Key::<Aes256Gcm>::from_slice(&prev_bytes).to_owned())
} else {
None
};

let key_rotation_time = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs();
Self {

Ok(Self {
current_key,
previous_key: None,
previous_key,
key_rotation_time,
rotation_interval: rotation_days * 24 * 60 * 60,
}
})
}

⚠️ Potential issue

Keys are written to disk unencrypted with default permissions

File::create() will honour the process umask; on many systems this yields 0o644, exposing secrets to other users.
Since these are long-term AES-256 keys, apply restrictive permissions and/or encrypt the key files.

Quick fix (Unix-only; needs std::os::unix::fs::OpenOptionsExt):

-use std::fs::{self, File};
+use std::fs::{self, OpenOptions};
+use std::os::unix::fs::OpenOptionsExt; // cfg-gate on Unix

...
-let mut file = File::create(SYMM_KEY_PATH)?;
+let mut file = OpenOptions::new()
+    .write(true)
+    .create(true)
+    .truncate(true)
+    .mode(0o600)
+    .open(SYMM_KEY_PATH)?;

Long-term, consider wrapping the key with a passphrase (e.g. using ring::pbkdf2) or leveraging the OS key store.

🤖 Prompt for AI Agents
In src/crypto/symmetric_key_manager.rs between lines 23 and 57, the key files
are created with default permissions that may expose sensitive AES-256 keys to
other users. To fix this, modify the file creation to use OpenOptions with
restrictive permissions (e.g., 0o600) by importing and using
std::os::unix::fs::OpenOptionsExt on Unix systems. This ensures the key files
are only accessible by the owner. For a longer-term solution, consider
encrypting the keys on disk or using the OS key store.

Comment on src/models.rs, lines 82 to 99
pub fn encrypt(
&self,
key_mgr: &mut SymmetricKeyManager,
) -> Result<EncryptedThreatIndicator, std::io::Error> {
) -> Result<EncryptedIndicator, std::io::Error> {
let serialized = serde_json::to_vec(self).expect("Failed to serialize ThreatIndicator");
let (ciphertext, nonce, mac) = key_mgr
.encrypt(&serialized)
.map_err(|e| std::io::Error::other(format!("Encryption failed: {}", e)))?;

Ok(EncryptedThreatIndicator {
Ok(EncryptedIndicator {
id: self.id.to_string(),
ciphertext,
nonce,
mac,
tlp_level: self.tlp.to_string(),
})
}


⚠️ Potential issue

expect will panic in production – propagate the serialization error instead.
Using expect here breaks the invariant that encrypt returns a Result. Convert the panic into an error:

-let serialized = serde_json::to_vec(self).expect("Failed to serialize ThreatIndicator");
+let serialized = serde_json::to_vec(self)
+    .map_err(|e| std::io::Error::other(format!("Failed to serialize ThreatIndicator: {e}")))?;

This keeps the error-handling story consistent and prevents a hard crash when serialisation fails (for example, when a custom field contains a non-serialisable value).

🤖 Prompt for AI Agents
In src/models.rs around lines 82 to 99, the encrypt function uses expect on
serialization which can cause a panic in production. Replace the expect call
with proper error propagation by converting the serialization error into a
std::io::Error and returning it as a Result error variant, maintaining
consistent error handling without panics.

Comment on src/node.rs, lines +100 to 146
pub fn add_or_increment_indicator(
&mut self,
new_indicator: ThreatIndicator,
) -> Result<EncryptedIndicator, Box<dyn std::error::Error + Send + Sync>> {
use self::db::schema::encrypted_indicators::dsl::*;

let indicator_id = new_indicator.get_id();

let mut conn = self.db_pool.get()?;
let existing: Option<EncryptedIndicator> = encrypted_indicators
.find(&indicator_id)
.first::<EncryptedIndicator>(&mut conn)
.optional()?;

if let Some(encrypted) = existing {
let mut indicator = ThreatIndicator::decrypt(&encrypted, &self.key_mgr)?;
indicator.sightings += 1;
indicator.updated_at = Utc::now();

let new_encrypted = indicator.encrypt(&mut self.key_mgr)?;
let res = diesel::update(encrypted_indicators.find(&indicator_id))
.set((
ciphertext.eq(new_encrypted.ciphertext),
nonce.eq(new_encrypted.nonce),
mac.eq(new_encrypted.mac),
))
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self.logger.write_log(
log::Level::Info,
&format!("Incrementing indicator: {:?}", existing),
&format!("Incrementing indicator: {:?}", res),
);
Ok(res)
} else {
self.add_indicator(new_indicator);
let encrypted = new_indicator.encrypt(&mut self.key_mgr)?;
let res = diesel::insert_into(encrypted_indicators)
.values(&encrypted)
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self
.logger
.write_log(log::Level::Info, &format!("Adding indicator: {:?}", res));
Ok(res)
}
id
}

⚠️ Potential issue

Compilation failure: propagating String via ? is invalid.
ThreatIndicator::decrypt currently returns Result<_, String>. The ? operator here requires a type that implements std::error::Error to convert into Box<dyn Error>, which String does not. Fixing decrypt (see previous file) resolves this block.

🤖 Prompt for AI Agents
In src/node.rs lines 100 to 146, the method add_or_increment_indicator uses the
? operator on ThreatIndicator::decrypt which returns Result with a String error
type. Since ? requires the error to implement std::error::Error to convert into
Box<dyn Error>, you need to change ThreatIndicator::decrypt to return a Result
with an error type that implements std::error::Error instead of String. This fix
in decrypt will allow proper error propagation here without compilation errors.

Comment on src/node.rs, lines +24 to 29
db_pool: Pool<ConnectionManager<PgConnection>>,
key_mgr: SymmetricKeyManager,
logger: EncryptedLogger,
privacy_level: PrivacyLevel,
allow_custom_fields: bool,
}

🛠️ Refactor suggestion

Sensitive material may leak through debug-level logs.
Node now stores encrypted data, yet several log messages (Added indicator: {:?}, etc.) still output the full EncryptedIndicator, revealing ciphertext, nonce, and MAC. Even though the payload is encrypted, this meta-data may assist traffic analysis or debugging tools with unrestricted access. Consider logging only the indicator ID and TLP level, or scrub/obfuscate byte arrays before logging.
[security]
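
A sketch of what redacted logging could look like (illustrative; the field names match src/db/models.rs, and if Debug is currently derived on EncryptedIndicator the derive would have to be dropped or a separate helper used instead):

use std::fmt;

impl fmt::Debug for EncryptedIndicator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Log the identifier and TLP level, but replace byte buffers with
        // their lengths so ciphertext, nonce, and MAC never reach the logs.
        f.debug_struct("EncryptedIndicator")
            .field("id", &self.id)
            .field("tlp_level", &self.tlp_level)
            .field("ciphertext_bytes", &self.ciphertext.len())
            .field("nonce_bytes", &self.nonce.len())
            .field("mac_bytes", &self.mac.len())
            .finish()
    }
}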

🤖 Prompt for AI Agents
In src/node.rs around lines 24 to 29, the current debug-level logs output the
full EncryptedIndicator, exposing sensitive encrypted data such as ciphertext,
nonce, and MAC. To fix this, modify the logging statements to only include
non-sensitive information like the indicator ID and TLP level, or sanitize the
byte arrays by obfuscating or removing them before logging, ensuring no
sensitive encrypted material is exposed in logs.

Comment on src/node.rs, lines +80 to 98
pub fn add_indicator(
&mut self,
indicator: ThreatIndicator,
) -> Result<EncryptedIndicator, Box<dyn std::error::Error + Send + Sync>> {
use self::db::schema::encrypted_indicators;

let encrypted_indicator = indicator.encrypt(&mut self.key_mgr)?;

let mut conn = self.db_pool.get()?;
let res = diesel::insert_into(encrypted_indicators::table)
.values(&encrypted_indicator)
.returning(EncryptedIndicator::as_returning())
.get_result(&mut conn)?;

let _ = self
.logger
.write_log(log::Level::Info, &format!("Added indicator: {:?}", res));
Ok(res)
}

🛠️ Refactor suggestion

add_indicator inserts unconditionally – handle primary-key clashes.
If an indicator with the same deterministic UUID already exists, this insertion will violate the primary-key constraint and bubble an error. Either:

  1. Call add_or_increment_indicator internally, or
  2. Use ON CONFLICT (id) DO UPDATE (diesel::upsert) to increment sightings.

Failing to do so exposes the API to 500s on legitimate duplicate submissions.
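
A hedged sketch of option 2 using Diesel's PostgreSQL upsert support (column paths follow the existing use self::db::schema::encrypted_indicators; import; note that because sightings lives inside the encrypted payload, a plain SQL upsert can only replace the stored ciphertext, so delegating to add_or_increment_indicator may end up simpler):

use diesel::prelude::*;
use diesel::upsert::excluded;

let res = diesel::insert_into(encrypted_indicators::table)
    .values(&encrypted_indicator)
    .on_conflict(encrypted_indicators::id)
    .do_update()
    .set((
        // On a duplicate id, overwrite the row with the freshly encrypted values.
        encrypted_indicators::ciphertext.eq(excluded(encrypted_indicators::ciphertext)),
        encrypted_indicators::nonce.eq(excluded(encrypted_indicators::nonce)),
        encrypted_indicators::mac.eq(excluded(encrypted_indicators::mac)),
    ))
    .returning(EncryptedIndicator::as_returning())
    .get_result(&mut conn)?;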

🤖 Prompt for AI Agents
In src/node.rs around lines 80 to 98, the add_indicator function inserts a new
indicator without handling primary key conflicts, which causes errors on
duplicates. Modify the function to either call add_or_increment_indicator
internally or implement an upsert using diesel's ON CONFLICT (id) DO UPDATE
clause to increment the sightings count on conflict, preventing 500 errors from
duplicate submissions.

cadenkoj (Owner, Author) commented May 22, 2025

Merge activity

  • May 22, 4:38 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 22, 4:38 AM UTC: @cadenkoj merged this pull request with Graphite.

@cadenkoj merged commit f670038 into main May 22, 2025
2 checks passed
@cadenkoj (Owner, Author) commented:
Closes: #8, #9
