From 07dbd77856e33ea4aef00fd4b9e8758198e859ac Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Tue, 31 Mar 2026 17:39:06 +0000
Subject: [PATCH] refactor(backend): optimize snippets for deterministic Agent parsing

Co-authored-by: beginwebdev2002 <102213457+beginwebdev2002@users.noreply.github.com>
---
 backend/microservices/readme.md | 132 +++++++++++++++++++++++++++++---
 backend/postgresql/readme.md | 104 +++++++++++++++++++++----
 backend/redis/readme.md | 105 ++++++++++++++++++++++---
 3 files changed, 308 insertions(+), 33 deletions(-)

diff --git a/backend/microservices/readme.md b/backend/microservices/readme.md
index ace22b4..b906df6 100644
--- a/backend/microservices/readme.md
+++ b/backend/microservices/readme.md
@@ -28,8 +28,34 @@ This document establishes **best practices** for designing and maintaining a Mic
 ## 🏗️ 1. Architecture & Design

 ### Domain-Driven Design (DDD)
-- Define clear Bounded Contexts for every service to avoid spaghetti dependencies.
-- Implement the API Gateway pattern to route external requests to internal microservices, handling cross-cutting concerns (auth, rate limiting).
+#### ❌ Bad Practice
+```typescript
+// Spaghetti dependencies: User Service directly importing Database Context of Order Service
+import { OrderRepository } from '@services/orders/repository';
+
+export class UserService {
+  constructor(private orderRepo: OrderRepository) {}
+  async deleteUser(userId: string) {
+    await this.orderRepo.deleteByUserId(userId); // Tight coupling across domain boundaries
+  }
+}
+```
+#### ⚠️ Problem
+Directly accessing another service's database or internal modules breaks Bounded Contexts. It creates a tightly coupled monolithic architecture under the guise of microservices, making independent deployments impossible and cascading failures common.
It deviates from modern deterministic standards, making the code harder for AI Agents and Senior Developers to parse and safely extend.
+#### ✅ Best Practice
+```typescript
+// Event-driven decoupled architecture using a Message Broker
+export class UserService {
+  constructor(
+    private userRepo: UserRepository,
+    private eventEmitter: MessageBrokerClient
+  ) {}
+  async deleteUser(userId: string) {
+    await this.userRepo.delete(userId);
+    // Publish a domain event instead of touching the other service's data
+    await this.eventEmitter.publish('UserDeleted', { userId });
+  }
+}
+```
+#### 🚀 Solution
+Define clear Bounded Contexts. Services must own their data and logic. Use asynchronous events to communicate state changes across domains. Implement the API Gateway pattern to handle cross-cutting concerns (auth, routing).

 ### 🔄 Data Flow Lifecycle

@@ -57,21 +83,107 @@ sequenceDiagram
 ## 🔒 2. Security Best Practices

 ### Service-to-Service Authentication
-- Implement Zero Trust architecture. Internal services must authenticate each other using mTLS (Mutual TLS) or signed JWTs.
-- Secrets must never be hardcoded. Utilize a secret manager (HashiCorp Vault, AWS Secrets Manager).
+#### ❌ Bad Practice
+```typescript
+// Assuming internal network is secure and sending requests unauthenticated
+const response = await axios.post(`http://order-service/orders`, orderData);
+```
+#### ⚠️ Problem
+Implicit trust within internal networks enables a compromised container to move laterally and attack other services, leading to catastrophic privilege escalation. Hardcoded credentials compound this vulnerability.
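As a concrete companion to the signed-JWT guidance in this section, here is a hedged sketch of minting and verifying a short-lived HS256 service token using only Node's built-in `crypto`. The claim names and the `ttlSeconds` default are illustrative assumptions; production systems would normally use a vetted JWT library and pull the secret from a secret manager rather than passing it around as a string.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

const b64url = (buf: Buffer): string =>
  buf.toString('base64').replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

// Mint a short-lived HS256 token identifying the calling service
export function mintServiceToken(audience: string, secret: string, ttlSeconds = 60): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: 'HS256', typ: 'JWT' })));
  const now = Math.floor(Date.now() / 1000);
  const payload = b64url(
    Buffer.from(JSON.stringify({ aud: audience, iat: now, exp: now + ttlSeconds }))
  );
  const signature = b64url(createHmac('sha256', secret).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${signature}`;
}

// Verify the signature and expiry before trusting the caller
export function verifyServiceToken(token: string, secret: string): boolean {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return false;
  const expected = b64url(createHmac('sha256', secret).update(`${header}.${payload}`).digest());
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  // Constant-time comparison to avoid timing side channels
  if (a.length !== b.length || !timingSafeEqual(a, b)) return false;
  const claims = JSON.parse(Buffer.from(payload, 'base64').toString());
  return claims.exp > Math.floor(Date.now() / 1000);
}
```

The short expiry is the point: even a leaked token is only useful for about a minute, which is what makes service-to-service JWTs compatible with Zero Trust.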
+#### ✅ Best Practice
+```typescript
+// Using short-lived signed JWTs or mTLS for service communication
+const token = await authService.generateServiceToken('order-service');
+const response = await axios.post(`https://order-service/orders`, orderData, {
+  headers: { Authorization: `Bearer ${token}` }
+});
+```
+#### 🚀 Solution
+Implement Zero Trust architecture. Internal services must authenticate each other using mTLS (Mutual TLS) or cryptographically signed JWTs. Never hardcode secrets; instead, utilize a secret manager (e.g., HashiCorp Vault, AWS Secrets Manager).

 ### Data Isolation
-- Enforce "Database per Service" pattern. Services must never share a single database to ensure independent scaling and deployment.
+#### ❌ Bad Practice
+```yaml
+# docker-compose.yml
+services:
+  user-service:
+    environment:
+      - DB_URL=postgres://shared-db:5432/monolith_db
+  order-service:
+    environment:
+      - DB_URL=postgres://shared-db:5432/monolith_db # Shared database anti-pattern
+```
+#### ⚠️ Problem
+Sharing a single database across multiple microservices leads to schema coupling. If the User Service alters a table, the Order Service crashes. It defeats the purpose of independent scaling and creates a single point of failure (SPOF).
+#### ✅ Best Practice
+```yaml
+# docker-compose.yml
+services:
+  user-service:
+    environment:
+      - DB_URL=postgres://user-db:5432/users
+  order-service:
+    environment:
+      - DB_URL=postgres://order-db:5432/orders # Independent database
+```
+#### 🚀 Solution
+Enforce the "Database per Service" pattern. Services must never share a single database or directly query another service's tables. Ensure independent scaling, deployment, and technology choices per domain.

 ## 🚀 3. Reliability Optimization

 ### Resilience Patterns
-- Implement Circuit Breakers (e.g., resilience4j) to fail fast and recover when a dependent service goes down.
-- Implement retries with exponential backoff for transient network errors.
-- Ensure Idempotency for critical operations to handle duplicated requests gracefully.
+#### ❌ Bad Practice
+```typescript
+// Synchronous HTTP call without timeout or fallback
+async function getUserData(userId: string) {
+  // If user-service is slow or down, this request hangs and blocks threads
+  const response = await axios.get(`http://user-service/users/${userId}`);
+  return response.data;
+}
+```
+#### ⚠️ Problem
+Relying on direct synchronous HTTP calls between microservices without fallbacks creates a fragile system. If one service experiences a delay, it consumes threads on the caller, eventually leading to a cascading failure across the entire cluster.
+#### ✅ Best Practice
+```typescript
+// Circuit Breaker (API modeled on the opossum Node.js package)
+const circuitBreaker = new CircuitBreaker(getUserData, {
+  timeout: 3000,
+  errorThresholdPercentage: 50,
+  resetTimeout: 30000
+});
+
+async function safeGetUserData(userId: string) {
+  try {
+    return await circuitBreaker.fire(userId);
+  } catch (error) {
+    return { status: 'fallback', message: 'User service unavailable' }; // Fallback strategy
+  }
+}
+```
+#### 🚀 Solution
+Implement Circuit Breakers to fail fast and prevent resource exhaustion. Use retries with exponential backoff for transient errors, and ensure idempotency for critical API endpoints to handle duplicated requests safely.

 ### Observability
-- Distributed Tracing is mandatory (OpenTelemetry). All requests must pass a Correlation ID across service boundaries.
-- Centralized Logging (ELK, Datadog) is required for debugging complex distributed issues.
+#### ❌ Bad Practice
+```typescript
+// Logging without correlation context
+console.log('Order processed successfully');
+```
+#### ⚠️ Problem
+When an error spans multiple services, isolated logs lacking a unique identifier make tracing the original request path nearly impossible, drastically increasing debugging time during critical outages.
+#### ✅ Best Practice
+```typescript
+// Implementing OpenTelemetry and passing a Correlation ID
+import { trace } from '@opentelemetry/api';
+
+const span = trace.getTracer('default').startSpan('ProcessOrder');
+logger.info('Order processed successfully', {
+  traceId: span.spanContext().traceId,
+  orderId: order.id
+});
+span.end();
+```
+#### 🚀 Solution
+Distributed Tracing is mandatory (e.g., using OpenTelemetry). All requests must pass a Correlation ID (Trace ID) across service boundaries. Centralized Logging (ELK, Datadog) is required for correlating complex distributed issues.

 ## 📚 Specialized Documentation
 - [architecture.md](./architecture.md)
 - [security-best-practices.md](./security-best-practices.md)
diff --git a/backend/postgresql/readme.md b/backend/postgresql/readme.md
index 87937b2..29df452 100644
--- a/backend/postgresql/readme.md
+++ b/backend/postgresql/readme.md
@@ -28,9 +28,26 @@ This document establishes **best practices** for building and maintaining Postgr
 ## 🏗️ 1. Architecture & Design

 ### Database Schema Design
-- **Normalized by Default:** Start with 3NF (Third Normal Form) to minimize redundancy.
-- **Denormalize for Read Performance:** Selectively denormalize where read heavy workloads require optimization, utilizing Materialized Views.
-- **Primary Keys:** Use `UUIDv7` or `BIGINT IDENTITY` (PostgreSQL 10+) for primary keys over sequential `SERIAL`.
+#### ❌ Bad Practice
+```sql
+-- Using sequential integer IDs as primary keys
+CREATE TABLE users (
+    id SERIAL PRIMARY KEY,
+    username VARCHAR(50)
+);
+```
+#### ⚠️ Problem
+Using sequential `SERIAL` IDs makes data enumeration trivial (e.g., exposing total user count via user ID `1054`), complicating distributed system integration and data migrations.
+#### ✅ Best Practice
+```sql
+-- Using UUIDv7 for time-sorted uniqueness
+-- (uuid_generate_v7() requires an extension such as pg_uuidv7; PostgreSQL 18+ ships a built-in uuidv7())
+CREATE TABLE users (
+    id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
+    username VARCHAR(50)
+);
+```
+#### 🚀 Solution
+Start with 3NF to minimize redundancy. Use `UUIDv7` for primary keys instead of `SERIAL` to ensure globally unique identifiers that also retain time-based sorting advantages for indexing. Selectively denormalize using Materialized Views where read-heavy workloads require optimization.

 ### 🔄 Data Flow Lifecycle

@@ -49,23 +66,84 @@ sequenceDiagram
 ## 🔒 2. Security Best Practices

 ### Connection Security
-- Enforce SSL/TLS for all database connections.
-- Utilize a connection pooler like PgBouncer for performance and connection limit management.
+#### ❌ Bad Practice
+```yaml
+# Direct unencrypted connection bypassing poolers
+DB_URL=postgres://app_user:pass@db:5432/app_db?sslmode=disable
+```
+#### ⚠️ Problem
+Using unencrypted connections over internal networks enables MITM (Man-in-the-Middle) attacks, compromising credentials and sensitive client data. Connecting directly without a pooler can lead to application crashes from connection exhaustion during high load.
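The pooling and TLS guidance in this section can also be enforced on the application side, so that a misconfigured `DB_URL` fails fast instead of silently downgrading security. Below is a hedged sketch of such a config builder; the `AppPoolConfig` shape loosely mirrors node-postgres's `PoolConfig` but is an assumption here, not that library's actual type.

```typescript
// Hypothetical helper: derive a pool configuration from DB_URL while refusing
// to ever downgrade TLS. Adapt the returned shape to your actual driver.
export interface AppPoolConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  max: number; // client-side cap; the pooler (e.g. PgBouncer) holds the real limit
  ssl: { rejectUnauthorized: boolean };
}

export function buildPoolConfig(dbUrl: string, maxConnections = 10): AppPoolConfig {
  const u = new URL(dbUrl);
  if (u.searchParams.get('sslmode') === 'disable') {
    throw new Error('Refusing to connect without TLS: sslmode=disable is not allowed');
  }
  return {
    host: u.hostname,
    port: Number(u.port || 5432),
    database: u.pathname.replace(/^\//, ''),
    user: decodeURIComponent(u.username),
    max: maxConnections,
    ssl: { rejectUnauthorized: true }, // client-side analogue of verify-full
  };
}
```

Centralizing this in one function means no individual service can accidentally ship a plaintext connection string to production.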
+#### ✅ Best Practice
+```yaml
+# Encrypted connection via PgBouncer
+DB_URL=postgres://app_user:pass@pgbouncer:6432/app_db?sslmode=verify-full
+```
+#### 🚀 Solution
+Enforce SSL/TLS for all database connections (`sslmode=verify-full` in production). Always utilize a transaction-level connection pooler (e.g., PgBouncer, Odyssey) to manage connection limits and preserve database memory.

 ### Access Control
-- Principle of Least Privilege (PoLP): Create specific database roles for different application services. Never use the `postgres` superuser for application access.
-- Implement Row-Level Security (RLS) for multi-tenant applications to isolate data at the database layer.
+#### ❌ Bad Practice
+```yaml
+# Using the default postgres superuser in the application connection string
+DB_URL=postgres://postgres:password@db:5432/app_db
+```
+#### ⚠️ Problem
+Using the `postgres` superuser for the application grants it the ability to drop tables, modify configurations, and access other databases. A single SQL injection vulnerability can compromise the entire cluster.
+#### ✅ Best Practice
+```sql
+-- Creating a dedicated role with least privilege
+CREATE ROLE app_user WITH LOGIN PASSWORD 'secure_pass';
+GRANT CONNECT ON DATABASE app_db TO app_user;
+GRANT USAGE ON SCHEMA public TO app_user;
+GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
+-- Note: PostgreSQL has no grantable DROP privilege to revoke; keep tables owned
+-- by a separate migration role so app_user can never drop or alter them
+```
+#### 🚀 Solution
+Enforce the Principle of Least Privilege (PoLP). Create specific, restricted database roles for application services. Implement Row-Level Security (RLS) for multi-tenant applications to isolate data strictly at the database layer.

 ## 🚀 3. Performance Optimization

 ### Indexing Strategies
-- Use B-Tree indexes for equality and range queries.
-- Implement GIN/GiST indexes for Full-Text Search and JSONB fields.
-- Avoid over-indexing, as it degrades write performance. Monitor unused indexes and remove them.
+#### ❌ Bad Practice
+```sql
+-- Blindly adding indexes to every column to "speed up queries"
+CREATE INDEX idx_first_name ON users(first_name);
+CREATE INDEX idx_last_name ON users(last_name);
+CREATE INDEX idx_age ON users(age);
+```
+#### ⚠️ Problem
+Over-indexing forces the database to update multiple B-Trees on every `INSERT`, `UPDATE`, or `DELETE`, severely degrading write performance and bloating storage size. Unused indexes consume RAM and slow down table maintenance.
+#### ✅ Best Practice
+```sql
+-- Creating composite (and here, partial) indexes for specific access patterns
+CREATE INDEX idx_users_name_age ON users(last_name, first_name) WHERE age > 18;
+
+-- Monitoring unused indexes
+SELECT indexrelid::regclass AS index, pg_size_pretty(pg_relation_size(indexrelid))
+FROM pg_stat_user_indexes WHERE idx_scan = 0;
+```
+#### 🚀 Solution
+Apply indexes strategically based on query access patterns. Use B-Tree indexes for equality/ranges, and GIN/GiST indexes for Full-Text Search or JSONB. Regularly monitor and drop unused indexes (e.g., via `pg_stat_user_indexes`).

 ### Query Optimization
-- Explicit DB queries required: Never use `SELECT *`. Only select the specific columns needed.
-- Utilize `EXPLAIN ANALYZE` to identify query bottlenecks.
-- Implement pagination using keyset pagination (cursor-based) instead of `OFFSET`/`LIMIT` for large datasets.
+#### ❌ Bad Practice
+```sql
+-- Fetching all columns and using inefficient offset pagination
+SELECT * FROM orders ORDER BY created_at DESC OFFSET 100000 LIMIT 50;
+```
+#### ⚠️ Problem
+Using `SELECT *` forces the database to fetch and transfer unnecessary data, consuming network bandwidth and memory.
Using `OFFSET/LIMIT` for deep pagination requires the database to scan and discard every skipped row before returning results, so latency grows in proportion to the offset.
+#### ✅ Best Practice
+```sql
+-- Selecting only necessary columns and using Keyset (Cursor) Pagination
+SELECT id, status, total
+FROM orders
+WHERE created_at < '2023-10-25T10:00:00Z'
+ORDER BY created_at DESC
+LIMIT 50;
+```
+#### 🚀 Solution
+Be explicit in queries: never use `SELECT *`. Utilize Keyset Pagination (Cursor-based) so that fetching a deep page costs the same as fetching the first page. Always use `EXPLAIN ANALYZE` to pinpoint missing indexes or sequential scans.

 ## 📚 Specialized Documentation
 - [architecture.md](./architecture.md)
 - [security-best-practices.md](./security-best-practices.md)
diff --git a/backend/redis/readme.md b/backend/redis/readme.md
index ba7ba27..2972df2 100644
--- a/backend/redis/readme.md
+++ b/backend/redis/readme.md
@@ -28,8 +28,27 @@ This document establishes **best practices** for building and maintaining Redis
 ## 🏗️ 1. Architecture & Design

 ### Cache Design
-- **Cache-Aside Pattern:** Applications should read from the cache first; on a cache miss, read from the database, populate the cache, and return the result.
-- **TTL Requirements:** Every cached key must have an expiration Time-To-Live (TTL) to prevent memory exhaustion and stale data.
+#### ❌ Bad Practice
+```javascript
+// Setting a cache key without an expiration (TTL)
+await redisClient.set('user:123', JSON.stringify(user));
+```
+#### ⚠️ Problem
+Storing keys without a TTL leads to severe memory exhaustion, out-of-memory (OOM) crashes, and serving stale data to clients, breaking cache consistency.
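The Cache-Aside pattern described in this section can be packaged as a small reusable helper so that no call site can forget the TTL. This is a hedged sketch: `CacheLike` is an invented minimal interface (mirroring the `get`/`setEx` shape of a node-redis-style client), not a real library type.

```typescript
// Minimal cache surface the helper needs; swap in a real Redis client in production.
export interface CacheLike {
  get(key: string): Promise<string | null>;
  setEx(key: string, ttlSeconds: number, value: string): Promise<unknown>;
}

export async function getWithCacheAside<T>(
  cache: CacheLike,
  key: string,
  ttlSeconds: number,
  loadFromDb: () => Promise<T>
): Promise<T> {
  const hit = await cache.get(key);
  if (hit !== null) return JSON.parse(hit) as T; // cache hit: skip the database
  const value = await loadFromDb();              // cache miss: load from source of truth
  await cache.setEx(key, ttlSeconds, JSON.stringify(value)); // populate with mandatory TTL
  return value;
}
```

Because every write goes through `setEx`, a TTL is structurally impossible to omit, which is exactly the failure mode the Bad Practice above illustrates.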
+#### ✅ Best Practice
+```javascript
+// Cache-Aside Pattern with strict TTL enforcement
+const cacheKey = 'user:123';
+const cached = await redisClient.get(cacheKey);
+let user = cached ? JSON.parse(cached) : null;
+
+if (!user) {
+  user = await db.users.findById(123);
+  // Set cache with a 3600-second (1 hour) TTL
+  await redisClient.setEx(cacheKey, 3600, JSON.stringify(user));
+}
+```
+#### 🚀 Solution
+Implement the Cache-Aside pattern. Always read from the cache first; on a miss, query the database, populate the cache, and set an explicit Time-To-Live (TTL) to guarantee memory rotation and data freshness.

 ### 🔄 Data Flow Lifecycle

@@ -57,21 +76,87 @@ sequenceDiagram
 ## 🔒 2. Security Best Practices

 ### Connection Security
-- Never expose Redis to the public internet. Ensure it is isolated within a private network.
-- Enable requirepass to enforce password authentication.
-- Rename dangerous commands (like `FLUSHALL`, `FLUSHDB`, `CONFIG`) in production to prevent accidental data loss.
+#### ❌ Bad Practice
+```javascript
+// Connecting to Redis on the default port without authentication
+const redisClient = redis.createClient({ url: 'redis://127.0.0.1:6379' });
+```
+#### ⚠️ Problem
+Exposing Redis without a password or on a public network invites unauthorized access, catastrophic data breaches, and accidental data loss via command injection (e.g., executing `FLUSHALL`).
+#### ✅ Best Practice
+```javascript
+// Connecting via TLS with strict authentication using environment variables
+const redisClient = redis.createClient({
+  url: process.env.REDIS_URL, // e.g., rediss://default:securepass@internal.net:6380
+  socket: { tls: true, rejectUnauthorized: true }
+});
+```
+#### 🚀 Solution
+Never expose Redis to the public internet.
Isolate it within a private VPC, enforce strong password authentication (`requirepass`), rename dangerous commands (like `FLUSHALL`), and mandate TLS encryption for all data in transit.

 ### Network Architecture
-- Utilize TLS (Transport Layer Security) for encrypting data in transit.
+#### ❌ Bad Practice
+```javascript
+// Plaintext communication over the network
+const redisClient = redis.createClient({ url: 'redis://redis.internal.net:6379' });
+```
+#### ⚠️ Problem
+Transmitting data in plaintext allows attackers to intercept sensitive information (like session tokens or cached user data) via packet sniffing, leading to severe data breaches.
+#### ✅ Best Practice
+```javascript
+// Enforcing TLS for all connections
+const redisClient = redis.createClient({
+  url: 'rediss://redis.internal.net:6380',
+  socket: { tls: true, rejectUnauthorized: true }
+});
+```
+#### 🚀 Solution
+Mandate TLS (Transport Layer Security) for encrypting all data in transit, ensuring that even if the internal network is compromised, the Redis traffic remains secure.

 ## 🚀 3. Performance Optimization

 ### Command Usage
-- Use pipelining to send multiple commands to the server without waiting for the replies, optimizing latency.
-- Strictly avoid blocking operations like `KEYS *`, and `SMEMBERS` on large sets. Use `SCAN` and `SSCAN` for iteration.
+#### ❌ Bad Practice
+```javascript
+// Blocking the entire Redis server to find keys
+const keys = await redisClient.keys('session:*');
+```
+#### ⚠️ Problem
+The `KEYS` command is a blocking scan of the entire keyspace. Executing it on a production database with millions of keys halts all other operations, causing massive latency spikes and application timeouts.
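This section's Solution also prescribes pipelining for batch operations. Since pipelining is essentially collapsing many commands into one network round trip, the idea can be sketched without a live server: `MiniRedis` below is an invented interface (not a real client API) where each `sendBatch` call stands for one round trip.

```typescript
// Illustrative only: each call to sendBatch models one network round trip
// carrying any number of commands, which is what pipelining exploits.
export interface MiniRedis {
  sendBatch(commands: string[][]): Promise<(string | null)[]>;
}

// Naive approach: one round trip per key (N round trips for N keys)
export async function getAllSequential(client: MiniRedis, keys: string[]) {
  const out: (string | null)[] = [];
  for (const key of keys) {
    out.push((await client.sendBatch([['GET', key]]))[0]);
  }
  return out;
}

// Pipelined approach: a single round trip for all keys
export async function getAllPipelined(client: MiniRedis, keys: string[]) {
  return client.sendBatch(keys.map((key) => ['GET', key]));
}
```

With a real client the same effect comes from its batching facility (e.g. a multi/pipeline builder); the latency win is the reduction from N round trips to one, not any server-side speedup.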
+#### ✅ Best Practice
+```javascript
+// Non-blocking iteration using SCAN (node-redis v4 options syntax)
+let cursor = 0;
+const keys = [];
+do {
+  const reply = await redisClient.scan(cursor, { MATCH: 'session:*', COUNT: 100 });
+  cursor = reply.cursor;
+  keys.push(...reply.keys);
+} while (cursor !== 0);
+```
+#### 🚀 Solution
+Strictly avoid blocking commands (`KEYS`, `SMEMBERS` on large sets). Use iterative commands like `SCAN` or `SSCAN` to process large datasets without locking the single-threaded Redis event loop. Utilize pipelining for batch operations.

 ### Data Types
-- Optimize data structure usage. Employ Hashes for objects to save memory, and Sorted Sets for leaderboards or rate limiting.
-- Avoid large keys or values (keep them under 512MB, but ideally much smaller) to minimize network transfer and memory overhead.
+#### ❌ Bad Practice
+```javascript
+// Storing a massive, monolithic JSON object as a single string
+await redisClient.set('user:profile:123', JSON.stringify(massiveProfileObject));
+```
+#### ⚠️ Problem
+Storing massive objects as single strings requires fetching and deserializing the entire object even if only one field is needed. This wastes network bandwidth and memory, reducing overall cache performance.
+#### ✅ Best Practice
+```javascript
+// Utilizing Redis Hashes for efficient field-level access
+await redisClient.hSet('user:profile:123', {
+  name: 'John Doe',
+  email: 'john@example.com',
+  role: 'admin'
+});
+// Fetching only what is needed
+const role = await redisClient.hGet('user:profile:123', 'role');
+```
+#### 🚀 Solution
+Optimize data structure usage. Employ Hashes for objects to save memory and allow granular updates, and Sorted Sets for leaderboards or rate limiting. Avoid large keys or values (keep them far below the 512 MB string limit) to minimize overhead.
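The Data Types guidance above names Sorted Sets for rate limiting; the underlying sliding-window algorithm is worth spelling out. The sketch below runs purely in memory so it is self-contained and testable, with comments marking where the real Redis commands (`ZADD`, `ZREMRANGEBYSCORE`, `ZCARD`) would apply; the class and method names are illustrative.

```typescript
// Sliding-window rate limiter: the algorithm usually executed against a Redis
// Sorted Set keyed per client, with hit timestamps as both member and score.
export class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key -> timestamps (ms) within the window

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs;
    // Evict entries older than the window (Redis: ZREMRANGEBYSCORE key 0 windowStart)
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    // Count remaining hits (Redis: ZCARD key)
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit: reject
    }
    // Record this hit (Redis: ZADD key nowMs nowMs)
    recent.push(nowMs);
    this.hits.set(key, recent);
    return true;
  }
}
```

Against Redis these three commands are typically wrapped in a MULTI block or a Lua script so the evict-count-add sequence is atomic across concurrent callers.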
 ## 📚 Specialized Documentation
 - [architecture.md](./architecture.md)
 - [security-best-practices.md](./security-best-practices.md)