Goal: Production-grade distributed package cache Updated: 2026-01-17 Phase: Phase 1 - Centralized Campus Cache (Mostly Complete)
| Item | Value |
|---|---|
| Version | Alpha 0.1.0 |
| Language | C++23 |
| Build | CMake 3.14+ + vcpkg |
| Core Dependency | Boost.Beast 1.89.0 |
| Build Status | ✅ Passing (Windows MSYS2, Linux pending) |
- Real HTTP proxy (NOT redirects) - Fetches files from upstream and returns them directly
- FileCache with:
- HTTP client to fetch from upstream (Boost.Beast)
- Local disk caching with automatic directory creation
- SHA256 checksum verification for file integrity
- Configurable retries (3 attempts, exponential backoff: 1s, 2s, 4s)
- Configurable timeouts (connect: 10s, read: 30s)
- Range Request Support (HTTP 206 Partial Content)
- Parses Range header: bytes=0-1023, bytes=512-, bytes=-256
- Seeks to requested offset in cached files
- Returns proper Content-Range header
- Critical for APT resume functionality
- Conditional Request Support (HTTP 304 Not Modified)
- If-Modified-Since header support
- If-None-Match (ETag) header support
- Returns HTTP 304 when cached version is current
- Saves bandwidth on repeated requests
- ✅ Tested on Windows MSYS2 with GCC 15.2.0
- Single-process hash-based storage
- 9-dimensional index system:
- IP↔ID bidirectional mappings
- Dual timestamp indexing (generation + expiry)
- Shard indexing for package distribution
- Metadata and liveness tracking
- Core operations:
- verify_entry() - O(1) node existence check
- store_entry() - Timestamp-based intelligent storage with conflict resolution
- query_node_ids_by_shard_id() - O(1) shard-based node discovery
- clean_by_expiry_time() - Automated expired node removal
- ✅ All operations tested and working
- Validator integration:
- Request type classification (PlainClient vs Node vs Invalid)
- Custom header detection: pacPrism_node_id, pacPrism_node_signature
- Node identity verification (demo mode)
- DHT HTTP API (5 JSON endpoints):
- GET /api/dht/verify/{node_id} - Check if node exists
- POST /api/dht/store - Store new DHT entry
- GET /api/dht/query?shard_id={id} - Query nodes by shard ID
- POST /api/dht/clean/expiry - Remove expired entries
- POST /api/dht/clean/liveness - Remove unhealthy entries (stub)
- Plain client routes:
- Direct file paths: /debian/pool/main/v/vim/vim_9.0.0.deb
- Query parameter paths: /?target=/debian/pool/...
- Range request support (via FileCache)
- Conditional request support (via FileCache)
- ✅ All routes tested and working
- Request type validation based on node identification headers
- Cross-platform SHA256 calculation:
- Windows: bcrypt.h API
- Linux/macOS: OpenSSL SHA256
- SHA256 verification:
- Case-insensitive hex string comparison
- Used for file integrity validation
- ✅ Tested on Windows and Linux
- Binary package parsing: name_version_arch.extension
- Extracts package name, version, architecture, file extension
- Validates component (main/contrib/non-free)
- Source package parsing:
- .orig.tar.gz/xz format
- .dsc files
- .tar.gz/xz files (without .orig)
- Sets architecture to source
- ✅ All package formats tested
- CMake Presets (debug/release)
- Cross-platform: Windows + Linux
- vcpkg integration (local installation via VCPKG_ROOT)
- Automated dependency installation
- ✅ Build tested on Windows MSYS2 (GCC 15.2.0)
All libraries and executable built successfully:
- pacprism.exe - Main executable
- libconsole_banner.dll - Console banner display
- libconsole_io.dll - Console I/O operations, Config, FileCache (630 lines)
- libconsole_parser.dll - Command-line parser (cxxopts)
- libnetwork_router.dll - HTTP request router (304 lines)
- libnetwork_transmission.dll - HTTP transport layer (115 lines)
- libnode_dht.dll - DHT operations (89 lines)
- libnode_validator.dll - Request validation and SHA256 (294 lines)
- libnode_package.dll - Debian package parser (107 lines)
- Client connection logic for future P2P communication
- Currently just a TODO comment
- Not needed for Phase 1 (centralized proxy)
- Health-based node pruning
- Should remove nodes that fail health checks
- Not critical for Phase 1 (single-node setup)
- Always returns true for now
- TODO: Implement Ed25519 signature verification
- Critical for Phase 2/3 (distributed deployment)
- No node discovery
- No peer-to-peer communication
- No file transfer between nodes
- Current implementation is single-process in-memory only
- No network communication
- No data replication
- No consensus mechanism
- Data structures defined (shard struct in dht_types.hpp)
- No algorithm for grouping packages
- No dependency analysis
- No spatial locality optimization
curl http://localhost:9001/
# Returns: Hello from pacPrism! + version info
# Status: HTTP 200 OKcurl http://localhost:9001/debian/README
# Returns: HTTP 200 with actual file content (not redirect!)
# File is fetched from upstream and cached locallycurl -H "Range: bytes=0-1023" http://localhost:9001/debian/README
# Returns: HTTP 206 Partial Content
# Content-Range: bytes 0-1023/total_size
# Returns first 1KB of file# First request
curl -i http://localhost:9001/debian/README
# Returns: HTTP 200 with ETag and Last-Modified headers
# Second request with ETag
curl -H "If-None-Match: <etag>" http://localhost:9001/debian/README
# Returns: HTTP 304 Not Modified (if file unchanged)
# Saves bandwidth by not re-sending file# Store entry
curl -X POST -H "pacPrism_node_id: test123" -H "pacPrism_node_signature: sig123" \
-H "Content-Type: application/json" \
-d '{"node_id":"test_node","node_ip":"192.168.1.100",...}' \
http://localhost:9001/api/dht/store
# Returns: HTTP 200 with JSON success message
# Verify entry
curl -H "pacPrism_node_id: test123" -H "pacPrism_node_signature: sig123" \
http://localhost:9001/api/dht/verify/test_node
# Returns: HTTP 200 with {"exists": true}
# Query by shard
curl -H "pacPrism_node_id: test123" -H "pacPrism_node_signature: sig123" \
"http://localhost:9001/api/dht/query?shard_id=shard_a"
# Returns: HTTP 200 with list of node IDsCore Functionality:
- ✅ Real APT Proxy - Fetches and returns files (not redirects)
- ✅ File Caching - Persists to disk with SHA256 verification
- ✅ Range Requests - HTTP 206 Partial Content support
- ✅ Conditional Requests - HTTP 304 Not Modified support
- ✅ HTTP Server - Working (Boost.Beast)
- ✅ DHT Index - Working (in-memory only, but feature-complete)
- ✅ Debian Package Parser - Full parsing of binary and source packages
- ✅ Request Validation - Node vs PlainClient classification
- ✅ DHT HTTP API - Complete JSON API with 5 endpoints
What's Missing for Production:
- ❌ Error Handling - Basic retry logic exists, but needs more robust handling
- ❌ Logging - No structured logging (only std::cout/std::cerr)
- ❌ Monitoring - No metrics or observability
- ❌ Security - No authentication (validator is demo mode)
- ❌ Testing - No unit or integration tests (only manual curl tests)
- ❌ Documentation - Architecture docs need updating (being fixed now)
All Phase 2 features are design-only:
- ❌ P2P Protocol
- ❌ Network DHT
- ❌ Node discovery and registration
- ❌ Shard-based package distribution
- ❌ Node-to-node file transfer
All Phase 3 features are design-only:
- ❌ Kademlia DHT
- ❌ Cross-organization peering
- ❌ TLS encryption
- ❌ GPG signature verification
- ❌ Reputation system
-
Structured Logging (P0)
- Replace std::cout with proper logging library
- Add log levels: DEBUG, INFO, WARN, ERROR
- Log rotation and management
- Estimated effort: 1-2 days
-
Metrics and Monitoring (P0)
- Cache hit rate tracking
- Bandwidth savings calculation
- Request latency monitoring
- Health check endpoint
- Estimated effort: 2-3 days
-
Error Handling (P1)
- Better upstream error handling
- Timeout management
- Partial download recovery
- Estimated effort: 1-2 days
- Unit tests for core functionality
- Integration tests with APT
- Production hardening (security hardening, input validation)
- Performance testing and optimization
- Phase 2 Implementation: Node discovery and P2P communication
- Phase 3 Implementation: Kademlia DHT and public network deployment
Last updated: 2026-01-17