Proactive Optimization & Learning Architecture for Resilient Intelligent Systems
POLARIS is a comprehensive framework for building self-adaptive systems that can monitor, analyze, plan, and execute adaptations autonomously. It implements the MAPE-K (Monitor, Analyze, Plan, Execute over a Knowledge base) loop with advanced AI/ML capabilities, providing a robust foundation for research and production adaptive systems.
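To make the MAPE-K loop concrete, here is a minimal sketch of a single pass through Monitor, Analyze, Plan, and Execute over a shared knowledge object. It is not the POLARIS kernel: the `Knowledge` class, the `response_time` metric name, and the 1.0 s target are assumptions chosen for illustration; only the `ADD_SERVER` action name comes from the SWIM plugin described later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Knowledge:
    """Shared state that every phase reads from or writes to (hypothetical)."""
    target_response_time: float = 1.0  # assumed goal, in seconds
    history: List[Dict[str, float]] = field(default_factory=list)


def monitor(telemetry: Dict[str, float], kb: Knowledge) -> Dict[str, float]:
    kb.history.append(telemetry)  # record the observation in the knowledge base
    return telemetry


def analyze(obs: Dict[str, float], kb: Knowledge) -> bool:
    # Adaptation is needed when the observed value violates the goal.
    return obs["response_time"] > kb.target_response_time


def plan(violation: bool) -> Optional[str]:
    return "ADD_SERVER" if violation else None


def execute(action: Optional[str], executed: List[str]) -> None:
    if action is not None:
        executed.append(action)  # stand-in for a real actuator call


def mape_k_step(telemetry: Dict[str, float], kb: Knowledge,
                executed: List[str]) -> Optional[str]:
    """Run one Monitor -> Analyze -> Plan -> Execute pass."""
    obs = monitor(telemetry, kb)
    action = plan(analyze(obs, kb))
    execute(action, executed)
    return action
```

Calling `mape_k_step({"response_time": 2.5}, Knowledge(), [])` plans `ADD_SERVER`, while an observation under the target plans nothing.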
POLARIS follows a layered, event-driven architecture with clear separation of concerns:
POLARIS Framework (layers from top to bottom)

Control & Reasoning Layer
- Adaptive Controller: MAPE-K loop, strategy selection, orchestration
- Agentic Reasoner: LLM-based, tool usage, autonomous reasoning
- Meta Learner: strategy learning, parameter tuning

Digital Twin Layer
- World Model: Bayesian, LLM-based, statistical, hybrid
- Knowledge Base: time series, graph DB, document store
- Learning Engine: pattern recognition, reinforcement learning

Adapter Layer
- Monitor Adapter: telemetry collection, metric processing
- Execution Adapter: action execution, result publishing
- Verification Adapter: safety constraints, policy enforcement

Infrastructure Layer
- Message Bus: NATS, event streaming, pub/sub
- Data Storage: time series, graph DB, document store
- Observability: structured logging, metrics, tracing

Plugin Interface
- SWIM Plugin: web service simulation, server scaling
- SWITCH Plugin: ML model switching, YOLO variants
- Custom Plugins: your system integration, HTTP/TCP/custom

- Python 3.8+ with pip
- NATS Server (included in polaris_poc/bin/, or install separately)
- Virtual environment (recommended)

# Clone the repository
git clone <repository-url>
cd POLARIS
# Set up virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
cd polaris_poc
pip install -r requirements.txt
# Set up environment variables
export GEMINI_API_KEY="your-gemini-api-key" # For LLM-based components
export NATS_URL="nats://localhost:4222"

The SWIM (Simulated Web Infrastructure Manager) system demonstrates POLARIS's full capabilities:
# Option 1: Automated startup with tmux (Recommended)
./start_polaris_swim_system.sh
# Option 2: Manual component startup
# Terminal 1: Start NATS server
./bin/nats-server --port 4222
# Terminal 2: Start Knowledge Base
python src/scripts/start_component.py knowledge-base
# Terminal 3: Start Digital Twin with Bayesian World Model
python src/scripts/start_component.py digital-twin --world-model bayesian
# Terminal 4: Start Verification Adapter
python src/scripts/start_component.py verification --plugin-dir extern
# Terminal 5: Start Kernel (coordination)
python src/scripts/start_component.py kernel
# Terminal 6: Start Monitor Adapter (SWIM telemetry)
python src/scripts/start_component.py monitor --plugin-dir extern
# Terminal 7: Start Execution Adapter (SWIM actions)
python src/scripts/start_component.py execution --plugin-dir extern
# Terminal 8: Start Agentic Reasoner (AI decision making)
python src/scripts/start_component.py agentic-reasoner --use-bayesian-world-model
# Terminal 9: Start Meta Learner (strategy optimization)
python src/scripts/start_component.py meta-learner

The SWITCH system demonstrates ML model adaptation with YOLO variants:
# Start SWITCH system components
./start_switch_system.sh
# Or manually:
python src/scripts/start_component.py monitor --plugin-dir extern/switch_plugin
python src/scripts/start_component.py execution --plugin-dir extern/switch_plugin
python src/scripts/start_component.py digital-twin --world-model bayesian
python extern/switch_plugin/run_switch_kernel.py

# Monitor all POLARIS messages
python src/scripts/nats_spy.py
# Monitor specific message types
python src/scripts/nats_spy.py --preset telemetry
python src/scripts/nats_spy.py --preset execution
python src/scripts/nats_spy.py --subjects "polaris.verification.>"
# Check component health
./start_polaris_swim_system.sh --check-health

The complete, production-ready implementation with all features:
polaris_poc/
├── src/polaris/                      # Core framework
│   ├── adapters/                     # System interface adapters
│   │   ├── monitor.py                # Telemetry collection
│   │   ├── execution.py              # Action execution
│   │   └── verification.py           # Safety validation
│   ├── agents/                       # AI/ML reasoning agents
│   │   ├── agentic_reasoner.py       # LLM-based reasoning
│   │   ├── digital_twin_agent.py     # Digital twin management
│   │   └── meta_learner_agent.py     # Strategy learning
│   ├── controllers/                  # Control strategies
│   │   ├── fast_controller.py        # Reactive control
│   │   └── slow_controller.py        # Deliberative control
│   ├── kernel/                       # Core coordination
│   │   └── kernel.py                 # MAPE-K orchestration
│   ├── models/                       # Data models & world models
│   │   ├── world_model.py            # Abstract world model
│   │   ├── bayesian_world_model.py   # Bayesian implementation
│   │   └── gemini_world_model.py     # LLM-based implementation
│   └── services/                     # gRPC services
│       └── digital_twin_service.py   # Digital twin API
├── extern/                           # Managed system plugins
│   ├── swim/                         # SWIM exemplar system
│   ├── switch/                       # SWITCH ML system
│   └── switch_plugin/                # SWITCH POLARIS plugin
├── config/                           # System configurations
│   ├── swim_optimized_config.yaml    # SWIM-specific config
│   └── switch_optimized_config.yaml  # SWITCH-specific config
├── examples/                         # Usage examples & demos
├── tests/                            # Comprehensive test suite
└── docs/                             # Detailed documentation

Clean architecture implementation following enterprise patterns:
polaris_refactored/
├── src/
│   ├── framework/                    # Core framework services
│   │   ├── configuration/            # Config management
│   │   └── plugin_management/        # Plugin system
│   ├── adapters/                     # Adapter implementations
│   │   ├── monitor_adapter/          # Monitoring strategies
│   │   └── execution_adapter/        # Execution pipelines
│   ├── digital_twin/                 # Digital twin components
│   │   ├── world_model.py            # World model interface
│   │   ├── knowledge_base.py         # Knowledge management
│   │   └── learning_engine.py        # Learning algorithms
│   ├── control_reasoning/            # Control & reasoning
│   │   ├── adaptive_controller.py    # MAPE-K controller
│   │   └── reasoning_engine.py       # Multi-strategy reasoning
│   ├── infrastructure/               # Infrastructure services
│   │   ├── message_bus.py            # Event messaging
│   │   └── data_storage/             # Data persistence
│   └── domain/                       # Domain models
└── plugins/                          # System plugins
    ├── swim/                         # SWIM plugin
    └── switch/                       # SWITCH plugin

The Monitor Adapter collects telemetry from managed systems and publishes it to NATS.
Features:
- Plugin-driven metric collection
- Batch and streaming telemetry
- Derived metric calculations
- Configurable collection strategies
- Error handling and retry logic
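As an illustration of the batch telemetry feature listed above, the following sketch buffers events and publishes them as one batch once a size limit is reached. The `TelemetryBatcher` class, its `max_size` parameter, and the event fields are hypothetical and are not taken from the POLARIS code; a real adapter would likely also flush on a timer.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class TelemetryBatcher:
    """Buffers events and publishes them as one batch once max_size is reached."""

    def __init__(self, max_size: int, publish: Callable[[List[Event]], None]):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.publish = publish  # e.g. a function that sends to the message bus
        self._buffer: List[Event] = []

    def add(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        """Publish whatever is buffered; does nothing when the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.publish(batch)


# Usage: collect published batches in a list instead of sending them anywhere.
sent: List[List[Event]] = []
batcher = TelemetryBatcher(max_size=2, publish=sent.append)
for value in (0.2, 0.4, 0.9):
    batcher.add({"name": "response_time", "value": value})
batcher.flush()  # push the partial final batch
```

After this runs, `sent` holds two batches: the first with two events and the second with the remaining one.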
Usage:
python src/scripts/start_component.py monitor --plugin-dir extern --log-level DEBUG

The Execution Adapter executes control actions on managed systems with safety validation.
Features:
- Action validation and precondition checking
- Parameter type and range validation
- Concurrent execution control
- Result publishing and metrics
- Queue management with throttling
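Parameter type and range validation, as listed above, could look roughly like the sketch below. The schema format, the parameter names (`value`, `count`), and the bounds are assumptions for illustration; only the action names `SET_DIMMER` and `ADD_SERVER` come from the SWIM plugin.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ParamSpec:
    type_: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical schema: parameter names and bounds are illustrative only.
ACTION_SCHEMA: Dict[str, Dict[str, ParamSpec]] = {
    "SET_DIMMER": {"value": ParamSpec(float, 0.0, 1.0)},
    "ADD_SERVER": {"count": ParamSpec(int, 1, 5)},
}


def validate_action(action_type: str, params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the action is valid."""
    schema = ACTION_SCHEMA.get(action_type)
    if schema is None:
        return [f"unknown action type: {action_type}"]
    errors: List[str] = []
    for name, spec in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
            continue
        value = params[name]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, spec.type_):
            errors.append(f"{name}: expected {spec.type_.__name__}")
            continue
        if spec.minimum is not None and value < spec.minimum:
            errors.append(f"{name}: {value} is below {spec.minimum}")
        if spec.maximum is not None and value > spec.maximum:
            errors.append(f"{name}: {value} is above {spec.maximum}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets the caller report every violation for an action at once.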
Usage:
python src/scripts/start_component.py execution --plugin-dir extern

The Verification Adapter validates control actions before execution to ensure safety and policy compliance.
Features:
- Safety constraint checking
- Organizational policy enforcement
- Digital Twin integration for predictive verification
- Multi-level verification (Basic, Policy, Formal, Comprehensive)
- Comprehensive violation reporting
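The sketch below shows one way multi-level verification might be organized: each constraint carries a level, and a request at a given level runs every constraint at or below it. Treating the four levels as an ordered, cumulative scale is an assumption, as are the two example constraints and the `servers` field; formal verification itself is not modeled here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Dict, List


class Level(IntEnum):
    BASIC = 1
    POLICY = 2
    FORMAL = 3
    COMPREHENSIVE = 4


@dataclass(frozen=True)
class Constraint:
    name: str
    level: Level
    check: Callable[[Dict[str, Any]], bool]  # True when the action is acceptable


# Hypothetical constraints and limits, for illustration only.
CONSTRAINTS = [
    Constraint("known_action", Level.BASIC,
               lambda a: a["type"] in {"ADD_SERVER", "REMOVE_SERVER", "SET_DIMMER"}),
    Constraint("keep_one_server", Level.POLICY,
               lambda a: not (a["type"] == "REMOVE_SERVER" and a["servers"] <= 1)),
]


def verify(action: Dict[str, Any], level: Level) -> List[str]:
    """Return the names of violated constraints at or below the requested level."""
    return [c.name for c in CONSTRAINTS
            if c.level <= level and not c.check(action)]
```

With these constraints, removing the last server passes a `BASIC` check but is reported as a `keep_one_server` violation at `POLICY` level and above.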
Usage:
# With plugin constraints
python src/scripts/start_component.py verification --plugin-dir extern
# Standalone with built-in defaults
python src/scripts/start_component.py verification

The Digital Twin provides intelligent system modeling and predictive capabilities.
Features:
- NATS Ingestion: Processes telemetry and execution events
- gRPC Services: Query, Simulation, Diagnosis, Management APIs
- World Model: Pluggable AI/ML implementations
- Real-time Processing: Batch processing with configurable timeouts
World Model Options:
- mock: Simple implementation for testing
- bayesian: Deterministic Bayesian/Kalman filter model
- gemini: Google Gemini LLM-based model
- statistical: Statistical analysis model
- hybrid: Combination approach
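To give a feel for what a Bayesian/Kalman filter model does, here is a textbook one-dimensional Kalman filter that tracks a single slowly varying metric. It is a generic sketch, not the code in bayesian_world_model.py; the class name and the constant-value process model are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly varying metric."""
    estimate: float         # current belief about the metric
    variance: float         # uncertainty of that belief
    process_var: float      # how much the true value may drift per step
    measurement_var: float  # noise variance of each observation

    def update(self, measurement: float) -> float:
        # Predict: the value is assumed constant, so only the uncertainty grows.
        prior_var = self.variance + self.process_var
        # Correct: blend prior and measurement by their relative confidence.
        gain = prior_var / (prior_var + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate
```

Each call to `update` pulls the estimate toward the new measurement; the gain shrinks as the filter grows more certain, so later noisy readings move the estimate less.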
Usage:
# Start with Bayesian world model (recommended)
python src/scripts/start_component.py digital-twin --world-model bayesian
# Start with Gemini LLM model
python src/scripts/start_component.py digital-twin --world-model gemini
# Health check
python src/scripts/start_component.py digital-twin --health-check

The Agentic Reasoner is an LLM-based reasoning agent with autonomous tool usage.
Features:
- Improved gRPC client with circuit breaker
- Automatic retry with exponential backoff
- Performance monitoring and metrics
- Autonomous tool usage (Knowledge Base, Digital Twin)
- Bayesian world model integration
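The circuit breaker and exponential backoff mentioned above are general resilience patterns. The sketch below shows both in isolation, without any gRPC dependency; the class and function names, thresholds, and the injectable `clock` are assumptions and do not describe the actual POLARIS client.

```python
import time
from typing import Any, Callable, Iterator, Optional


def backoff_delays(base: float, factor: float, retries: int) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... one delay per retry."""
    for attempt in range(retries):
        yield base * factor ** attempt


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Rejects calls after repeated consecutive failures until a cooldown passes."""

    def __init__(self, max_failures: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit is open; call rejected")
            self.opened_at = None  # cooldown elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A caller would typically wrap each remote call in `breaker.call(...)` and sleep for the next value from `backoff_delays` between retries; for example, `list(backoff_delays(0.5, 2.0, 4))` gives `[0.5, 1.0, 2.0, 4.0]`.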
Usage:
# Basic startup with improved gRPC
python src/scripts/start_component.py agentic-reasoner
# With Bayesian world model integration
python src/scripts/start_component.py agentic-reasoner --use-bayesian-world-model
# With performance monitoring
python src/scripts/start_component.py agentic-reasoner --monitor-performance

The Meta Learner learns and adapts reasoning strategies over time.
Features:
- Strategy learning from adaptation outcomes
- Parameter tuning based on performance
- Pattern recognition in system behavior
- Continuous improvement of adaptation policies
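One simple way to learn a strategy preference from adaptation outcomes is an epsilon-greedy bandit over a running mean reward. This is a stand-in technique chosen for illustration, not necessarily what the Meta Learner implements; the strategy names and reward values below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


class StrategySelector:
    """Epsilon-greedy choice among strategies, using each one's mean reward."""

    def __init__(self, strategies: List[str], epsilon: float = 0.1, seed: int = 0):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = defaultdict(int)
        self.means: Dict[str, float] = defaultdict(float)

    def record(self, strategy: str, reward: float) -> None:
        """Fold one adaptation outcome into the strategy's running mean."""
        self.counts[strategy] += 1
        self.means[strategy] += (reward - self.means[strategy]) / self.counts[strategy]

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)  # explore
        return max(self.strategies, key=lambda s: self.means[s])  # exploit


# Usage with epsilon=0.0, so the choice is purely greedy and deterministic.
selector = StrategySelector(["reactive", "deliberative"], epsilon=0.0)
selector.record("reactive", 0.2)
selector.record("deliberative", 0.8)
selector.record("deliberative", 0.6)
best = selector.choose()
```

Here `best` is `"deliberative"`, since its mean reward (0.7) exceeds that of `"reactive"` (0.2). A non-zero `epsilon` keeps occasionally trying the other strategies so that a change in system behavior can be noticed.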
Usage:
python src/scripts/start_component.py meta-learner

The Kernel provides central coordination and MAPE-K loop orchestration.
Features:
- MAPE-K loop implementation
- Component coordination
- Action routing and verification
- State management
Usage:
python src/scripts/start_component.py kernel

POLARIS uses a plugin architecture to integrate with different managed systems. Each plugin implements the ManagedSystemConnector interface.
- Create Plugin Directory:
mkdir my_system_plugin
cd my_system_plugin
touch __init__.py
- Define Configuration (config.yaml):
system_name: "my_system"
implementation:
  connector_class: "connector.MySystemConnector"
connection:
  protocol: "http"
  host: "localhost"
  port: 8080
monitoring:
  metrics:
    - name: "status"
      command: "GET /health"
      unit: "boolean"
execution:
  actions:
    - type: "RESTART"
      command: "POST /restart"
- Implement Connector (connector.py):
from polaris.adapters.core import ManagedSystemConnector


class MySystemConnector(ManagedSystemConnector):
    async def connect(self):
        # Implementation here
        pass

    async def execute_command(self, command, params=None):
        # Implementation here
        pass
- Test Plugin:
python src/scripts/start_component.py monitor --plugin-dir my_system_plugin --validate-only

- System: Simulated Web Infrastructure Manager
- Purpose: Web service simulation with server scaling and QoS controls
- Actions: ADD_SERVER, REMOVE_SERVER, SET_DIMMER
- Metrics: Response times, throughput, server utilization, arrival rate

- System: ML Model Switching System
- Purpose: YOLO model adaptation for optimal utility
- Actions: SWITCH_MODEL (between YOLOv5 variants)
- Metrics: Processing time, confidence, utility, CPU usage

python examples/verification_demo.py
Interactive demonstration of the verification system with various constraint scenarios.

python examples/agentic_reasoner_demo.py
Shows LLM-based reasoning capabilities with different system scenarios.

python examples/production_usage_example.py
Complete production deployment examples with monitoring and alerting.

# Monitor all POLARIS messages
python src/scripts/nats_spy.py
# Monitor specific subjects
python src/scripts/nats_spy.py --subjects "polaris.telemetry.>" "polaris.execution.>"
# Show full message content
python src/scripts/nats_spy.py --show-data
# Use presets
python src/scripts/nats_spy.py --preset telemetry
python src/scripts/nats_spy.py --preset execution

- polaris.telemetry.events.stream - Individual telemetry events
- polaris.telemetry.events.batch - Batched telemetry events
- polaris.execution.actions - Control actions to execute
- polaris.execution.results - Action execution results
- polaris.verification.requests - Verification requests
- polaris.verification.results - Verification results
- polaris.digitaltwin.* - Digital twin communications
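The subjects above use NATS wildcards: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The helper below is a small standalone sketch of that matching rule, useful for reasoning about which subscriptions receive which messages. It does not call the NATS client library, and `subject_matches` is a hypothetical name.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' must be the last token and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

For example, `polaris.telemetry.>` matches `polaris.telemetry.events.batch`, whereas `polaris.digitaltwin.*` matches only subjects with exactly one token after `polaris.digitaltwin`.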
- Query Service (:50051): Current and historical system state
- Simulation Service: Predictive "what-if" analysis
- Diagnosis Service: Root cause analysis
- Management Service: Health checks and metrics
- Telemetry processing throughput
- Adaptation decision latency
- Action execution success rates
- Verification approval/rejection rates
- World model prediction accuracy
Main configuration in src/config/polaris_config.yaml:
- NATS connection settings
- Telemetry batching parameters
- Component timeouts and retries
- Logging configuration
- config/swim_optimized_config.yaml - SWIM system optimization
- config/switch_optimized_config.yaml - SWITCH system optimization
- config/bayesian_world_model_config.yaml - Bayesian model parameters

Each plugin has its own config.yaml with:
- System identification and metadata
- Connection parameters
- Metric definitions and collection strategies
- Action definitions and validation rules
- Verification constraints and policies
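A plugin configuration with the sections listed above can be sanity-checked before a component starts. The sketch below works on an already-parsed config dictionary and reports missing keys. The set of required keys is an assumption derived from the example config.yaml in this README, not the framework's actual schema.

```python
from typing import Any, Dict, List, Tuple

# Assumed required keys, taken from the example config.yaml shown earlier.
REQUIRED_PATHS: List[Tuple[str, ...]] = [
    ("system_name",),
    ("implementation", "connector_class"),
    ("connection", "protocol"),
    ("monitoring", "metrics"),
    ("execution", "actions"),
]


def missing_keys(config: Dict[str, Any]) -> List[str]:
    """Return dotted paths of required keys that are absent from the config."""
    missing: List[str] = []
    for path in REQUIRED_PATHS:
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

An empty result means every assumed key is present; otherwise each entry names the first required path that could not be resolved, such as `execution.actions`.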
# Run all tests
python scripts/run_tests.py
# Run only non-async tests (fast)
python scripts/run_tests.py --non-async
# Run specific test file
python -m pytest tests/test_verification_adapter.py -v
# Run with coverage
python -m pytest tests/ --cov=src/polaris --cov-report=html

# Validate all configurations
python src/scripts/start_component.py all --plugin-dir extern --validate-only
# Test component startup
python src/scripts/start_component.py digital-twin --dry-run

# Analyze performance metrics
python scripts/analyze_performance.py logs/polaris_metrics.log

- NATS Connection Failures
  # Check NATS server status
  nc -z localhost 4222
  # Start NATS server
  ./bin/nats-server --port 4222
- Plugin Not Found
  # Validate plugin configuration
  python src/scripts/start_component.py monitor --plugin-dir extern --validate-only
- Digital Twin gRPC Errors
  # Check Digital Twin health
  python src/scripts/start_component.py digital-twin --health-check
- Gemini API Issues
  # Verify API key
  echo $GEMINI_API_KEY
  # Test API connectivity
  python examples/test_interactive_api_key.py

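When `nc` is not available, the same reachability check as `nc -z localhost 4222` can be done from Python with the standard library. This is a minimal sketch; `port_is_open` is a hypothetical helper name, and a successful TCP connection only shows that something is listening, not that it is a healthy NATS server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the default setup in this README, `port_is_open("localhost", 4222)` checks the NATS port and `port_is_open("localhost", 50051)` checks the Digital Twin Query Service port.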
# Enable debug logging for any component
python src/scripts/start_component.py <component> --log-level DEBUG

# Check system health
./start_polaris_swim_system.sh --check-health
# Show component logs
./start_polaris_swim_system.sh --show-logs digital-twin

- docs/COMPONENT_STARTUP_GUIDE.md - Complete component startup reference
- docs/DIGITAL_TWIN_IMPLEMENTATION_SUMMARY.md - Digital Twin architecture
- docs/VERIFICATION_IMPLEMENTATION_SUMMARY.md - Verification system details
- docs/verification_agent_guide.md - Verification usage guide

- gRPC service definitions in src/polaris/proto/
- Comprehensive docstrings throughout codebase
- Configuration schema documentation

- polaris_refactored/doc/design.md - System design principles
- polaris_refactored/doc/requirements.md - Detailed requirements

# Install development dependencies
pip install -r requirements.txt
pip install -r requirements_dev.txt
# Run code quality checks
python scripts/validate_configs.py
python scripts/run_tests.py
# Generate protocol buffers
python scripts/generate_proto.py

- Follow PEP 8 guidelines
- Use type hints throughout
- Comprehensive docstrings for all public APIs
- Structured logging with correlation IDs
- Unit tests for all new components
- Integration tests for system interactions
- Performance tests for critical paths
- Configuration validation tests
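Of the guidelines above, structured logging with correlation IDs is the least self-explanatory, so here is a minimal standard-library sketch. The JSON field names, the `get_logger` helper, and the `polaris.example` logger name are assumptions; the project's actual logging setup may differ.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including its correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
        })


def get_logger(name: str, correlation_id: str) -> logging.LoggerAdapter:
    """Return a logger that stamps every record with the given correlation ID."""
    return logging.LoggerAdapter(logging.getLogger(name),
                                 {"correlation_id": correlation_id})


# Usage: log to an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
base = logging.getLogger("polaris.example")  # hypothetical logger name
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

cid = uuid.uuid4().hex
get_logger("polaris.example", cid).info("action executed")
entry = json.loads(stream.getvalue())
```

Passing the same correlation ID to every component that handles one adaptation makes it possible to trace that adaptation across telemetry, verification, and execution logs.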
This project is licensed under the MIT License - see the LICENSE file for details.
- Carnegie Mellon University ABLE Team - SWIM exemplar system
- NATS.io - High-performance messaging system
- Google Gemini - LLM capabilities for intelligent reasoning
- gRPC - High-performance RPC framework
For questions, issues, or contributions:
- Check the documentation
- Review troubleshooting guide
- Run health checks and validation
- Create an issue with detailed logs and configuration
POLARIS - Building the future of adaptive systems, one adaptation at a time.