Releases: ringo380/inferno
Inferno 0.10.6
[Unreleased]
Installation
Quick Install (Linux/macOS)
curl -L https://github.com/ringo380/inferno/releases/download/v0.10.6/inferno-linux-x86_64.tar.gz | tar xz
sudo mv inferno /usr/local/bin/
Manual Download
Download the appropriate binary for your platform from the assets below.
Verification
All release binaries include SHA256 checksums for verification:
sha256sum -c inferno-*.sha256
What's Changed
See the changelog above for detailed changes in this release.
Inferno 0.10.5
[Unreleased]
Installation
Quick Install (Linux/macOS)
curl -L https://github.com/ringo380/inferno/releases/download/v0.10.5/inferno-linux-x86_64.tar.gz | tar xz
sudo mv inferno /usr/local/bin/
Manual Download
Download the appropriate binary for your platform from the assets below.
Verification
All release binaries include SHA256 checksums for verification:
sha256sum -c inferno-*.sha256
What's Changed
See the changelog above for detailed changes in this release.
Inferno 0.10.4
[Unreleased]
Installation
Quick Install (Linux/macOS)
curl -L https://github.com/ringo380/inferno/releases/download/v0.10.4/inferno-linux-x86_64.tar.gz | tar xz
sudo mv inferno /usr/local/bin/
Manual Download
Download the appropriate binary for your platform from the assets below.
Verification
All release binaries include SHA256 checksums for verification:
sha256sum -c inferno-*.sha256
What's Changed
See the changelog above for detailed changes in this release.
v0.10.3
What's Changed
Added
- Metrics System: Added generic counter and gauge support to MetricsCollector
  - New increment_counter() and record_gauge() public methods
  - Custom metrics included in MetricsSnapshot and Prometheus export
  - Thread-safe implementation using Arc<RwLock<...>>
- Token Sampling: Implemented proper RNG-based token sampling for inference
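A minimal, std-only sketch of what shared counters and gauges behind Arc<RwLock<...>> can look like. Only the method names increment_counter and record_gauge come from these notes; their parameters, the map layout, and the to_prometheus renderer are assumptions for illustration, not Inferno's actual MetricsCollector.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for MetricsCollector: custom counters and gauges
/// live behind Arc<RwLock<...>> so cheap clones can be shared across threads.
#[derive(Clone, Default)]
pub struct MetricsCollector {
    counters: Arc<RwLock<BTreeMap<String, u64>>>,
    gauges: Arc<RwLock<BTreeMap<String, f64>>>,
}

impl MetricsCollector {
    /// Add `by` to a named counter, creating it at zero if absent (signature assumed).
    pub fn increment_counter(&self, name: &str, by: u64) {
        let mut counters = self.counters.write().unwrap();
        *counters.entry(name.to_string()).or_insert(0) += by;
    }

    /// Overwrite a named gauge with the latest observed value (signature assumed).
    pub fn record_gauge(&self, name: &str, value: f64) {
        self.gauges.write().unwrap().insert(name.to_string(), value);
    }

    /// Render the custom metrics in the Prometheus text exposition format.
    pub fn to_prometheus(&self) -> String {
        let mut out = String::new();
        for (name, value) in self.counters.read().unwrap().iter() {
            out.push_str(&format!("# TYPE {name} counter\n{name} {value}\n"));
        }
        for (name, value) in self.gauges.read().unwrap().iter() {
            out.push_str(&format!("# TYPE {name} gauge\n{name} {value}\n"));
        }
        out
    }
}
```

Using a BTreeMap keeps the exported metric order stable, which makes scrape output easier to diff; writers take the lock only for the duration of a single update.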
Fixed
- Dashboard API: Fixed user permissions serialization (convert to strings)
- Response Cache: Resolved deadlock issue and re-enabled cache tests
- Batch Processing: Fixed cron parsing to use the correct 'from' parameter
Changed
- CLI Middleware: Metrics middleware now records to MetricsCollector instead of just logging
- Code Style: Applied consistent formatting across codebase
- Rust Edition: Upgraded to Rust edition 2024
Full Changelog: v0.10.2...v0.10.3
v0.10.1 - Dashboard UI & CI Fixes
What's Changed
🐛 Fixed
- Dashboard UI: Added MainLayout wrapper to 9 pages that were missing the sidebar, header, and proper margins:
  - batch, monitoring, observability, performance, pipeline, security, settings, tenants, versioning
- CI Pipeline: Optimized GitHub Actions to reduce CI minutes usage
- CI Pipeline: Fixed cross-platform build failures
- Code Quality: Resolved clippy warnings for CI linting compliance
📦 Changed
- Updated dashboard Cargo.lock dependencies
Full Changelog: v0.10.0...v0.10.1
Phase 5: Production Deployment & Scaling
Phase 5: Production Deployment & Scaling Complete 🚀
Overview
Phase 5 completes the production deployment and scaling infrastructure for Inferno v0.8.0, adding comprehensive Helm charts, monitoring, enterprise authentication, and advanced caching & optimization. This phase enables production-ready deployments across dev/staging/prod environments.
Phase 5B: Helm Charts & Multi-Environment Configuration
Commit: 8041fae
Features
- Production-Grade Helm Chart (17 files, 2,330 lines)
  - Complete Kubernetes deployment templates
  - Configurable for dev/staging/production
  - Health probes (startup, readiness, liveness)
  - Pod anti-affinity and resource quotas
  - RBAC and NetworkPolicy
- Environment-Specific Values
  - Development (1 replica, debug logging, minimal resources)
  - Staging (2 replicas, info logging, moderate resources)
  - Production (3+ replicas, HPA, strict security)
- Storage & Scaling
  - PersistentVolumeClaims (models, cache, queue)
  - Horizontal Pod Autoscaler (2-10 replicas)
  - Pod Disruption Budget (min 2 available)
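The 2-10 replica range is enforced by the Kubernetes HPA controller, which scales on the ratio of the observed metric to its target. The sketch below illustrates that core rule, desired = ceil(current × currentMetric / targetMetric), clamped to the chart's bounds. It assumes a single utilization-style metric and omits the controller's default tolerance band and stabilization windows; it is not code from the Inferno chart.

```rust
/// Replica bounds matching the chart's HPA range (2-10 replicas).
pub const MIN_REPLICAS: u32 = 2;
pub const MAX_REPLICAS: u32 = 10;

/// Core Kubernetes HPA rule, clamped to the configured bounds.
/// Tolerance and stabilization behaviour are intentionally left out.
pub fn desired_replicas(current: u32, current_metric: f64, target_metric: f64) -> u32 {
    assert!(target_metric > 0.0, "target metric must be positive");
    let raw = (current as f64 * current_metric / target_metric).ceil() as u32;
    raw.clamp(MIN_REPLICAS, MAX_REPLICAS)
}
```

For example, 3 replicas averaging 90% CPU against a 60% target gives ceil(4.5) = 5 replicas, while a nearly idle deployment never drops below the floor of 2, which also keeps the PDB's "min 2 available" satisfiable.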
Phase 5C: Monitoring & Observability
Commit: 53b1d99
Features
- Prometheus Configuration (4 files, 2,643 lines)
  - Global scrape config with Kubernetes SD
  - 20+ alert rules (critical, warning, info)
  - 10 recording rules for dashboard performance
- Grafana Dashboard
  - 8-panel overview (status, latency, errors, queue, etc.)
  - Real-time metrics visualization
  - Auto-import capability
- Alert Thresholds
  - Critical: Pod down (2 min), queue >500, memory critical, disk <5%
  - Warning: High latency (P95 >1s), error rate >5%, queue >100
  - Info: Cache hit rate <60%, rate limiting
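In the chart these thresholds are evaluated by Prometheus as alert rules. The Rust sketch below only mirrors the listed numbers to make the severity ladder concrete. It assumes "disk <5%" means free disk space and expresses rates as fractions. The pod-down duration, the unspecified memory threshold, and rate limiting are omitted. The Signals struct and classify function are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
pub enum Severity {
    Info,
    Warning,
    Critical,
}

/// Hypothetical snapshot of the signals the alert rules watch.
#[derive(Clone, Copy)]
pub struct Signals {
    pub queue_depth: u32,
    pub p95_latency_secs: f64,
    pub error_rate: f64,         // fraction: 0.05 == 5%
    pub disk_free_fraction: f64, // assumed meaning of "disk <5%"
    pub cache_hit_rate: f64,     // fraction: 0.60 == 60%
}

/// Return the highest severity whose threshold is crossed, if any.
pub fn classify(s: &Signals) -> Option<Severity> {
    if s.queue_depth > 500 || s.disk_free_fraction < 0.05 {
        Some(Severity::Critical)
    } else if s.queue_depth > 100 || s.p95_latency_secs > 1.0 || s.error_rate > 0.05 {
        Some(Severity::Warning)
    } else if s.cache_hit_rate < 0.60 {
        Some(Severity::Info)
    } else {
        None
    }
}
```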
Phase 5D: Enterprise Authentication & Multi-Tenancy
Commit: 7383ae3
Features
- OAuth2 Integration (5 providers, 2,257 lines)
  - Google, GitHub, Okta, Auth0, Azure AD
  - JWT validation with signature, expiration, and audience checks
  - Secure session management (HttpOnly, Secure, SameSite cookies)
- Multi-Tenancy
  - Tenant identification: JWT claim → header → hostname → domain
  - Data isolation: Schema-level separation (SQL injection proof)
  - Queue and cache isolation per tenant
  - Resource quotas per tenant (rate limiting, concurrent requests)
- RBAC (5 default roles)
  - admin, developer, analyst, service, guest
  - Permission-based model (resource + action + scope)
  - Role claim mapping from OAuth2
- API Key Management
  - Ed25519 keys (256-bit keys, ~128-bit security level)
  - 90-day rotation with 7-day grace period
  - Scope restriction and optional IP whitelist
  - Audit trail (creation, usage, rotation)
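The tenant identification order (JWT claim → header → hostname → domain) is a fallback chain. The sketch below encodes that precedence with Option combinators. The notes do not define "hostname" and "domain" precisely, so this assumes "hostname" means a subdomain of a base domain and "domain" means a lookup in a custom-domain map. RequestInfo, resolve_tenant, and the header it stands for are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical view of an incoming request; field names are illustrative.
pub struct RequestInfo<'a> {
    pub jwt_tenant_claim: Option<&'a str>,
    pub tenant_header: Option<&'a str>, // e.g. an assumed X-Tenant-ID header
    pub host: &'a str,
}

/// Resolve a tenant ID using the documented precedence:
/// JWT claim -> header -> hostname (subdomain) -> custom-domain mapping.
pub fn resolve_tenant(
    req: &RequestInfo,
    base_domain: &str,
    custom_domains: &HashMap<String, String>,
) -> Option<String> {
    req.jwt_tenant_claim
        .or(req.tenant_header)
        .map(str::to_string)
        .or_else(|| {
            // "acme.inferno.example.com" with base "inferno.example.com" -> "acme".
            // Requiring the '.' separator rejects look-alikes such as "eviLinferno.example.com".
            req.host
                .strip_suffix(base_domain)
                .and_then(|prefix| prefix.strip_suffix('.'))
                .filter(|sub| !sub.is_empty() && !sub.contains('.'))
                .map(str::to_string)
        })
        .or_else(|| custom_domains.get(req.host).cloned())
}
```

Putting the signed JWT claim first means a client-supplied header can never override the tenant the identity provider asserted.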
Phase 5E: Advanced Caching & Optimization
Commit: 14771e4
Features
- Hybrid Cache System (6 files, 2,303 lines)
  - L1: In-memory (500MB, LRU, Zstd compression)
  - L2: Disk (100GB, persistent, 24-hour TTL)
  - 4 eviction policies (LRU, LFU, Random, FIFO)
  - Cache warm-up on startup
- Cache Types
  - Response cache (API responses)
  - Inference cache (model outputs, deterministic only)
  - Embedding cache (24-hour retention)
  - Prompt cache (tokenized prompts)
  - KV cache (attention key/value tensors)
- Performance Optimization (5 profiles)
  - Latency-optimized: P50 50-100ms, P99 200-500ms
  - Throughput-optimized: 1000+ req/s
  - Balanced (default): 100-300 req/s
  - Memory-constrained: 2-4GB per replica
  - GPU-accelerated: 100-500 req/s per GPU, 5-10x speedup vs CPU
- Advanced Techniques
  - Token batching (batch_size: 3, adaptive)
  - Speculative decoding (+20-40% throughput)
  - Request batching and deduplication
  - Context caching
  - CPU affinity and memory pooling
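The L1/L2 split can be illustrated with a small two-tier cache. In this sketch, L1 is a capacity-bounded LRU; entries evicted from L1 are demoted to L2, and L2 hits are promoted back into L1. L2 is modeled as an in-memory map standing in for disk, and TTLs, Zstd compression, and byte-size accounting are deliberately omitted. HybridCache and its methods are hypothetical, not Inferno's cache manager.

```rust
use std::collections::{HashMap, VecDeque};

/// Two-tier cache sketch: a small LRU "L1" in front of a larger "L2".
pub struct HybridCache {
    l1_capacity: usize,
    l1: HashMap<String, String>,
    l1_order: VecDeque<String>, // front = least recently used
    l2: HashMap<String, String>, // stands in for the on-disk tier
}

impl HybridCache {
    pub fn new(l1_capacity: usize) -> Self {
        Self { l1_capacity, l1: HashMap::new(), l1_order: VecDeque::new(), l2: HashMap::new() }
    }

    /// Mark `key` as most recently used in L1 (O(n), fine for a sketch).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.l1_order.iter().position(|k| k == key) {
            self.l1_order.remove(pos);
        }
        self.l1_order.push_back(key.to_string());
    }

    /// Insert into L1, demoting least-recently-used entries to L2 when full.
    pub fn put(&mut self, key: &str, value: String) {
        self.l2.remove(key); // drop any stale L2 copy
        self.l1.insert(key.to_string(), value);
        self.touch(key);
        while self.l1.len() > self.l1_capacity {
            let Some(victim) = self.l1_order.pop_front() else { break };
            if let Some(v) = self.l1.remove(&victim) {
                self.l2.insert(victim, v);
            }
        }
    }

    /// Check L1 first; on an L2 hit, promote the entry back into L1.
    pub fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.l1.get(key).cloned() {
            self.touch(key);
            return Some(v);
        }
        let v = self.l2.remove(key)?;
        self.put(key, v.clone());
        Some(v)
    }

    pub fn in_l1(&self, key: &str) -> bool {
        self.l1.contains_key(key)
    }

    pub fn in_l2(&self, key: &str) -> bool {
        self.l2.contains_key(key)
    }
}
```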
Key Metrics
Performance Improvements
- Latency: 5x faster (500ms → 100ms P50) with caching + optimization
- Throughput: 3-5x faster (100 → 300-500 req/s)
- Cache Hit Rate: >80% in production
- GPU Speedup: 5-10x faster vs CPU
- Memory: +10% for caching infrastructure
Infrastructure
- Helm Chart: 17 templates, 100+ configurable options
- Monitoring: 20+ alerts, 10 recording rules, 8-panel dashboard
- Auth: 5 OAuth2 providers, multi-tenancy support
- Caching: Hybrid L1/L2, 5 profiles, multiple eviction policies
Documentation
Comprehensive Guides (2000+ lines)
- OPTIMIZATION_GUIDE.md: Performance tuning, profiling, benchmarking
- ENTERPRISE_AUTH_GUIDE.md: OAuth2 setup, RBAC, multi-tenancy
- MONITORING_GUIDE.md: Prometheus, Grafana, alerting setup
- Helm Chart README.md: Configuration, deployment examples
- Performance README.md: Cache strategies, optimization profiles
Statistics
Code
- Total Phase 5 files: 41 files
- Total Phase 5 lines: 9,533 lines of production code
- Commits: 4 major commits
- Documentation: 2000+ lines
By Phase
- Phase 5B: 17 files, 2,330 lines (Helm)
- Phase 5C: 10 files, 2,643 lines (Monitoring)
- Phase 5D: 7 files, 2,257 lines (Auth)
- Phase 5E: 6 files, 2,303 lines (Caching)
Deployment Ready
Phase 5 is production-ready with:
- ✅ Multi-environment support (dev/staging/prod)
- ✅ Enterprise authentication (OAuth2 + RBAC)
- ✅ Multi-tenant isolation and quotas
- ✅ Real-time monitoring and alerting
- ✅ Advanced caching and optimization
- ✅ Horizontal and vertical scaling
- ✅ High availability (3+ replicas, PDB)
- ✅ Comprehensive documentation
How to Deploy
Development
helm install inferno ./helm/inferno -f helm/inferno/values-dev.yaml
Staging
helm install inferno ./helm/inferno \
-f helm/inferno/values-staging.yaml \
-n inferno-staging --create-namespace
Production (Full Features)
helm install inferno ./helm/inferno \
-f helm/inferno/values-prod.yaml \
-n inferno-prod --create-namespace \
--set auth.oauth2.enabled=true \
--set auth.oauth2.providers.google.enabled=true \
--set auth.multiTenancy.enabled=true \
--set monitoring.serviceMonitor.enabled=true
What's Included
- ✅ Production Helm chart with 100+ configuration options
- ✅ 20+ Prometheus alert rules with proper thresholds
- ✅ Grafana dashboard for real-time monitoring
- ✅ OAuth2 integration (5 providers)
- ✅ Multi-tenancy with RBAC
- ✅ Advanced hybrid caching (L1/L2)
- ✅ 5 optimization profiles
- ✅ Comprehensive benchmarking suite
- ✅ Complete documentation and guides
Contributors
Thank you to the Inferno team for completing Phase 5 production infrastructure! 🎉
Version: Inferno v0.8.0 + Phase 5
Release Date: 2024-Q4
Status: Production Ready
v0.9.0 - Technical Debt & Cross-Platform Upgrades
🔥 Inferno v0.9.0 - Technical Debt & Cross-Platform Upgrades
This release completes a comprehensive 7-phase technical debt remediation effort, delivering cross-platform upgrade support and significantly improving code quality across the entire codebase.
🎯 Highlights
- Cross-Platform Auto-Updates: Full upgrade support for Windows, Linux, and macOS
- 32 TODO Items Resolved: Complete technical debt remediation
- ~6,300 Lines of Improvements: New features, fixes, and consolidation
- Unified Infrastructure: Consolidated cache and monitoring systems
🚀 New Features
Cross-Platform Upgrade Handlers
Windows Support
- MSI installer via msiexec with logging
- EXE installer with multiple silent flag strategies (NSIS, Inno Setup, etc.)
- Winget (Windows Package Manager) integration
- Authenticode signature verification via PowerShell
- Administrator privilege detection
- Application restart via cmd.exe start
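The silent-install strategies boil down to choosing the right switches per installer type. The sketch below builds, but does not run, a (program, args) pair using each installer's standard documented switches: msiexec /i /qn /norestart /l*v, NSIS /S, and Inno Setup /VERYSILENT /SUPPRESSMSGBOXES /NORESTART /LOG=. Whether Inferno uses exactly these flags, and the InstallerKind and silent_install_command names, are assumptions.

```rust
/// Installer kinds the Windows handler distinguishes (names are illustrative).
pub enum InstallerKind {
    Msi,
    Nsis,
    InnoSetup,
}

/// Build (program, args) for an unattended install without executing it.
pub fn silent_install_command(kind: InstallerKind, path: &str, log: &str) -> (String, Vec<String>) {
    match kind {
        // msiexec: /qn = no UI, /l*v = verbose log to the given file.
        InstallerKind::Msi => (
            "msiexec".into(),
            vec!["/i".into(), path.into(), "/qn".into(), "/norestart".into(), "/l*v".into(), log.into()],
        ),
        // NSIS installers accept a single case-sensitive /S for silent mode.
        InstallerKind::Nsis => (path.into(), vec!["/S".into()]),
        // Inno Setup: fully silent, no message boxes, no reboot, log to file.
        InstallerKind::InnoSetup => (
            path.into(),
            vec!["/VERYSILENT".into(), "/SUPPRESSMSGBOXES".into(), "/NORESTART".into(), format!("/LOG={log}")],
        ),
    }
}
```

The result can be handed to std::process::Command::new(program).args(args) once the caller has confirmed administrator privileges.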
Linux Support
- DEB package via dpkg with apt-get dependency resolution
- RPM package via dnf with rpm fallback
- AppImage with automatic symlink creation
- Snap integration with refresh/install fallback
- Flatpak with Flathub support
- Homebrew on Linux support
- GPG signature verification for DEB/RPM
- Distribution auto-detection (Ubuntu, Debian, Fedora, RHEL, Arch, etc.)
- sudo/pkexec privilege elevation
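Distribution auto-detection is commonly done by reading the ID and ID_LIKE fields of /etc/os-release. The sketch below parses that text and picks a package format. The specific ID-to-format mapping and the AppImage fallback for unrecognized distributions (Arch included, since no pacman package is listed) are assumptions, as are the function names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PackageFormat {
    Deb,
    Rpm,
    AppImage,
}

/// Read a KEY=value field from os-release text, stripping optional quotes.
fn field<'a>(os_release: &'a str, key: &str) -> Option<&'a str> {
    os_release.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        (k.trim() == key).then(|| v.trim().trim_matches('"').trim_matches('\''))
    })
}

/// Choose a package format from /etc/os-release contents: ID first, then ID_LIKE.
pub fn detect_package_format(os_release: &str) -> PackageFormat {
    let id = field(os_release, "ID").unwrap_or("");
    let like = field(os_release, "ID_LIKE").unwrap_or("");
    for distro in std::iter::once(id).chain(like.split_whitespace()) {
        match distro {
            "ubuntu" | "debian" | "linuxmint" | "pop" => return PackageFormat::Deb,
            "fedora" | "rhel" | "centos" | "rocky" | "almalinux" => return PackageFormat::Rpm,
            _ => {}
        }
    }
    // Portable fallback when no native package format is recognized.
    PackageFormat::AppImage
}
```

Checking ID_LIKE lets derivatives such as Linux Mint or Rocky Linux inherit their parent's package format without being listed individually.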
macOS Enhancements
- Native menu bar with 7 submenus and keyboard shortcuts
- System tray with status display and quick actions
- App Bundle, PKG, and Homebrew installation
- Code signature verification
CLI Command Completion
- A/B Testing: Full implementation with file-based persistence, traffic splits, variant metrics tracking
- Model Versioning: Export, Import, Validate, Tag, Search, Cleanup commands
- GPU Info: JSON, YAML, CSV output formats
- Audit Statistics: CSV, YAML export formats
- Upgrade Rollback: BackupManager integration with confirmation prompts
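Traffic splits in A/B testing are usually made sticky by hashing a stable key into a bucket, so the same user always sees the same variant. The sketch below hashes "experiment:user" with 64-bit FNV-1a (chosen because, unlike std's DefaultHasher, its output is stable across Rust versions) and walks cumulative weights. It is an illustration of deterministic bucketing; Inferno's actual assignment logic is not described in these notes, and assign_variant is hypothetical.

```rust
/// 64-bit FNV-1a hash: simple, fast, and stable across runs and toolchains.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Assign a user to a variant given (name, weight) splits such as 90/10.
/// Returns None when there are no variants or all weights are zero.
pub fn assign_variant<'a>(experiment: &str, user_id: &str, splits: &[(&'a str, u32)]) -> Option<&'a str> {
    let total: u32 = splits.iter().map(|(_, w)| *w).sum();
    if total == 0 {
        return None;
    }
    let key = format!("{experiment}:{user_id}");
    let bucket = (fnv1a(key.as_bytes()) % total as u64) as u32;
    let mut cumulative = 0;
    for &(name, weight) in splits {
        cumulative += weight;
        if bucket < cumulative {
            return Some(name);
        }
    }
    None // unreachable when total > 0
}
```

Including the experiment name in the hashed key keeps assignments independent across experiments, so a user in the control group of one test is not systematically in the control group of every test.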
Infrastructure Consolidation
- Unified Cache Manager: Consolidated configuration, statistics, and dependency injection
- Unified Monitoring Manager: Prometheus, OpenTelemetry, and Grafana support in one system
- CLI Metrics: Proper metrics collection with CliCounters and CliMetrics
Security Improvements
- Ed25519 Signature Verification: Cryptographic package verification using the ring crate
- Security Scanning: API key expiration detection, brute force detection
- Trusted Publisher Verification: Marketplace model verification
Model Features
- Quantization: 12 new quantization paths (Q4_0, Q4_1, Q8_0, F16↔F32 conversions)
- Deployment Strategies: FeatureFlag and RollingUpdate deployment support
- Sampling RNG: Proper stochastic sampling with StdRng (seeded and random modes)
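Q8_0 is one of the GGML/llama.cpp block formats. Weights are grouped into blocks of 32 that share one scale, scale = max|x| / 127, and each weight is stored as round(x / scale) in an i8. The sketch below shows that round trip. GGML stores the scale as f16; this version keeps it as f32 to stay std-only. It is not the code in Inferno's conversion.rs.

```rust
/// Q8_0 groups weights into blocks of 32.
pub const QK8_0: usize = 32;

/// One quantized block: a shared scale plus 32 signed 8-bit values.
/// (GGML stores `scale` as f16; f32 is used here for simplicity.)
pub struct BlockQ8_0 {
    pub scale: f32,
    pub qs: [i8; QK8_0],
}

/// Quantize f32 weights: scale = max|x| / 127, q = round(x / scale).
pub fn quantize_q8_0(x: &[f32]) -> Vec<BlockQ8_0> {
    assert!(x.len() % QK8_0 == 0, "length must be a multiple of 32");
    x.chunks_exact(QK8_0)
        .map(|chunk| {
            let amax = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = amax / 127.0;
            let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };
            let mut qs = [0i8; QK8_0];
            for (q, &v) in qs.iter_mut().zip(chunk) {
                *q = (v * inv).round() as i8;
            }
            BlockQ8_0 { scale, qs }
        })
        .collect()
}

/// Dequantize back to f32: x ≈ q * scale.
pub fn dequantize_q8_0(blocks: &[BlockQ8_0]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.qs.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Each weight's reconstruction error is bounded by half its block's scale, which is why per-block scales (rather than one scale per tensor) keep outliers in one block from degrading precision everywhere else.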
📊 Technical Debt Summary
| Phase | Focus | Status |
|---|---|---|
| Phase 1 | Critical Stubs (RNG, Menu, Batch) | ✅ Complete |
| Phase 2 | Module Consolidation (Cache, Monitoring) | ✅ Complete |
| Phase 3 | CLI Command Completion | ✅ Complete |
| Phase 4 | Backend/Feature Completion | ✅ Complete |
| Phase 5 | Code Quality (Metrics, Security) | ✅ Complete |
| Phase 6 | Low Priority Items (Deployment, Signatures) | ✅ Complete |
| Phase 7 | Cross-Platform Upgrades | ✅ Complete |
📁 Files Changed
- 32 files modified
- 6 new files created
- 1 file deleted (src/backends/metal.rs: placeholder removed; the actual Metal support lives in the GGUF backend)
New Files
src/infrastructure/cache/manager.rs
src/infrastructure/cache/statistics.rs
src/infrastructure/cache/unified_config.rs
src/infrastructure/monitoring/manager.rs
src/infrastructure/monitoring/statistics.rs
src/infrastructure/monitoring/unified_config.rs
Key Modified Files
src/upgrade/windows.rs (66 → 483 lines)
src/upgrade/linux.rs (66 → 761 lines)
src/upgrade/macos.rs (enhanced)
src/metrics/mod.rs (CLI metrics)
src/marketplace.rs (publisher verification)
src/conversion.rs (quantization paths)
src/model_versioning.rs (deployment strategies)
src/upgrade/safety.rs (Ed25519 verification)
🔧 Breaking Changes
None. This release is fully backward compatible.
📋 Related Issues
- #13 - Technical Debt Remediation Complete
- #8 - macOS Native Experience (Closed)
- #6 - Temperature Sampling (Closed)
- #10 - Phase 4 Complete (Closed)
- #12 - Phase 5 Complete (Closed)
🙏 Acknowledgments
This release represents a significant effort to improve code quality, eliminate technical debt, and prepare Inferno for production deployment across all major platforms.
Full Changelog: v0.8.0...v0.9.0
v0.7.0 - Metal GPU Acceleration (13x Speedup)
🚀 Inferno v0.7.0 - Metal GPU Acceleration
🎉 Major Features
⚡ Metal GPU Acceleration for Apple Silicon
Full Metal GPU acceleration delivering production-ready performance on macOS with a 13x speedup!
Performance Metrics
- CPU-only baseline: 15 tok/s
- Metal GPU: 198 tok/s (M4 Max)
- Speedup: 13x improvement 🚀
- GPU offloading: 23/23 layers (100%)
- GPU memory: ~747 MiB
Technical Implementation
- ✅ Production-ready llama-cpp-2 integration
- ✅ Thread-safe Arc-based backend architecture
- ✅ Per-inference LlamaContext creation
- ✅ Greedy sampling for token generation
- ✅ Flash Attention auto-enabled
- ✅ Unified memory architecture support
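This release decodes greedily; later releases add RNG-based stochastic sampling with StdRng. The sketch below contrasts the two. Greedy decoding takes the argmax. Temperature sampling applies a numerically stable softmax to logits / T and inverts the CDF with a uniform draw u in [0, 1). The draw is passed in by the caller instead of coming from StdRng so the example stays std-only; the function names are hypothetical.

```rust
/// Greedy decoding: always pick the highest-logit token.
pub fn greedy(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .expect("logits must be non-empty")
}

/// Temperature sampling by inverse CDF; `u` is a uniform draw in [0, 1)
/// that would normally come from a seeded or entropy-backed RNG.
pub fn sample_with_temperature(logits: &[f32], temperature: f32, u: f32) -> usize {
    if temperature <= 0.0 {
        return greedy(logits);
    }
    // Subtract the max logit before exponentiating to avoid overflow.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = logits.iter().map(|&l| ((l - max) / temperature).exp()).collect();
    let total: f32 = weights.iter().sum();
    let mut threshold = u * total;
    for (i, w) in weights.iter().enumerate() {
        if threshold < *w {
            return i;
        }
        threshold -= w;
    }
    weights.len() - 1 // guard against floating-point round-off
}
```

Lower temperatures concentrate probability on the top token and converge to greedy output; a seeded RNG makes stochastic runs reproducible.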
Compatibility
- ✅ Apple M1/M2/M3/M4 (all variants: base, Pro, Max, Ultra)
- ✅ Metal 3 support (MTLGPUFamilyApple9)
- ✅ All GGUF quantizations (Q4, Q5, Q6, Q8)
- ✅ Automatic GPU detection and enablement
Tested Configuration
- Hardware: Apple M4 Max
- OS: macOS 15 (Darwin kernel 24.6.0)
- Model: TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf (638MB)
- Result: 198.1 tok/s average throughput
🔧 Backend Improvements
GGUF Backend
- Real Metal GPU-accelerated inference (no longer placeholder)
- Proper !Send constraint handling with spawn_blocking
- GPU memory management and validation
- Automatic capability detection
- Default GPU enablement on macOS
- Increased default batch size to 512 for better throughput
⚙️ Configuration
Metal GPU is automatically enabled on macOS. To configure:
# .inferno.toml
[backend_config]
gpu_enabled = true # Auto-enabled on macOS
context_size = 2048
batch_size = 512 # Optimized for Metal
📚 Documentation
New comprehensive documentation:
- METAL_GPU_RESULTS.md: Detailed performance benchmarks and architecture
- METAL_GPU_TESTING.md: Testing methodology and guides
- QUICK_TEST.md: Quick reference for testing
- TESTING_STATUS.md: Current testing status
- Updated README with Metal GPU capabilities
- Updated CHANGELOG with detailed metrics
🚦 Usage
CLI
# GPU-accelerated inference (default on macOS)
cargo run --release -- run \
--model models/TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf \
--prompt "Explain quantum computing"
# Expected: ~198 tok/s on M4 Max
Desktop App
cd dashboard
npm run tauri dev
# Metal GPU automatically enabled
# GPU status visible in System Info panel
🧹 Repository Improvements
- Added Claude Code directories to .gitignore
- Excluded test scripts from repository
- Improved repository organization
📊 Performance Comparison
| Configuration | Throughput | Speedup |
|---|---|---|
| CPU Only (M4 Max) | 15 tok/s | 1x (baseline) |
| Metal GPU (M4 Max) | 198 tok/s | 13x 🚀 |
📦 Installation
macOS Desktop App (Recommended)
Download Inferno.dmg from the releases page and enjoy Metal-accelerated inference!
CLI Tools
# Homebrew
brew install ringo380/tap/inferno
# Or build from source
git clone https://github.com/ringo380/inferno.git
cd inferno
cargo build --release
🙏 Credits
Metal GPU implementation powered by:
- llama.cpp by Georgi Gerganov
- llama-cpp-2 Rust bindings by utilityai
- Metal Performance Shaders by Apple
Full Changelog: v0.6.1...v0.7.0
Inferno v0.6.1 - Code Quality & Repository Optimization
🎉 Highlights
This maintenance release focuses on code quality, repository optimization, and Phase 3 architectural improvements.
🚀 Code Quality & Refactoring
- Function Signature Simplification: Reduced complexity across multiple modules
  - convert.rs: 22 args → 4 args
  - deployment.rs: 12 args → 2 args
  - marketplace.rs: 30 args → 4 args
  - multimodal.rs, model_versioning.rs, qa_framework.rs: Significant reductions
- Error Handling: Boxed large InfernoError variants to reduce enum size
- Thread Safety: Fixed MetricsCollector Arc Send+Sync issues
- Memory Management: Enhanced MemoryPool Send/Sync implementation
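Boxing large error variants works because an enum is as large as its biggest variant, and every Result carrying that error pays the cost even on the success path. Clippy's large_enum_variant lint flags this pattern. The types below are hypothetical stand-ins, since the notes do not list InfernoError's variants; the sketch just measures the before/after size with std::mem::size_of.

```rust
/// A large payload, standing in for a detailed error context.
pub struct ModelLoadDetails {
    pub path: String,
    pub backend: String,
    pub tried_formats: [u8; 64],
}

/// Unboxed: every value of this enum is as large as ModelLoadDetails.
pub enum FatError {
    NotFound(String),
    ModelLoad(ModelLoadDetails),
}

/// Boxed: the large variant holds a pointer, shrinking the whole enum.
pub enum SlimError {
    NotFound(String),
    ModelLoad(Box<ModelLoadDetails>),
}

/// Return (size of unboxed enum, size of boxed enum) in bytes.
pub fn sizes() -> (usize, usize) {
    (std::mem::size_of::<FatError>(), std::mem::size_of::<SlimError>())
}
```

The trade-off is one heap allocation when the rare large error is actually constructed, in exchange for smaller moves and returns everywhere else.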
🧹 Repository Optimization
- Disk Space Reduction: 30GB → 2.1GB (93% reduction, 27.9GB saved)
  - Cleaned Rust build artifacts (16.8GB)
  - Cleaned Tauri build artifacts (12.6GB)
  - Removed node_modules and build outputs (785MB)
  - Deleted test models and obsolete directories (95MB)
- Improved .gitignore: Added missing entries for gen/, test directories, build outputs
📚 Documentation
- Phase 3 Tracking: Complete documentation for Week 1 (High-Impact Fixes)
- Arc Audit: Comprehensive Send+Sync audit documentation
- Error Optimization: Documented error enum size reduction strategy
🔧 Developer Experience
- Automated clippy fixes applied across codebase
- Cleanup of unused variables and imports
- Enhanced code maintainability and readability
📊 Statistics
- 37 commits since v0.6.0
- 137 files changed in repository cleanup
- +2,998 insertions, -1,314 deletions
Inferno v0.6.0 - Major CLI Architecture Migration
🎯 Overview
This release represents a complete migration of the Inferno CLI to a modern, modular v2 architecture. All 46+ CLI commands have been reorganized into logical feature groups with improved error handling, consistency, and maintainability.
✨ Major Features
Complete CLI v2 Migration (56 commits)
- Backup & Recovery v2: 7 commands with enhanced reliability
- Performance Optimization v2: 6 commands for fine-tuned performance
- Performance Benchmark v2: 5 commands for comprehensive testing
- QA Framework v2: 5 commands for quality assurance
- Deployment v2: 5 commands for streamlined deployments
Migrated Commands (35+ commands)
All major command groups migrated to v2 architecture:
- ✅ Multimodal, Optimization, Dashboard
- ✅ Logging & Audit, Advanced Monitoring
- ✅ Advanced Cache, Multi-tenancy
- ✅ API Gateway, Model Versioning
- ✅ Federated Learning, Marketplace
- ✅ Package Management, Data Pipeline
- ✅ Batch Queue, Server (API)
- ✅ Security, Observability
- ✅ Monitoring, Distributed Inference
- ✅ Auto-upgrade, Versioning
- ✅ Resilience, Response Cache
- ✅ Help & Documentation
🏗️ Architecture Improvements
Modular Structure
Commands are now organized into 6 main categories:
- Core Platform: config, backends, models, io, security
- Infrastructure: cache, monitoring, observability, metrics, audit
- Operations: batch, deployment, backup, upgrade, resilience, versioning
- AI Features: conversion, optimization, multimodal, streaming, gpu
- Enterprise: distributed, multi-tenancy, federated, marketplace, api_gateway, data_pipeline, qa_framework
- Interfaces: cli, api, tui, dashboard, desktop
Enhanced Error Handling
- Consistent error types across all commands
- Better error messages with actionable suggestions
- Graceful degradation and fallback mechanisms
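One way to combine consistent error types, actionable messages, and graceful degradation is shown below. Each variant renders a message plus a hint, and backend selection falls back to CPU while surfacing the error as a warning. CliError, pick_backend, the backend names, and the hint wording are all hypothetical illustrations, not Inferno's actual error types; only the --model flag appears elsewhere in these notes.

```rust
use std::fmt;

/// Hypothetical CLI error type; each variant carries enough context for a hint.
#[derive(Debug)]
pub enum CliError {
    ModelNotFound { path: String },
    BackendUnavailable { backend: String },
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CliError::ModelNotFound { path } => write!(
                f,
                "model file not found: {path}\n  hint: check the path passed to --model"
            ),
            CliError::BackendUnavailable { backend } => write!(
                f,
                "backend '{backend}' is unavailable\n  hint: continuing with the CPU backend as a fallback"
            ),
        }
    }
}

impl std::error::Error for CliError {}

/// Graceful degradation: prefer the GPU backend, otherwise fall back to CPU
/// and return the error as a warning instead of aborting.
pub fn pick_backend(gpu_available: bool) -> (&'static str, Option<CliError>) {
    if gpu_available {
        ("gpu", None)
    } else {
        ("cpu", Some(CliError::BackendUnavailable { backend: "gpu".into() }))
    }
}
```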
Better Maintainability
- Reduced code duplication
- Clear separation of concerns
- Improved testability
- Standardized command patterns
📦 What's Included
Command Categories
- 46+ CLI commands across all feature areas
- Enterprise features: distributed inference, multi-tenancy, federated learning
- Operations tools: batch processing, deployment automation, backup/recovery
- Developer tools: benchmarking, profiling, QA framework
- Integration features: API gateway, model marketplace, data pipelines
Backward Compatibility
- All existing commands maintain their interfaces
- Configuration files are forward-compatible
- Gradual migration path for custom integrations
🚀 Getting Started
# Install/upgrade Inferno
cargo install --git https://github.com/ringo380/inferno
# Explore new features
inferno help
inferno backup-recovery-v2 --help
inferno performance-optimization-v2 --help
inferno qa-framework-v2 --help
📊 Stats
- 56 commits of carefully organized changes
- 35+ commands fully migrated to v2 architecture
- 7 new command groups added
- Zero breaking changes to existing APIs
🔜 What's Next (v0.7.0)
- Enhanced desktop app features
- GPU acceleration improvements
- Additional enterprise integrations
- Performance optimizations
For detailed migration guides and documentation, visit the Inferno documentation.