
Stream Diffusion RS 🧠⚡

A comprehensive toolkit for diffusion models, EEG analysis, multisensorial processing, and real-time neurofeedback systems.

⚠️ WORK IN PROGRESS - Active Development

Current Status & Recent Updates

✅ Completed Features

  • EEG Processing: Real-time brain wave analysis with frequency band extraction
  • Diffusion Models: UNet architecture with DDIM scheduler implementation
  • ONNX Integration: Model conversion and inference with hardware acceleration
  • Web Interface: REST API and interactive dashboard for experiments
  • Visualization: Plotting engine for EEG data and training curves
  • Training Framework: Multiple optimizers and loss functions
  • Basic NUWE Features: Lightweight node-based processing for standalone use
  • Web Server: Functional HTTP server with Gradio-like UI at http://127.0.0.1:3000
  • Compilation Fixed: All syntax errors resolved, project builds successfully
  • Gesture Control: Real-time pose estimation and gesture recognition with camera input
  • Fractal Shaders: Real-time fractal visualization with WebGL shaders and presets

🔄 In Development

  • MCP Server Integration: Model Context Protocol servers for AI-brain interfaces
  • Gesture Tracking: MediaPipe and LeapMotion integration for gesture control
  • Multimodal Fusion: Cross-sensory data integration and synthesis
  • Real-time Streaming: Sub-10ms latency optimization for neurofeedback
  • BCI Integration: Brain-computer interface protocols and hardware support
  • UI Analysis & Fixing Tools: Automated tools for assessing and implementing missing UI functionalities

🚧 Known Issues

  • Missing Dependencies: Additional crates needed for full functionality
  • Integration Testing: Cross-module compatibility needs verification
  • Performance Optimization: GPU acceleration and memory management tuning
  • Model Loading: ONNX model loading requires actual model files

📈 Next Development Phase

  1. Complete MCP Integration: Finish BCI and scientific computing server setup
  2. Gesture Control: Implement full gesture-EEG correlation system
  3. Performance Testing: Benchmark real-time processing capabilities
  4. Documentation: Update API docs and usage examples
  5. Model Loading: Add support for loading actual ONNX models
  6. UI Enhancement: Implement missing UI functionalities using new analysis tools

🌟 Vision: Multimodal & Multisensorial AI

Combining StreamDiffusion with multimodal and multisensorial AI points toward an emerging class of high-speed, interactive generative systems that can process and create content across multiple senses at once. StreamDiffusion itself focuses on real-time image and video processing; pairing it with multimodal inputs (text, images, audio) and multisensorial signals (tactile, thermal, EEG, biometric) yields more contextually rich and responsive AI systems.

Stream Diffusion RS extends this vision by providing:

  • Multimodal Fusion: Text, image, audio, and biometric data integration
  • Multisensorial Processing: EEG, tactile, thermal, and physiological signals
  • Real-time Streaming: Sub-10ms latency across multiple data dimensions
  • Neuro-Emotive AI: Brain-computer interfaces with emotional intelligence
  • Cross-Modal Generation: Converting between different sensory modalities


🚀 Features

🌈 Multimodal AI Core

  • Cross-Modal Fusion: Text, image, audio, and biometric data integration
  • Multisensorial Processing: EEG, tactile, thermal, physiological signal analysis
  • Real-time Streaming: Sub-10ms latency across multiple sensory dimensions
  • Neuro-Emotive Intelligence: Brain-computer interfaces with emotional context

🤖 Diffusion Models

  • UNet Architecture: Complete implementation with attention blocks, resnet blocks, and time embeddings
  • DDIM Scheduler: Advanced denoising diffusion implicit models for fast inference
  • Text-to-Image: CLIP text encoding integration
  • Streaming Support: Real-time image generation with progress callbacks
  • Multimodal Generation: EEG-to-visual, audio-to-image cross-modal synthesis
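
The DDIM scheduler listed above boils down to a deterministic update over the predicted noise. The sketch below illustrates a single η = 0 DDIM step with ndarray; it is a standalone illustration of the math, not the crate's scheduler API, and the function name and signature are assumptions.

use ndarray::Array4;

/// One deterministic DDIM step (eta = 0): estimate x_0 from the predicted noise,
/// then re-noise it to the previous (less noisy) timestep.
/// Hypothetical helper for illustration only.
fn ddim_step(
    x_t: &Array4<f32>,        // current noisy latents (batch, channels, height, width)
    eps_theta: &Array4<f32>,  // noise predicted by the UNet at step t
    alpha_bar_t: f32,         // cumulative alpha product at step t
    alpha_bar_prev: f32,      // cumulative alpha product at the previous step
) -> Array4<f32> {
    // x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
    let pred_x0 = (x_t - &(eps_theta * (1.0 - alpha_bar_t).sqrt())) / alpha_bar_t.sqrt();
    // x_{t-1} = sqrt(alpha_bar_prev) * x0_hat + sqrt(1 - alpha_bar_prev) * eps
    &pred_x0 * alpha_bar_prev.sqrt() + eps_theta * (1.0 - alpha_bar_prev).sqrt()
}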

🔄 ONNX Integration

  • Model Conversion: PyTorch, TensorFlow, JAX, and HuggingFace model support
  • ONNX Runtime: High-performance inference with hardware acceleration
  • Model Registry: Management system for multiple models
  • Burn Compatibility: Seamless integration with Burn tensor operations

🎭 Performance Control

  • Gesture Control: Real-time motion capture for parameter modulation
  • EEG Integration: Brain wave analysis for interactive control
  • Audio-Reactive: Spectrum analysis and beat detection
  • Real-time Processing: Low-latency parameter mapping
  • Multimodal Control: Combined gesture and EEG input processing
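
To make the low-latency parameter mapping above concrete, here is a minimal, self-contained sketch of turning a normalized control signal (an EEG band power, a gesture axis, or an audio level) into a smoothed generation parameter. The struct and ranges are illustrative assumptions, not the crate's API.

/// Exponential smoothing plus linear range mapping for a control signal in [0, 1].
struct ParamMapper {
    min: f32,
    max: f32,
    smoothing: f32, // 0.0 = no smoothing, close to 1.0 = heavy smoothing
    state: f32,
}

impl ParamMapper {
    fn new(min: f32, max: f32, smoothing: f32) -> Self {
        Self { min, max, smoothing, state: 0.0 }
    }

    /// Map a raw control value (clamped to [0, 1]) to the target parameter range.
    fn map(&mut self, raw: f32) -> f32 {
        let clamped = raw.clamp(0.0, 1.0);
        self.state = self.smoothing * self.state + (1.0 - self.smoothing) * clamped;
        self.min + (self.max - self.min) * self.state
    }
}

fn main() {
    // e.g. drive a guidance scale (1.0..=15.0) from a normalized alpha-power stream
    let mut guidance = ParamMapper::new(1.0, 15.0, 0.8);
    for alpha_power in [0.2_f32, 0.5, 0.9] {
        println!("guidance scale = {:.2}", guidance.map(alpha_power));
    }
}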

🎨 Creative Tools

  • Interactive Generation: Real-time parameter control and feedback
  • Style Transfer: Dynamic visual style modulation
  • Composition Tools: Multi-layer image synthesis and blending
  • Export Options: High-resolution image and video output
  • Fractal Shader Renderer: Real-time fractal visualization with WebGL shaders
    • Mandelbrot, Julia, and Burning Ship fractal types
    • Customizable parameters (iterations, zoom, offset, rotation, colors)
    • Animation support with time-based effects
    • Preset configurations for different creative styles
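
The fractal presets above are driven by simple escape-time iteration. As a standalone illustration (CPU-side, not the WebGL renderer itself), the sketch below computes the per-pixel iteration count that a fragment shader would map to a color ramp.

/// Escape-time iteration count for the Mandelbrot set at point c = (cx, cy).
/// A fragment shader evaluates the same recurrence z <- z^2 + c per pixel.
fn mandelbrot_iterations(cx: f64, cy: f64, max_iter: u32) -> u32 {
    let (mut zx, mut zy) = (0.0_f64, 0.0_f64);
    let mut i = 0;
    while zx * zx + zy * zy <= 4.0 && i < max_iter {
        let tmp = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = tmp;
        i += 1;
    }
    i // points that never escape reach max_iter
}

fn main() {
    // Coarse ASCII preview over the classic viewport (-2.5..1.0) x (-1.0..1.0)
    for row in 0..20 {
        let line: String = (0..60)
            .map(|col| {
                let cx = -2.5 + 3.5 * col as f64 / 60.0;
                let cy = -1.0 + 2.0 * row as f64 / 20.0;
                if mandelbrot_iterations(cx, cy, 50) == 50 { '#' } else { '.' }
            })
            .collect();
        println!("{line}");
    }
}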

🎨 Visualization

  • Plotting Engine: Line plots, scatter plots, histograms, confusion matrices
  • EEG Visualizations: Signal plots, topographic maps, spectrograms
  • Training Curves: Loss and accuracy monitoring over epochs
  • Real-time Dashboard: Live experiment monitoring
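
The plotting engine builds on the plotters crate already listed in the dependencies. As a hedged example, the sketch below draws a training loss curve directly with plotters 0.3; the helper name and output file are assumptions, not the crate's own plotting API.

use plotters::prelude::*;

/// Hypothetical helper: render a loss curve to a PNG with the plotters crate.
fn plot_loss_curve(losses: &[f32]) -> Result<(), Box<dyn std::error::Error>> {
    let root = BitMapBackend::new("loss_curve.png", (640, 480)).into_drawing_area();
    root.fill(&WHITE)?;

    let max_loss = losses.iter().cloned().fold(f32::EPSILON, f32::max);
    let mut chart = ChartBuilder::on(&root)
        .caption("Training loss", ("sans-serif", 24))
        .margin(10)
        .x_label_area_size(30)
        .y_label_area_size(40)
        .build_cartesian_2d(0..losses.len(), 0f32..max_loss)?;
    chart.configure_mesh().draw()?;

    // Draw the curve as a single polyline (PathElement avoids optional series features)
    let points: Vec<(usize, f32)> = losses.iter().cloned().enumerate().collect();
    chart.draw_series(std::iter::once(PathElement::new(points, &RED)))?;
    root.present()?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    plot_loss_curve(&[0.9, 0.6, 0.45, 0.38, 0.33, 0.31])
}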

🎛️ Control Interface

  • Parameter Mapping: Intuitive control over generation parameters
  • Preset System: Save and recall favorite settings
  • Automation: External control via OSC, MIDI, and API
  • Feedback Loop: Real-time visual feedback during generation

🌐 Web Interface

  • Gradio-like UI: Modern tabbed interface with JavaScript
  • REST API: Full API for image generation, EEG analysis, training
  • Interactive Features: Real-time plotting, model management
  • File Upload: Support for EEG data and model files
  • Functional Server: Running at http://127.0.0.1:3000 with complete UI

🔧 UI Analysis & Fixing Tools

  • UI Analyzer: Automated assessment of UI component implementation status
  • Advanced UI Analyzer: Deep code inspection to identify missing functionalities
  • UI Fixer: Automated implementation of missing UI features
  • Comprehensive Reporting: Detailed analysis and fix reports in multiple formats

🏗️ Architecture

graph TD
    A[User Interface] --> B[Web Server]
    B --> C[Synesthetic Framework]
    C --> D[Diffusion Models]
    C --> E[EEG Processing]
    C --> F[ONNX Runtime]
    C --> G[3D Model Generation]
    C --> H[Audio Synthesis]
    D --> I[Model Inference]
    E --> J[Signal Analysis]
    F --> K[Hardware Acceleration]
    G --> L[Geometry Creation]
    H --> M[Waveform Generation]
    
    style A fill:#4CAF50,stroke:#388E3C
    style B fill:#2196F3,stroke:#0D47A1
    style C fill:#9C27B0,stroke:#4A148C
    style D fill:#FF9800,stroke:#E65100
    style E fill:#009688,stroke:#004D40
    style F fill:#FF5722,stroke:#BF360C
    style G fill:#795548,stroke:#3E2723
    style H fill:#607D8B,stroke:#263238

Complete System Integration

graph LR
    A[External Inputs] --> B[Input Processing]
    B --> C[Feature Extraction]
    C --> D[Data Fusion]
    D --> E[Cross-Modal Mapping]
    E --> F[Action Generation]
    F --> G[Output Systems]
    G --> H[User Feedback]
    H --> A
    
    A1[User Interface] --> A
    A2[Hardware Sensors] --> A
    A3[File Inputs] --> A
    A4[Network Data] --> A
    
    B1[Web Server] --> B
    B2[Sensory Connectors] --> B
    B3[Data Preprocessing] --> B
    
    C1[Signal Analysis] --> C
    C2[Image Processing] --> C
    C3[Audio Analysis] --> C
    C4[EEG Feature Extraction] --> C
    
    D1[Temporal Fusion] --> D
    D2[Spatial Fusion] --> D
    D3[Semantic Fusion] --> D
    D4[Contextual Fusion] --> D
    
    E1[Gesture → Visual] --> E
    E2[Audio → Visual] --> E
    E3[EEG → Audio] --> E
    E4[Visual → Audio] --> E
    E5[EEG → Visual] --> E
    E6[Audio → Haptic] --> E
    
    F1[Visual Actions] --> F
    F2[Audio Actions] --> F
    F3[3D Model Actions] --> F
    F4[Haptic Actions] --> F
    F5[EEG Feedback] --> F
    
    G1[Display Output] --> G
    G2[Audio Output] --> G
    G3[Haptic Output] --> G
    G4[File Output] --> G
    G5[Network Output] --> G
    
    style A fill:#4CAF50,stroke:#388E3C
    style B fill:#2196F3,stroke:#0D47A1
    style C fill:#FF9800,stroke:#E65100
    style D fill:#9C27B0,stroke:#4A148C
    style E fill:#4CAF50,stroke:#388E3C
    style F fill:#FF5722,stroke:#BF360C
    style G fill:#009688,stroke:#004D40
    style H fill:#795548,stroke:#3E2723

Real-time Multimodal Processing Flow

sequenceDiagram
    participant User as User Interface
    participant Web as Web Server
    participant Framework as Synesthetic Framework
    participant Diffusion as Diffusion Models
    participant EEG as EEG Processing
    participant ONNX as ONNX Runtime
    participant Output as Output Systems
    
    User->>Web: User Interaction
    Web->>Framework: Process Request
    Framework->>Framework: Feature Extraction
    Framework->>EEG: Analyze EEG Data
    EEG->>Framework: Return Band Powers
    Framework->>Diffusion: Generate Image
    Diffusion->>ONNX: Execute Model
    ONNX->>Diffusion: Return Results
    Diffusion->>Framework: Generated Image
    Framework->>Output: Send Results
    Output->>User: Display Results

Project Structure

stream-diffusion-rs/
├── src/
│   ├── lib.rs              # Main library interface
│   ├── diffusion.rs        # Diffusion model implementation
│   ├── onnx.rs            # ONNX model integration
│   ├── ml.rs              # ML research utilities
│   ├── eeg.rs             # EEG analysis and processing
│   ├── visualization.rs   # Data visualization tools
│   ├── training.rs        # Model training framework
│   ├── web.rs             # Web interface and API
│   ├── ui_analyzer.rs     # UI analysis tools
│   ├── advanced_ui_analyzer.rs # Advanced UI analysis
│   ├── ui_fixer.rs        # UI implementation fixer
│   └── bin/               # Command-line tools
│       ├── ui_analyzer.rs
│       ├── advanced_ui_analyzer.rs
│       └── ui_fixer.rs
├── examples/
│   ├── basic_usage.rs              # Basic usage example
│   ├── eeg_neurofeedback.rs        # Real-time neurofeedback
│   ├── ui_analysis_and_fixing.rs   # UI analysis and fixing example
│   └── model_training_example.rs   # Custom model training
└── Cargo.toml

System Data Flow Architecture

graph LR
    A[Data Sources] --> B[Input Processing]
    B --> C[Feature Extraction]
    C --> D[Data Fusion Engine]
    D --> E[Cross-Modal Mapping]
    E --> F[Action Generation]
    F --> G[Output Systems]
    
    A1[EEG Sensors] --> A
    A2[Camera Input] --> A
    A3[Audio Input] --> A
    A4[Gesture Tracking] --> A
    A5[File Uploads] --> A
    
    B1[Preprocessing] --> B
    B2[Normalization] --> B
    B3[Filtering] --> B
    
    C1[Signal Analysis] --> C
    C2[Image Processing] --> C
    C3[Audio Feature Extraction] --> C
    
    D1[Temporal Fusion] --> D
    D2[Spatial Fusion] --> D
    D3[Semantic Fusion] --> D
    
    E1[EEG → Visual] --> E
    E2[Audio → Visual] --> E
    E3[Gesture → Audio] --> E
    E4[Visual → Audio] --> E
    
    F1[Image Generation] --> F
    F2[Audio Synthesis] --> F
    F3[3D Model Creation] --> F
    
    G1[Display] --> G
    G2[Audio Output] --> G
    G3[File Export] --> G
    
    style A fill:#4CAF50,stroke:#388E3C
    style B fill:#2196F3,stroke:#0D47A1
    style C fill:#FF9800,stroke:#E65100
    style D fill:#9C27B0,stroke:#4A148C
    style E fill:#4CAF50,stroke:#388E3C
    style F fill:#FF5722,stroke:#BF360C
    style G fill:#009688,stroke:#004D40

📦 Installation

Add to your Cargo.toml:

[dependencies]
stream-diffusion-rs = "0.1.0"

Dependencies

[dependencies]
# Core ML
ndarray = "0.15"
ndarray-rand = "0.14"
rand = "0.8"

# ONNX Runtime
ort = "1.16"

# Visualization
plotters = { version = "0.3", default-features = false, features = ["svg_backend", "bitmap_backend"] }

# Web interface
axum = "0.7"
tokio = { version = "1.0", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }

# Audio processing
rodio = "0.17"
hound = "3.5"

# Image processing
image = "0.24"

# Utils
anyhow = "1.0"
thiserror = "1.0"
log = "0.4"
env_logger = "0.10"

🚀 Quick Start

Basic Image Generation

use stream_diffusion_rs::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();

    // Initialize the engine
    let mut engine = StreamDiffusionRs::new();

    // Load a diffusion model
    engine.load_model("stable-diffusion", "models/sd.onnx")?;

    // Configure streaming parameters
    engine.set_stream_parameters(30.0, 0.8); // 30 FPS, 80% quality

    // Generate an image
    let prompt = "A beautiful sunset over mountains, digital art style";
    let image_data = engine.generate_image(prompt, "stable-diffusion")?;

    println!("Generated image with {} bytes", image_data.len());

    Ok(())
}

EEG Neurofeedback

use stream_diffusion_rs::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize EEG processor
    let mut processor = EEGProcessor::new();
    processor.add_filter("bandpass", DigitalFilter::new(FilterType::BandPass, 4, 1.0, 40.0));

    // Initialize visualizer
    let visualizer = EEGVisualizer::new(std::path::Path::new("output"));

    // Simulate real-time EEG processing
    for frame in 0..1000 {
        let eeg_data = generate_simulated_eeg_data();

        // Process EEG data
        let mut processed_data = eeg_data.clone();
        processor.remove_dc_offset(&mut processed_data);

        // Extract frequency bands
        let alpha_power = processor.extract_band_power(&processed_data, FrequencyBand::Alpha)?;
        let beta_power = processor.extract_band_power(&processed_data, FrequencyBand::Beta)?;

        // Calculate focus index (beta/alpha ratio)
        let focus_index = calculate_focus_index(&beta_power, &alpha_power);
        println!("Frame {}: focus index = {:?}", frame, focus_index);

        // Generate feedback visualization from the per-electrode alpha powers
        let electrode_names = vec!["Fp1", "Fp2", "Cz", "O1", "O2"]; // example electrode labels
        visualizer.plot_eeg_topography(&alpha_power, &electrode_names, &format!("neurofeedback_{}.png", frame))?;
    }

    Ok(())
}

Custom Model Training

use stream_diffusion_rs::*;

// Define your model
struct MyModel {
    weights: ndarray::Array2<f32>,
    biases: ndarray::Array1<f32>,
}

impl TrainableModel for MyModel {
    fn forward(&self, input: &ndarray::Array4<f32>) -> Result<ndarray::Array2<f32>, Box<dyn std::error::Error>> {
        // Implement forward pass: collapse the spatial axes, keeping (batch, channels)
        Ok(input.sum_axis(ndarray::Axis(3)).sum_axis(ndarray::Axis(2)))
    }

    fn get_parameters(&self) -> Vec<&ndarray::Array2<f32>> {
        vec![&self.weights]
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create training configuration
    let config = TrainingConfig {
        epochs: 50,
        batch_size: 32,
        learning_rate: 0.001,
        ..Default::default()
    };

    // Initialize trainer
    let mut trainer = ModelTrainer::new(config, Optimizer::Adam, LossFunction::MSE);

    // Build the model and placeholder training data (replace with your own dataset)
    let mut model = MyModel {
        weights: ndarray::Array2::zeros((16, 16)),
        biases: ndarray::Array1::zeros(16),
    };
    let train_features = ndarray::Array4::<f32>::zeros((32, 3, 32, 32));
    let train_labels = ndarray::Array2::<f32>::zeros((32, 3));
    let (val_features, val_labels) = (train_features.clone(), train_labels.clone());

    // Train the model
    trainer.train(&mut model, &train_features, &train_labels, Some(&val_features), Some(&val_labels))?;

    Ok(())
}

🌐 Web Interface

Start the web server:

use stream_diffusion_rs::web::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    start_default_server().await
}

Then visit http://localhost:3000 for the interactive interface.
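
The default server wires up the project's own routes. If you want to see the underlying pattern, the sketch below shows a bare axum 0.7 server (the version listed in the dependencies) exposing a single hypothetical health endpoint on a separate port; it is not the crate's built-in routing.

use axum::{routing::get, Json, Router};
use serde::Serialize;

#[derive(Serialize)]
struct Health {
    status: &'static str,
}

// Hypothetical endpoint for illustration; start_default_server() registers its own routes.
async fn health() -> Json<Health> {
    Json(Health { status: "ok" })
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let app = Router::new().route("/health", get(health));
    // Bind to a different port so it does not clash with the default server on 3000
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3001").await?;
    axum::serve(listener, app).await?;
    Ok(())
}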

🔧 UI Analysis & Fixing Tools

Stream Diffusion RS now includes powerful tools for analyzing and fixing UI implementations:

UI Analyzer

Analyze the current state of UI components:

cargo run --bin ui_analyzer

Advanced UI Analyzer

Perform deep code inspection to identify missing functionalities:

cargo run --bin advanced_ui_analyzer

UI Fixer

Automatically implement missing UI features:

cargo run --bin ui_fixer

Complete UI Analysis and Fixing Example

Run a comprehensive example that demonstrates the full workflow:

cargo run --example ui_analysis_and_fixing

📊 EEG Analysis Pipeline

use stream_diffusion_rs::*;

// Load EEG data
let mut eeg_data = EEGData::load_from_file("eeg_data.edf")?;

// Process signals
let mut processor = EEGProcessor::new();
processor.apply_filter(&mut eeg_data, "bandpass");

// Extract features
let alpha_power = processor.extract_band_power(&eeg_data, FrequencyBand::Alpha)?;
let connectivity = processor.compute_connectivity(&eeg_data);

// Convert to audiovisual
let converter = EEGToAudiovisualConverter::new("output");
let audiovisual = converter.convert(&eeg_data)?;
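
For intuition about what extract_band_power returns, here is a deliberately naive, self-contained sketch of alpha and beta band power for one channel using a direct O(N²) DFT. The crate's processor presumably uses proper filtering or an FFT; this only illustrates the underlying math.

use std::f32::consts::PI;

/// Naive DFT-based band power for a single EEG channel.
/// Sums |X(f)|^2 over bins whose frequency falls inside [low_hz, high_hz).
fn band_power(samples: &[f32], sample_rate_hz: f32, low_hz: f32, high_hz: f32) -> f32 {
    let n = samples.len();
    let bin_hz = sample_rate_hz / n as f32;
    let mut power = 0.0_f32;
    for k in 1..n / 2 {
        let freq = k as f32 * bin_hz;
        if freq < low_hz || freq >= high_hz {
            continue;
        }
        let (mut re, mut im) = (0.0_f32, 0.0_f32);
        for (t, &x) in samples.iter().enumerate() {
            let angle = -2.0 * PI * k as f32 * t as f32 / n as f32;
            re += x * angle.cos();
            im += x * angle.sin();
        }
        power += (re * re + im * im) / (n as f32 * n as f32);
    }
    power
}

fn main() {
    // Synthetic 10 Hz "alpha" oscillation sampled at 250 Hz for one second
    let sample_rate = 250.0;
    let samples: Vec<f32> = (0..250)
        .map(|t| (2.0 * PI * 10.0 * t as f32 / sample_rate).sin())
        .collect();
    let alpha = band_power(&samples, sample_rate, 8.0, 13.0);
    let beta = band_power(&samples, sample_rate, 13.0, 30.0);
    println!("alpha power = {alpha:.4}, beta power = {beta:.4}");
}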

🔧 ONNX Model Integration

use std::collections::HashMap;
use stream_diffusion_rs::*;

// Initialize converter
let converter = OnnxConverter::new()?;

// Convert PyTorch model to ONNX
converter.convert_pytorch_to_onnx(
    "model.pth",
    "model.onnx",
    &[1, 3, 224, 224],
    11
)?;

// Load and run inference
let mut bridge = OnnxBridge::new();
bridge.load_onnx_model("model.onnx", &converter)?;
bridge.convert_to_burn()?;

// Run inference with named inputs
let mut inputs = HashMap::new();
// ... populate inputs
let outputs = bridge.run_inference_burn(&inputs)?;

🎯 Performance Goals

  • Real-time generation with interactive control
  • High-quality output at various resolutions
  • Efficient processing with GPU acceleration
  • Responsive interface for live performance
  • Cross-platform compatibility for different workflows

🤝 Contributing

Contributions welcome! Areas of interest:

  • Multimodal fusion algorithms for cross-sensory data integration
  • Additional sensor support (thermal, tactile, biometric)
  • Burn framework integration for GPU acceleration
  • Advanced EEG signal processing and artifact removal
  • Real-time audio synthesis and cross-modal generation
  • Web interface enhancements for multisensorial visualization
  • ONNX model optimization for edge devices
  • Neuro-emotive AI models for emotional intelligence
  • UI/UX improvements using the new analysis and fixing tools

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgements

  • Built with ❤️ using Rust
  • Inspired by the need for high-performance ML in neuroscience

📊 Project Status Summary

🎯 Core Features - Production Ready

  • ✅ EEG Processing with real-time brain wave analysis
  • ✅ Diffusion Models with UNet architecture and DDIM scheduler
  • ✅ ONNX Integration with model conversion and hardware acceleration
  • ✅ Web Interface with REST API and interactive dashboard
  • ✅ Visualization engine for EEG data and training curves
  • ✅ Training Framework with multiple optimizers and loss functions
  • ✅ Web Server with Gradio-like UI at http://127.0.0.1:3000
  • ✅ Gesture Control with real-time pose estimation
  • ✅ Fractal Shaders with WebGL visualization

🔄 In Development - Active Implementation

  • 🔄 MCP Server Integration for AI-brain interfaces
  • 🔄 Gesture Tracking with MediaPipe and LeapMotion
  • 🔄 Multimodal Fusion for cross-sensory data integration
  • 🔄 Real-time Streaming optimization for neurofeedback
  • 🔄 BCI Integration with brain-computer interface protocols
  • 🔄 UI Analysis & Fixing Tools for automated assessment

🚧 Planned Features - Roadmap

  • 🚧 Advanced AI chat interfaces
  • 🚧 Cloud synchronization capabilities
  • 🚧 Quantum computing visualization
  • 🚧 VR/AR full integration
  • 🚧 Multi-user collaborative sessions

Last Updated: 2025-11-16 | Version: 1.0.0 | Status: Active Development
