# StreamFusion

A high-performance distributed data processing framework for real-time, multi-source stream analytics and transformation. Process, correlate, and analyze massive volumes of heterogeneous data streams with automatic scaling and fault tolerance.

## Features
- Multi-protocol stream ingestion with custom adapters (Kafka, HTTP, TCP, WebSocket); see the ingestion sketch after this list
- Dynamic query engine with SQL-like syntax for real-time transformations
- Auto-scaling cluster management with intelligent load balancing
- Built-in fault tolerance and state recovery
- Pluggable serialization formats (JSON, Avro, Protobuf)
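
For instance, sources speaking different protocols and serialization formats can feed a single processor. A minimal sketch; `WebSocketSource` and the `value_format` keyword are assumptions for illustration (only `KafkaSource` and `HTTPSink` appear in the quick start below):

```python
# Sketch: ingest two protocols into one processor. WebSocketSource and
# the value_format keyword are hypothetical; KafkaSource is the adapter
# shown in the quick start below.
from stream_fusion import StreamProcessor, KafkaSource, WebSocketSource

processor = StreamProcessor()

# Avro-encoded records from a Kafka topic (value_format is hypothetical)
processor.add_source('orders', KafkaSource(
    topics=['orders'],
    bootstrap_servers='localhost:9092',
    value_format='avro',
))

# JSON messages over a WebSocket (hypothetical adapter)
processor.add_source('quotes', WebSocketSource(
    url='ws://localhost:9000/quotes',
    value_format='json',
))
```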
## Installation

```bash
pip install -r requirements.txt
```

For development:

```bash
pip install -e .
```

## Running a Cluster

```bash
# Start coordinator node
python -m stream_fusion.cluster.coordinator --port 8080

# Start worker nodes
python -m stream_fusion.cluster.worker --coordinator localhost:8080 --port 8081
```

## Quick Start

```python
from stream_fusion import StreamProcessor, KafkaSource, HTTPSink
# Create processor with SQL-like transformations
processor = StreamProcessor()
# Add data source
source = KafkaSource(topics=['user_events'], bootstrap_servers='localhost:9092')
processor.add_source('events', source)
# Define transformation query
query = """
SELECT user_id, COUNT(*) as event_count, AVG(value) as avg_value
FROM events
WHERE event_type = 'click'
GROUP BY user_id, TUMBLING_WINDOW(INTERVAL '5' MINUTE)
"""
processor.add_transformation('aggregated_events', query)
# Add output sink
sink = HTTPSink(endpoint='https://api.example.com/events')
processor.add_sink('aggregated_events', sink)
# Start processing
processor.start()
```

The transformation above counts click events per user over 5-minute tumbling windows and forwards each window's aggregates to the HTTP sink.

## Submitting Jobs

Submit a job file to a running cluster with the bundled client:

```bash
python -m stream_fusion.client submit --job my_job.py --cluster localhost:8080
```
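
A job file is assumed here to be an ordinary Python script built on the same API as the quick start; `my_job.py` below is a hypothetical example, not a file shipped with the project:

```python
# my_job.py -- hypothetical job file; assumes a job is a plain script
# using the public API shown in the quick start.
from stream_fusion import StreamProcessor, KafkaSource, HTTPSink

processor = StreamProcessor()

# Ingest raw events from Kafka
processor.add_source('events', KafkaSource(
    topics=['user_events'],
    bootstrap_servers='localhost:9092',
))

# Count clicks per user in 1-minute tumbling windows
processor.add_transformation('click_counts', """
    SELECT user_id, COUNT(*) AS clicks
    FROM events
    WHERE event_type = 'click'
    GROUP BY user_id, TUMBLING_WINDOW(INTERVAL '1' MINUTE)
""")

# Forward the windowed aggregates to an HTTP endpoint
processor.add_sink('click_counts', HTTPSink(endpoint='https://api.example.com/events'))

processor.start()
```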
## Architecture

- Coordinator: Manages cluster state and job scheduling
- Workers: Execute stream processing tasks with auto-scaling
- Query Engine: SQL-like syntax for stream transformations
- Adapters: Pluggable connectors for various data sources/sinks; see the adapter sketch after this list
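
The adapter layer is where custom protocols plug in. A minimal sketch of a custom source; the `Source` base class, its import path, and the `read()` hook are assumptions for illustration, so consult the actual adapter interface in `stream_fusion` for the real names:

```python
# Hypothetical custom source adapter: the Source base class and the
# read() generator contract are assumptions for illustration.
import socket

from stream_fusion import Source  # hypothetical import path


class TCPLineSource(Source):
    """Emits one record per newline-delimited message on a TCP socket."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def read(self):
        # Connect and yield decoded lines until the peer closes the socket
        with socket.create_connection((self.host, self.port)) as conn:
            buf = b''
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    return
                buf += chunk
                while b'\n' in buf:
                    line, buf = buf.split(b'\n', 1)
                    yield line.decode('utf-8')
```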
## License

MIT License