
Stream Fusion

A high-performance distributed data processing framework for real-time multi-source stream analytics and transformation. Process, correlate, and analyze massive volumes of heterogeneous data streams with automatic scaling and fault tolerance.

Features

  • Multi-protocol stream ingestion with custom adapters (Kafka, HTTP, TCP, WebSocket)
  • Dynamic query engine with SQL-like syntax for real-time transformations
  • Auto-scaling cluster management with intelligent load balancing
  • Built-in fault tolerance and state recovery
  • Pluggable serialization formats (JSON, Avro, Protobuf)
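The adapter and serialization features share one idea: sources expose a uniform polling interface, and the wire format is decoupled from it. The framework's actual adapter API is not shown here; the following is a minimal stand-alone sketch of that pattern, where `SourceAdapter`, `InMemorySource`, and `poll` are hypothetical names chosen for illustration:

```python
import abc
import json


class SourceAdapter(abc.ABC):
    """Uniform interface a Kafka/HTTP/TCP/WebSocket adapter would implement."""

    @abc.abstractmethod
    def poll(self):
        """Return the next batch of raw records, or [] when none are pending."""


class InMemorySource(SourceAdapter):
    """Toy adapter fed from a list, standing in for a network protocol."""

    def __init__(self, records):
        self._records = list(records)

    def poll(self):
        batch, self._records = self._records, []
        return batch


# Pluggable deserialization: swap json.loads for an Avro or Protobuf decoder
source = InMemorySource(['{"user_id": 1, "value": 0.5}'])
events = [json.loads(record) for record in source.poll()]
print(events)  # [{'user_id': 1, 'value': 0.5}]
```

Because the processing engine only ever calls `poll()` and a decoder function, adding a new protocol or wire format means writing one small class, not touching the pipeline.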

Installation

```shell
pip install -r requirements.txt
```

For an editable development install:

```shell
pip install -e .
```

Quick Start

Start a Stream Fusion cluster

```shell
# Start the coordinator node
python -m stream_fusion.cluster.coordinator --port 8080

# Start one or more worker nodes, pointing them at the coordinator
python -m stream_fusion.cluster.worker --coordinator localhost:8080 --port 8081
```

Define a stream processing job

```python
from stream_fusion import StreamProcessor, KafkaSource, HTTPSink

# Create a processor that applies SQL-like transformations to named streams
processor = StreamProcessor()

# Add a data source and register it under the stream name 'events'
source = KafkaSource(topics=['user_events'], bootstrap_servers='localhost:9092')
processor.add_source('events', source)

# Define a transformation query: count and average click events
# per user over 5-minute tumbling windows
query = """
SELECT user_id, COUNT(*) as event_count, AVG(value) as avg_value
FROM events
WHERE event_type = 'click'
GROUP BY user_id, TUMBLING_WINDOW(INTERVAL '5' MINUTE)
"""

processor.add_transformation('aggregated_events', query)

# Add an output sink for the transformed stream
sink = HTTPSink(endpoint='https://api.example.com/events')
processor.add_sink('aggregated_events', sink)

# Start processing
processor.start()
```
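To make the query's windowing semantics concrete, here is a plain-Python sketch of what a 5-minute tumbling-window aggregation computes. It uses only the standard library and hand-written sample events (the field names and timestamps are illustrative, not tied to the framework's internal representation):

```python
from collections import defaultdict

WINDOW = 5 * 60  # tumbling window size in seconds

events = [
    {"user_id": 1, "event_type": "click", "value": 2.0, "ts": 10},
    {"user_id": 1, "event_type": "click", "value": 4.0, "ts": 40},
    {"user_id": 1, "event_type": "view",  "value": 9.0, "ts": 50},   # filtered out by WHERE
    {"user_id": 2, "event_type": "click", "value": 1.0, "ts": 400},  # falls in the next window
]

# WHERE event_type = 'click' ... GROUP BY user_id, TUMBLING_WINDOW(...)
groups = defaultdict(list)
for e in events:
    if e["event_type"] == "click":
        window_start = e["ts"] - e["ts"] % WINDOW  # each event maps to exactly one window
        groups[(e["user_id"], window_start)].append(e["value"])

# SELECT user_id, COUNT(*) as event_count, AVG(value) as avg_value
aggregated = {
    key: {"event_count": len(vals), "avg_value": sum(vals) / len(vals)}
    for key, vals in groups.items()
}
print(aggregated)
# {(1, 0): {'event_count': 2, 'avg_value': 3.0}, (2, 300): {'event_count': 1, 'avg_value': 1.0}}
```

Unlike a sliding window, tumbling windows partition time into fixed, non-overlapping intervals, so every event contributes to exactly one result row per group.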

Submit job to cluster

```shell
python -m stream_fusion.client submit --job my_job.py --cluster localhost:8080
```

Architecture

  • Coordinator: Manages cluster state and job scheduling
  • Workers: Execute stream processing tasks with auto-scaling
  • Query Engine: SQL-like syntax for stream transformations
  • Adapters: Pluggable connectors for various data sources/sinks
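The coordinator/worker split can be pictured with a toy scheduler. This is not the framework's actual scheduling code; it is a self-contained sketch of the least-loaded strategy that "intelligent load balancing" typically implies, with all names (`Coordinator`, `schedule`) invented for the example:

```python
import heapq


class Coordinator:
    """Toy scheduler: assign each task to the currently least-loaded worker."""

    def __init__(self, workers):
        # Min-heap of (task_count, worker_name); the cheapest worker pops first
        self._load = [(0, w) for w in sorted(workers)]
        heapq.heapify(self._load)
        self.assignments = {w: [] for w in workers}

    def schedule(self, task):
        count, worker = heapq.heappop(self._load)
        self.assignments[worker].append(task)
        heapq.heappush(self._load, (count + 1, worker))
        return worker


coord = Coordinator(["worker-1", "worker-2"])
for task in ["ingest", "transform", "sink"]:
    coord.schedule(task)
print(coord.assignments)
# {'worker-1': ['ingest', 'sink'], 'worker-2': ['transform']}
```

Auto-scaling fits naturally on top of this: pushing a new `(0, worker)` entry onto the heap is all it takes to start routing work to a freshly joined node.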

License

MIT License
