A production-ready, end-to-end data pipeline that ingests live market data from Binance, streams changes via PostgreSQL CDC, and delivers real-time insights through Kafka → Cassandra → Grafana.
A real-time cryptocurrency analytics pipeline that ingests live Binance market data, stores it in PostgreSQL, replicates it to Cassandra via Change Data Capture (CDC) using Debezium, and visualizes insights in Grafana.
This setup allows you to track and analyze market metrics such as the top 5 performing cryptocurrencies by 24h gain, candlestick trends, and live price movements — all in real time.
+---------------------+
| Binance API |
| (WebSocket & REST) |
+----------+----------+
|
| JSON Streams (Trades, Tickers, Klines)
v
+---------------------+
| Python Ingest App |
| (asyncio / aiohttp) |
+----------+----------+
|
| INSERT INTO
v
+---------------------+
| PostgreSQL (OLTP) |
+----------+----------+
|
| CDC via Debezium
v
+---------------------+
| Kafka + Connect |
+----------+----------+
|
| Cassandra Sink Connector
v
+---------------------+
| Cassandra DB |
+----------+----------+
|
| SQL Queries
v
+---------------------+
| Grafana |
+---------------------+
| Component | Purpose |
|---|---|
| Binance API | Real-time crypto data source (prices, trades, klines, 24h stats) |
| Python (asyncio) | Data ingestion and transformation |
| PostgreSQL | Structured staging database |
| Debezium + Kafka Connect | Change Data Capture (CDC) replication layer |
| Cassandra | Scalable, time-series optimized datastore |
| Grafana | Real-time visualization dashboards |
| Docker Compose | Container orchestration and local development |
git clone https://github.com/Mutinda-Kioko/CRYPTO_CDC.git
cd CRYPTO_CDCAdjust credentials and endpoints in .env or directly in docker-compose.yml if needed.
docker-compose up -dThis will launch:
- Zookeeper & Kafka
- PostgreSQL
- Debezium Connect
- Cassandra
- Grafana
- PostgreSQL →
localhost:5432 - Kafka Connect API →
http://localhost:8083 - Grafana →
http://localhost:3000(default login:admin/admin) - Cassandra →
localhost:9042
python3 -m venv venv
source venv/bin/activatecd ingestor
pip install -r requirements.txt
python binance_ingest.pyThis will:
- Connect to Binance’s REST endpoints
- Insert real-time market and trade data into PostgreSQL tables
bash scripts/register_debezium_connector.shbash scripts/register_cassandra_sink_connector.sh
Kafka Connect will now stream changes from PostgreSQL into Cassandra in near real-time.
Run the CQL below after Cassandra is running to create the keyspace, primary time-series tables, and a lightweight top5_24h table as well as an example materialized view . To execute these, you can paste them into cqlsh running inside the Cassandra container or use a heredoc.
Run with docker-compose:
docker-compose exec -T cassandra cqlsh -e "SOURCE 'cassandra/init_cassandra.cql';"
Or run the commands directly:
docker exec -it cassandra cqlsh << 'EOF'
USE binance;
-- Create materialized view for top performers (last 1 hour)
DROP MATERIALIZED VIEW IF EXISTS top_performers_1h;
CREATE MATERIALIZED VIEW top_performers_1h AS
SELECT symbol, price_change_percent, timestamp, cdc_timestamp
FROM ticker_24h
WHERE symbol IS NOT NULL
AND timestamp IS NOT NULL
AND price_change_percent IS NOT NULL
AND cdc_timestamp IS NOT NULL
PRIMARY KEY (symbol, timestamp, cdc_timestamp)
WITH CLUSTERING ORDER BY (timestamp DESC, cdc_timestamp DESC);
-- Create table for aggregated latest data (we'll populate this from the consumer)
DROP TABLE IF EXISTS latest_ticker_summary;
CREATE TABLE latest_ticker_summary (
symbol text PRIMARY KEY,
price_change_percent decimal,
high_price decimal,
low_price decimal,
volume decimal,
last_updated timestamp
);
-- Verify
DESCRIBE TABLES;
EOF
- Go to Grafana →
http://localhost:3000 - Add a PostgreSQL datasource:
- Host:
host.docker.internal:5432 - Database:
binance - User:
postgres - Password:
postgres
- Host:
- Import the dashboard:
- Dashboard → Import → Upload
grafana/binance-dashboard.json
- Dashboard → Import → Upload
You’ll now see:
- 📈 Top 5 coins by 24h % gain
- 🕯️ Candlestick charts (klines)
- 💹 Real-time ticker metrics
CRYPTO_CDC/
├── docker-compose.yml
├── cassandra/
│ ├── create_views.cql
│ └── init_cassandra.cql
├── cassandra_sink/
│ ├── kafka_to_cassandra.py
├── grafana/
│ └── dashboards/
| ├── price_trends.json
| └── top5_gainers.json
├── ingestor/
│ ├── ingest_binance.py
│ └── requirements.txt
├── connect_plugins/
├── scripts/
│ ├── register_cassandra_sink_connector.sh
│ └── register_debezium_connector.sh
└── README.md
Check PostgreSQL Data
docker-compose exec postgres psql -U postgres -d binance -c "SELECT * FROM trades LIMIT 10;"Check Kafka Topics
docker-compose exec kafka kafka-topics --list --bootstrap-server kafka:9092Check Cassandra Data
docker-compose exec cassandra cqlsh -e "SELECT * FROM binance.trades_by_symbol_day LIMIT 5;"Check Connectors
curl http://localhost:8083/connectors | jqThe Grafana dashboard includes panels for:
- Top 5 Performing Cryptocurrencies (24h Gain)
- Candlestick (Kline) Chart
- Price & Volume Trend
- Recent Trades Stream
- Market Summary
John Kioko
💼 Data Engineer / Fullstack Developer
🌐 LinkedIn | GitHub
This project is licensed under the MIT License — feel free to fork, modify, and adapt for your own use.
Happy Trading & Streaming! 🚀