GitHub: https://github.com/JasonZe41/Stock_Dashboard
This project implements a Lambda architecture for stock data analysis, combining batch processing of historical data with real-time processing of current market data.
**Batch Layer**
- Data Source: Historical stock data from 1999-2022
- Storage: Apache Hive for data warehousing
- Processing:
  - Raw stock data imported into Hive tables
  - Stock metrics calculated and stored in a dedicated metrics table
  - Data mapped to HBase for quick access
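The metric computation itself happens in Hive, but the idea can be sketched in JavaScript. The function names and the choice of metrics here (daily return, trailing moving average) are illustrative assumptions, not the project's actual Hive queries:

```javascript
// Sketch of the kind of per-symbol metrics the batch layer might compute
// from daily closing prices before writing them to the metrics table.

// Daily return for each day: (today - yesterday) / yesterday
function dailyReturns(closes) {
  const returns = [];
  for (let i = 1; i < closes.length; i++) {
    returns.push((closes[i] - closes[i - 1]) / closes[i - 1]);
  }
  return returns;
}

// Trailing moving average over a fixed window of closing prices
function movingAverage(closes, window) {
  const out = [];
  for (let i = window - 1; i < closes.length; i++) {
    let sum = 0;
    for (let j = i - window + 1; j <= i; j++) sum += closes[j];
    out.push(sum / window);
  }
  return out;
}
```

In the actual pipeline these aggregations run once over the 1999-2022 import; the results land in the dedicated metrics table and are then mapped to HBase.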
**Speed Layer**
- Data Source: Real-time stock data from the Polygon API (post-2022)
- Processing:
  - Real-time data fetched based on user queries
  - Data processed through the Kafka messaging system
  - The StreamStocks consumer processes messages and updates HBase
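A hedged sketch of the frontend's side of this handoff, using kafkajs as an example client. The topic name (`stock-stream`), the message shape, and the exact fields read from a Polygon aggregate bar are assumptions, not the project's confirmed code:

```javascript
// Shape a Polygon aggregate bar (t/o/h/l/c/v fields) into the key/value
// pair published to the stream topic for the StreamStocks consumer.
function toKafkaMessage(symbol, bar) {
  return {
    key: symbol,
    value: JSON.stringify({
      symbol,
      date: new Date(bar.t).toISOString().slice(0, 10),
      open: bar.o,
      high: bar.h,
      low: bar.l,
      close: bar.c,
      volume: bar.v,
    }),
  };
}

// Publish one bar to Kafka (kafkajs is an assumed client choice here).
async function publishBar(brokers, symbol, bar) {
  const { Kafka } = require('kafkajs');
  const kafka = new Kafka({ clientId: 'stock-frontend', brokers });
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'stock-stream', // assumed topic name
    messages: [toKafkaMessage(symbol, bar)],
  });
  await producer.disconnect();
}
```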
**Serving Layer**
- Node.js frontend application
  - Allows users to query stock metrics by symbol and date
  - Automatically routes to the batch or speed layer based on the query date
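The date-based routing can be sketched minimally. The cutoff value is an assumption derived from the 1999-2022 range stated above, not a constant taken from the app's source:

```javascript
// Queries for dates covered by the batch import go to the batch views in
// HBase; later dates go through the speed layer.
const BATCH_CUTOFF = new Date('2022-12-31'); // assumed end of batch range

function routeQuery(dateString) {
  const date = new Date(dateString);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid date: ${dateString}`);
  }
  return date <= BATCH_CUTOFF ? 'batch' : 'speed';
}
```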
**Running the Web Application**
- SSH into the cluster:
  ```
  ssh -i /Users/jasonze/.ssh/id_ed25519 -C2qTnNf -D 9876 sshuser@hbase-mpcs53014-2024-ssh.azurehdinsight.net
  ```
- Navigate to the application directory:
  ```
  cd /home/sshuser/yanze41/app3
  ```
- Install dependencies and start the server:
  ```
  npm install
  node app.js 3041 http://10.0.0.26:8090 $KAFKABROKERS
  ```
**Running the Speed Layer**
- SSH into the cluster (use the same command as above)
- Navigate to the target directory:
  ```
  cd /home/sshuser/yanze41/target
  ```
- Submit the Spark job:
  ```
  spark-submit \
    --driver-java-options "-Dlog4j.configuration=file:///home/sshuser/ss.log4j.properties" \
    --class StreamStocks \
    uber-speedLayerKafka-1.0-SNAPSHOT.jar \
    $KAFKABROKERS
  ```

**Usage**
- Access the web interface
- Enter a stock symbol (e.g., AAPL, GOOGL)
- Select a date:
  - Dates between 1999 and 2022: data is served from the batch layer (HBase)
  - Dates after 2022: real-time data is fetched from the Polygon API through the speed layer
**Data Flow**
- User submits a query through the frontend
- System checks the date:
  - Historical data: retrieved directly from HBase
  - Recent data: fetched from the Polygon API, processed through Kafka, and stored in HBase
- Results are displayed to the user with calculated metrics
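The flow above can be sketched as a single handler. The HBase, Polygon, and Kafka helpers are hypothetical stand-ins injected as dependencies; only the control flow mirrors the steps listed in this README:

```javascript
// Hedged sketch of the query flow: route by date, then either read the
// batch view from HBase or fetch live data and push it through Kafka so
// the StreamStocks consumer can persist it.
async function handleQuery(symbol, dateString, deps) {
  const { getFromHBase, fetchFromPolygon, publishToKafka } = deps;
  const cutoff = new Date('2022-12-31'); // assumed end of the batch range
  if (new Date(dateString) <= cutoff) {
    // Historical data: served directly from HBase
    return getFromHBase(symbol, dateString);
  }
  // Recent data: fetched from Polygon, published to Kafka for HBase
  // persistence, and returned to the user
  const bar = await fetchFromPolygon(symbol, dateString);
  await publishToKafka(symbol, bar);
  return bar;
}
```

Injecting the helpers keeps the routing logic testable without a live cluster, which is one plausible way to structure the real app.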