This project demonstrates an end-to-end solution for processing, analyzing, and predicting financial transaction patterns using AWS services and big data tools. The pipeline includes data ingestion, validation, transformation, machine learning, and visualization.
- Data Ingestion: Raw financial data is ingested and stored securely in AWS S3.
- Data Processing: PySpark and Spark SQL are used for transformations and feature engineering.
- Machine Learning: A fraud detection model is built with AWS SageMaker Autopilot.
- Visualization: Interactive dashboards are created with AWS QuickSight.
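
As a rough illustration of the validation and feature engineering steps listed above, here is a minimal plain-Python sketch that operates on a single transaction record. The input field names (`type`, `amount`, `oldbalanceOrg`, `newbalanceOrig`) follow the PaySim CSV columns; the derived feature names and the validation rules are assumptions made for this example, not the project's actual PySpark job, which would express the same logic as column expressions.

```python
from typing import Any, Dict

# Input fields taken from the PaySim CSV schema.
REQUIRED_FIELDS = ("type", "amount", "oldbalanceOrg", "newbalanceOrig")


def validate(row: Dict[str, Any]) -> bool:
    """Return True if every required field is present and the amount is non-negative."""
    if any(row.get(field) is None for field in REQUIRED_FIELDS):
        return False
    return float(row["amount"]) >= 0


def engineer_features(row: Dict[str, Any]) -> Dict[str, float]:
    """Derive a few illustrative (hypothetical) features from one validated row."""
    amount = float(row["amount"])
    old_balance = float(row["oldbalanceOrg"])
    new_balance = float(row["newbalanceOrig"])
    return {
        "amount": amount,
        # Non-zero when the origin balances do not reconcile with the amount.
        "balance_error_orig": old_balance - amount - new_balance,
        # 1 when the transaction empties a previously funded account.
        "drained_account": int(old_balance > 0 and new_balance == 0),
        # 1 for the two transaction types singled out in this sketch.
        "is_transfer_or_cash_out": int(row["type"] in ("TRANSFER", "CASH_OUT")),
    }
```

Rows that fail `validate` would be dropped or routed to a quarantine location before `engineer_features` is applied.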
- Cloud Services: AWS S3, SageMaker, SNS, QuickSight, EC2.
- Big Data Tools: PySpark, Spark SQL.
- Programming Language: Python.
- Visualization: AWS QuickSight.
- Workflow Automation: AWS Step Functions.
Dataset: PaySim synthetic financial transactions, available on Kaggle at https://www.kaggle.com/datasets/ealaxi/paysim1
- Fraud Detection F1 Score: 72.7%.
- Monthly revenue trends show seasonal peaks in spending.
- The majority of revenue comes from high-value customers in specific regions.
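
For readers unfamiliar with the metric, the F1 score quoted above is the harmonic mean of precision and recall. The sketch below shows how it can be computed from true and predicted labels, assuming the fraud class is encoded as `1`; it illustrates the definition and is not the evaluation code used by SageMaker Autopilot.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Compute F1 for the positive class from parallel label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because accuracy is dominated by the majority class on heavily imbalanced data, F1 on the fraud class is usually a more informative summary for this kind of problem.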
- Class Imbalance: Addressed through oversampling and class weighting techniques.
- Data Privacy: Ensured encryption and anonymization of PII.
- Real-Time Processing: Implemented AWS Kinesis for streaming use cases.
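
The class imbalance item above mentions oversampling and class weighting. A minimal sketch of both ideas in plain Python follows; the function names are hypothetical, the weighting uses the common `n_samples / (n_classes * class_count)` heuristic, and the oversampling is simple random duplication of minority rows. How the project actually applied these techniques is not documented here.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(
    rows: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Duplicate randomly chosen minority rows until all classes match the largest one."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

Oversampling should be applied to the training split only, so that duplicated fraud rows do not leak into the evaluation data.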
- Add support for real-time fraud detection pipelines with AWS Kinesis.
- Explore advanced hyperparameter tuning methods for the SageMaker model.
- Incorporate more explainable AI tools like SHAP for better interpretability.

