This project showcases a manual ETL pipeline for AWS CloudTrail logs using Python and SQLite. The pipeline parses raw JSON logs, transforms them into relational tables, and runs SQL queries against SQLite to detect anomalies. The goal is to identify unusual activity within a cloud environment efficiently.
cloud_security_pipeline/
|
|- data/
| |_ cloudtrail_logs.json # Sample CloudTrail log file
|
|- db/
| |_ cloudtrail.db # SQLite database
|
|- etl/
| |- parse_cloudtrail.py # Parses JSON -> CSV
| |- load_to_sql.py # Loads CSV -> SQLite
| |_ detections.py # Runs SQL detection queries
|
|_ README.md
- Parses raw CloudTrail JSON logs into a clean CSV format.
- Loads structured data into a SQLite database.
- Detects anomalies using SQL queries (a sketch of one such query follows this list):
- API call rate spikes
- Activity from unusual regions
- Sensitive or high-risk API calls
- Sudden spikes in event metrics
- Scripts separated into parsing, loading, and detection stages.
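The actual queries live in etl/detections.py; below is a minimal sketch of what one detection (API call rate spikes) could look like. The table name events, the eventTime column, and the threshold of 100 calls per hour are assumptions for illustration and may not match the real schema.

```python
# Hypothetical sketch of one detection query; assumes the load step created an
# "events" table with an ISO 8601 eventTime column (actual schema may differ).
import sqlite3

SPIKE_QUERY = """
SELECT substr(eventTime, 1, 13) AS hour_bucket,  -- e.g. '2023-05-01T12'
       COUNT(*)                 AS call_count
FROM events
GROUP BY hour_bucket
HAVING COUNT(*) > 100            -- example threshold for a call-rate spike
ORDER BY call_count DESC;
"""

conn = sqlite3.connect("db/cloudtrail.db")
for hour_bucket, call_count in conn.execute(SPIKE_QUERY):
    print(f"{hour_bucket}: {call_count} API calls")
conn.close()
```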
- Required installation:
  - pip install pandas
- Place CloudTrail logs in JSON format in the data/ folder.
- Parse logs into CSV: py etl/parse_cloudtrail.py (sketches of the parse and load steps follow this list)
- Load CSV into SQLite database: py etl/load_to_sql.py
- Run detection queries: py etl/detections.py
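A minimal sketch of the parse step, assuming the sample file uses CloudTrail's standard top-level "Records" array; the output CSV name (data/cloudtrail_events.csv) is illustrative and may not match the actual script.

```python
# Sketch of the parse step (the real etl/parse_cloudtrail.py may differ).
# CloudTrail log files wrap events in a top-level "Records" array.
import json
import pandas as pd

with open("data/cloudtrail_logs.json") as f:
    records = json.load(f).get("Records", [])

# Flatten nested event fields (userIdentity, requestParameters, ...) into columns.
df = pd.json_normalize(records)
df.to_csv("data/cloudtrail_events.csv", index=False)
print(f"Wrote {len(df)} events to data/cloudtrail_events.csv")
```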
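The load step could then push that CSV into SQLite with pandas. The table name events is an assumption chosen to match the detection sketch above.

```python
# Sketch of the load step (the real etl/load_to_sql.py may differ).
import sqlite3
import pandas as pd

df = pd.read_csv("data/cloudtrail_events.csv")  # output of the parse sketch

conn = sqlite3.connect("db/cloudtrail.db")
# Replace any existing rows so repeated loads stay idempotent.
df.to_sql("events", conn, if_exists="replace", index=False)
conn.close()
```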
- Python
- pandas
- SQLite
- SQL
- AWS CloudTrail
- Automate log ingestion
- Allow for file types other than JSON
- Integrate with AWS S3 buckets for continuous updates
- Build dashboard for visualization of anomalies