Python ETL pipeline to parse AWS CloudTrail logs and perform SQL-based anomaly detections.

yonalsera/cloud-security-pipeline

CloudTrail ETL & Detection Pipeline

Author: Yonal Serasinghe

Overview

This project showcases a manual ETL pipeline for AWS CloudTrail logs using Python and SQLite. The pipeline parses raw JSON logs into a tabular CSV format and loads the result into a SQLite database, where SQL queries are run to detect anomalies. The overall goal is to identify unusual activity within a cloud environment effectively and efficiently.
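The parsing step can be sketched as follows. CloudTrail log files wrap individual events in a top-level Records array, and each record nests details about the caller under userIdentity. The sketch flattens a few commonly used fields into CSV rows. The field selection and the helper names (FIELDS, flatten_record, parse_cloudtrail, write_csv) are hypothetical and not necessarily what etl/parse_cloudtrail.py does. The standard-library csv module stands in for pandas to keep the example dependency-free, and the sample record is made up.

```python
import csv
import io
import json

# Hypothetical subset of CloudTrail fields to keep as CSV columns.
FIELDS = ["eventTime", "eventSource", "eventName", "awsRegion",
          "sourceIPAddress", "userName", "errorCode"]


def flatten_record(record: dict) -> dict:
    """Flatten one CloudTrail record into a single CSV-ready row."""
    identity = record.get("userIdentity", {})
    return {
        "eventTime": record.get("eventTime"),
        "eventSource": record.get("eventSource"),
        "eventName": record.get("eventName"),
        "awsRegion": record.get("awsRegion"),
        "sourceIPAddress": record.get("sourceIPAddress"),
        # userName is absent for some identity types (e.g. assumed roles),
        # so fall back to the caller's ARN.
        "userName": identity.get("userName") or identity.get("arn"),
        # errorCode is only present on failed calls.
        "errorCode": record.get("errorCode", ""),
    }


def parse_cloudtrail(raw_json: str) -> list:
    """Return one flat row per event in the file's "Records" array."""
    data = json.loads(raw_json)
    return [flatten_record(r) for r in data.get("Records", [])]


def write_csv(rows: list, out) -> None:
    """Write rows to a file-like object with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a made-up single-event log file.
sample = json.dumps({"Records": [{
    "eventTime": "2024-01-01T12:00:00Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
}]})
rows = parse_cloudtrail(sample)
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

In a script, buf would be replaced by a file opened with open(path, "w", newline="") so the csv module controls line endings.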

Structure

cloud_security_pipeline/
|
|- data/
|   |_ cloudtrail_logs.json      # Sample CloudTrail log file
|
|- db/
|   |_ cloudtrail.db             # SQLite database
|
|- etl/
|   |- parse_cloudtrail.py       # Parses JSON -> CSV
|   |- load_to_sql.py            # Loads CSV -> SQLite
|   |_ detections.py             # Runs SQL detection queries
|
|_ README.md

Features

Note: the commands under How to Run use the Windows py launcher; adjust them for your platform (for example, python3 on macOS or Linux).

  • Parses raw CloudTrail JSON logs into a clean CSV format.
  • Loads structured data into a SQLite database.
  • Detects anomalies using SQL queries:
    • API call rate spikes
    • Activity from unusual regions
    • Sensitive or high-risk API calls
    • Sudden spikes in event metrics
  • Scripts are separated into parsing, loading, and detection stages.
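The repository's actual queries are not reproduced here. The following is a minimal sketch of how three of the listed detections could be expressed in SQL against SQLite. It assumes a table named cloudtrail_events with eventTime, userName, eventName, and awsRegion columns; the table name, thresholds, region allowlist, and sensitive-event list are all hypothetical placeholders to tune for a real environment. The event names themselves (StopLogging, DeleteTrail, CreateAccessKey, AttachUserPolicy, PutBucketPolicy) are real AWS API actions commonly treated as high-risk.

```python
import sqlite3

# Hypothetical detection settings; tune these for your environment.
RATE_THRESHOLD = 3  # max calls per caller per minute before flagging
EXPECTED_REGIONS = ("us-east-1", "us-west-2")
SENSITIVE_EVENTS = ("StopLogging", "DeleteTrail", "CreateAccessKey",
                    "AttachUserPolicy", "PutBucketPolicy")


def placeholders(n: int) -> str:
    """Build '?, ?, ?' for a parameterized IN (...) clause."""
    return ", ".join(["?"] * n)


def rate_spikes(conn, threshold=RATE_THRESHOLD):
    """Callers exceeding `threshold` API calls within one minute."""
    sql = """
        SELECT userName,
               strftime('%Y-%m-%d %H:%M', eventTime) AS minute_bucket,
               COUNT(*) AS calls
        FROM cloudtrail_events
        GROUP BY userName, minute_bucket
        HAVING COUNT(*) > ?
    """
    return conn.execute(sql, (threshold,)).fetchall()


def unusual_regions(conn, expected=EXPECTED_REGIONS):
    """Events from regions outside the expected allowlist."""
    sql = f"""
        SELECT userName, eventName, awsRegion
        FROM cloudtrail_events
        WHERE awsRegion NOT IN ({placeholders(len(expected))})
    """
    return conn.execute(sql, expected).fetchall()


def sensitive_calls(conn, events=SENSITIVE_EVENTS):
    """Events whose API name is on the high-risk list."""
    sql = f"""
        SELECT eventTime, userName, eventName
        FROM cloudtrail_events
        WHERE eventName IN ({placeholders(len(events))})
    """
    return conn.execute(sql, events).fetchall()


# Usage with made-up events in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cloudtrail_events "
             "(eventTime TEXT, userName TEXT, eventName TEXT, awsRegion TEXT)")
demo = [(f"2024-01-01T12:00:0{i}Z", "alice", "ListBuckets", "us-east-1")
        for i in range(1, 5)]
demo += [("2024-01-01T12:05:00Z", "bob", "StopLogging", "us-east-1"),
         ("2024-01-01T12:10:00Z", "carol", "DescribeInstances", "ap-southeast-2")]
conn.executemany("INSERT INTO cloudtrail_events VALUES (?, ?, ?, ?)", demo)

for name, check in [("rate spikes", rate_spikes),
                    ("unusual regions", unusual_regions),
                    ("sensitive calls", sensitive_calls)]:
    print(name, check(conn))
```

Bucketing by minute with strftime works because SQLite's date functions accept ISO 8601 strings such as CloudTrail's eventTime, including the T separator and trailing Z. The lists are passed as bound parameters rather than pasted into the SQL text.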

How to Run

  1. Install the required dependency, pandas: py -m pip install pandas
  2. Place CloudTrail logs in JSON form into the data/ folder.
  3. Parse logs into CSV: py etl/parse_cloudtrail.py
  4. Load CSV into SQLite database: py etl/load_to_sql.py
  5. Run detection queries: py etl/detections.py
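The loading step (CSV into SQLite) can be sketched with the standard library alone. The function name load_csv_to_sqlite and the table name cloudtrail_events are hypothetical, and etl/load_to_sql.py may well do this differently. With pandas, which the project lists as a dependency, the equivalent is roughly pd.read_csv(path).to_sql(table, conn, if_exists="replace", index=False).

```python
import csv
import io
import sqlite3

# Hypothetical table name; the project's loader may use another.
TABLE = "cloudtrail_events"


def load_csv_to_sqlite(csv_file, conn, table=TABLE):
    """Create (or replace) `table` from a CSV file object; return rows loaded."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames  # reads the header line
    # Column names come from the CSV header and are interpolated into SQL,
    # so this assumes a trusted file. Everything is stored as TEXT; ISO 8601
    # eventTime strings still work with SQLite's date/time functions.
    column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    placeholders = ", ".join(["?"] * len(columns))
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Usage with a made-up two-row CSV and an in-memory database.
csv_text = (
    "eventTime,eventName,awsRegion,userName\n"
    "2024-01-01T12:00:00Z,ConsoleLogin,us-east-1,alice\n"
    "2024-01-01T12:01:00Z,ListBuckets,us-east-1,alice\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_csv_to_sqlite(io.StringIO(csv_text), conn)
print(loaded, conn.execute(f'SELECT COUNT(*) FROM "{TABLE}"').fetchone()[0])
```

Dropping and recreating the table makes each run replace the previous load, mirroring if_exists="replace" in the pandas version. To persist results, pass sqlite3.connect("db/cloudtrail.db") instead of an in-memory connection.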

Tech Stack

  • Python
  • pandas
  • SQLite
  • SQL
  • AWS CloudTrail

Possible Improvements and Considerations

  • Automate input of logs
  • Allow for file types other than JSON
  • Integrate with AWS S3 buckets for continuous updates
  • Build dashboard for visualization of anomalies