This is a beginner-friendly ETL pipeline project built with Apache Airflow and Docker Compose. It reads a CSV file, applies simple pandas transformations, and writes the result to another CSV file.
- Apache Airflow (via Docker Compose)
- Python
- pandas
- Postgres (as Airflow metadata DB)
- Redis (message broker for the Celery executor)
.
├── dags/                       # Airflow DAGs
│   └── etl_csv_pipeline.py
├── data/                       # Input and output CSVs
│   ├── raw/customers.csv
│   └── processed/processed_customers.csv
├── docker-compose.yaml         # Airflow setup
├── .env                        # Airflow environment config
├── logs/                       # Airflow logs
└── plugins/                    # (empty, for future custom plugins)
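
The DAG file is not reproduced in this README, but a minimal sketch of what `dags/etl_csv_pipeline.py` could look like is shown below. It assumes a recent Airflow 2.x TaskFlow API and a `./data` folder mounted into the containers at `/opt/airflow/data`; the task layout, cleaning steps, and paths are illustrative assumptions rather than the project's exact code.

```python
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task

# Assumed container paths: the stock docker-compose mounts ./dags, ./logs and
# ./plugins, so a ./data volume mapping would need to be added for these paths.
RAW_PATH = "/opt/airflow/data/raw/customers.csv"
PROCESSED_PATH = "/opt/airflow/data/processed/processed_customers.csv"


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def etl_csv_pipeline():
    @task
    def extract() -> str:
        # Nothing to download: the "extract" step just hands the raw CSV path downstream.
        return RAW_PATH

    @task
    def transform_and_load(raw_path: str) -> str:
        # Clean and deduplicate the customer records with pandas, then write the output CSV.
        df = pd.read_csv(raw_path)
        df = df.drop_duplicates().dropna(how="all")
        df.to_csv(PROCESSED_PATH, index=False)
        return PROCESSED_PATH

    transform_and_load(extract())


etl_csv_pipeline()
```

Any Python file placed in `dags/` is picked up automatically by the Airflow scheduler, so no extra registration step is needed.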
- Clone the repo and navigate to the directory:

  git clone https://github.com/yourusername/airflow_etl_project.git
  cd airflow_etl_project

- Start the Airflow containers:

  docker compose up airflow-init
  docker compose up
- Visit http://localhost:8080 and log in with:
  - Username: airflow
  - Password: airflow
- Trigger the `etl_csv_pipeline` DAG to run the ETL job.
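
If you prefer not to use the UI, the DAG can also be triggered through Airflow's stable REST API. The snippet below is a sketch that assumes the webserver is reachable at localhost:8080, that basic-auth API access is enabled (as in the stock docker-compose setup), and that the default airflow/airflow credentials are unchanged.

```python
import requests

# Trigger a run of the etl_csv_pipeline DAG via the stable REST API.
# Assumes basic-auth API access and the default airflow/airflow login.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/etl_csv_pipeline/dagRuns",
    auth=("airflow", "airflow"),
    json={"conf": {}},
)
resp.raise_for_status()
print("Started run:", resp.json()["dag_run_id"])
```

Note that a newly deployed DAG is paused by default, so unpause it in the UI (or with `airflow dags unpause etl_csv_pipeline`) before triggering it this way.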
After successful DAG execution:

- You'll find a `processed_customers.csv` file with cleaned and deduplicated customer records in `data/processed/`.
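
As a quick sanity check, you can inspect the output with pandas; this assumes you run it from the project root after the DAG has completed.

```python
import pandas as pd

# Load the pipeline output and confirm the deduplication worked.
df = pd.read_csv("data/processed/processed_customers.csv")
print(df.head())
print("rows:", len(df), "| duplicate rows:", df.duplicated().sum())
```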
This project is open-source and free to use.