
Apache Airflow ETL Project

This is a beginner-friendly ETL pipeline project using Apache Airflow with Docker Compose. It reads a CSV file, performs simple transformations using pandas, and writes the output to another CSV file.
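
For orientation, here is a minimal sketch of what dags/etl_csv_pipeline.py might look like. The task name, the cleaning steps, and the container paths (which assume data/ is mounted at /opt/airflow/data and Airflow 2.4+) are illustrative assumptions, not the repository's actual code:

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Paths inside the container; assumes ./data is mounted at /opt/airflow/data
    RAW = "/opt/airflow/data/raw/customers.csv"
    PROCESSED = "/opt/airflow/data/processed/processed_customers.csv"

    def transform_customers():
        # Read the raw CSV, drop exact duplicate rows, write the result
        df = pd.read_csv(RAW)
        df = df.drop_duplicates()
        df.to_csv(PROCESSED, index=False)

    with DAG(
        dag_id="etl_csv_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="transform_customers",
            python_callable=transform_customers,
        )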

🛠️ Project Stack

  • Apache Airflow (via Docker Compose)
  • Python
  • pandas
  • Postgres (as Airflow metadata DB)
  • Redis (message broker for the Celery workers)

📁 Project Structure

.
├── dags/                  # Airflow DAGs
│   └── etl_csv_pipeline.py
├── data/                  # Input and output CSVs
│   ├── raw/customers.csv
│   └── processed/processed_customers.csv
├── docker-compose.yaml    # Airflow setup
├── .env                   # Airflow environment config
├── logs/                  # Airflow logs
└── plugins/               # (empty, for future custom plugins)
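
The .env file typically only needs the host user id, so that files written by the containers stay owned by your user. A minimal example using the value suggested by the official Airflow compose setup (an assumption, not taken from this repo):

    # .env — read by docker-compose.yaml
    AIRFLOW_UID=50000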

▶️ How to Run

  1. Clone the repo and navigate to the directory:

    git clone https://github.com/yourusername/airflow_etl_project.git
    cd airflow_etl_project
  2. Start the Airflow containers:

    docker compose up airflow-init
    docker compose up
  3. Open http://localhost:8080 and log in with:

    • Username: airflow
    • Password: airflow
  4. Trigger the etl_csv_pipeline DAG from the UI to run the ETL job, or use the CLI as shown below.
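
If you prefer the command line, you can trigger the same DAG from inside a running Airflow container. The service name airflow-webserver matches the official compose file; check docker compose ps if yours differs:

    docker compose exec airflow-webserver airflow dags trigger etl_csv_pipeline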

✅ Output

After successful DAG execution:

  • You’ll find data/processed/processed_customers.csv containing the cleaned, deduplicated customer records.
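
A quick way to sanity-check the result from the host machine (a minimal sketch; assumes pandas is installed locally):

    import pandas as pd

    # Load the DAG's output and confirm the shape and a few rows
    df = pd.read_csv("data/processed/processed_customers.csv")
    print(df.shape)
    print(df.head())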

📄 License

This project is open-source and free to use.
