
NickTimosh/modern_data_pipeline_project


Modern Data Pipeline with Apache Airflow, Docker, Data Quality Testing, and CI/CD automation

Overview

I’ve started building a personal data platform that collects and processes data from my YouTube channel, LinkedIn, GitHub, and other platforms to better understand engagement, growth, and audience behavior across channels. It’s also a great opportunity to apply modern data engineering tools and practices such as Apache Airflow, Docker, API integrations, functional and data quality testing, and CI/CD automation.


Summary

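This ELT (or, as I'd rather call it, EtLT: a light transformation before loading, with the main transformations inside the database) pipeline is orchestrated with Airflow, containerized with Docker, and stores its data in PostgreSQL. The process includes: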

  • Retrieve video metadata via the YouTube API.
  • Store the raw data in a staging schema inside a dockerized PostgreSQL instance.
  • Transform the data and load it into reporting tables.
  • Ensure data quality by applying unit tests and data quality checks.
  • Run tests and build Docker images using GitHub Actions CI/CD workflows.
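A minimal sketch of the extract step and the small "t" that follows it, assuming the YouTube Data API v3 `videos.list` endpoint with the `snippet` and `statistics` parts. The function names, the selected fields, and the `api_key` parameter are illustrative assumptions, not the project's actual code. The sketch builds the request URL and flattens the nested JSON response into one row per video, casting the string-typed counts to integers:

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlencode

# Endpoint of the YouTube Data API v3 videos.list method
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_ids: list[str], api_key: str) -> str:
    """Build a videos.list request URL asking for the snippet and statistics parts.

    `api_key` is a hypothetical parameter name; the key itself comes from Google Cloud.
    """
    params = {"part": "snippet,statistics", "id": ",".join(video_ids), "key": api_key}
    return f"{VIDEOS_ENDPOINT}?{urlencode(params)}"


def _to_int(stats: dict[str, Any], key: str) -> int | None:
    # The API returns counts as strings, and a count may be absent from the response
    value = stats.get(key)
    return int(value) if value is not None else None


def flatten_videos(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the nested JSON response into one flat row per video (illustrative fields)."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append(
            {
                "video_id": item["id"],
                "title": snippet.get("title"),
                "published_at": snippet.get("publishedAt"),
                "view_count": _to_int(stats, "viewCount"),
                "like_count": _to_int(stats, "likeCount"),
                "comment_count": _to_int(stats, "commentCount"),
            }
        )
    return rows
```

In a real task, the HTTP call itself (for example `requests.get(url, timeout=30).json()`) would sit between these two functions; it is left out here so the sketch runs offline.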

DAGs

Three Airflow DAGs are defined and triggered sequentially:

  • produce_json — Extracts YouTube data and saves it as a JSON file.
  • update_db — Loads and processes the data into staging and core schemas.
  • data_quality — Runs Soda checks to validate data quality.
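The core of the `update_db` step could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: Python's built-in `sqlite3` stands in for the dockerized PostgreSQL instance, and the table names `staging_videos` and `core_videos` are hypothetical (in PostgreSQL they would more naturally live in separate `staging` and `core` schemas). Staging is reloaded with the latest extract, then an upsert applies a light cleanup (trimming titles, defaulting missing view counts to 0) and merges the rows into core:

```python
from __future__ import annotations

import sqlite3

# Hypothetical tables; sqlite3 is only a stand-in for PostgreSQL here
SCHEMA = """
CREATE TABLE IF NOT EXISTS staging_videos (video_id TEXT, title TEXT, view_count INTEGER);
CREATE TABLE IF NOT EXISTS core_videos (
    video_id   TEXT PRIMARY KEY,
    title      TEXT,
    view_count INTEGER NOT NULL
);
"""

UPSERT_CORE = """
INSERT INTO core_videos (video_id, title, view_count)
SELECT video_id, TRIM(title), COALESCE(view_count, 0)
FROM staging_videos
WHERE true  -- SQLite needs a WHERE clause before ON CONFLICT in INSERT ... SELECT
ON CONFLICT (video_id) DO UPDATE SET
    title = excluded.title,
    view_count = excluded.view_count
"""


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def run_update_db(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Reload staging with the latest extract, then upsert it into core."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM staging_videos")
        # Named placeholders pick only the keys they need; extra keys in a row are ignored
        conn.executemany(
            "INSERT INTO staging_videos (video_id, title, view_count) "
            "VALUES (:video_id, :title, :view_count)",
            rows,
        )
        conn.execute(UPSERT_CORE)
```

The `INSERT ... ON CONFLICT ... DO UPDATE` form is also valid PostgreSQL, so the upsert should carry over largely unchanged. Re-running the step with the same extract leaves core in the same state, which keeps task retries safe.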

CI/CD & Testing

  • Unit and integration tests ensure pipelines behave as expected.
  • Data quality is monitored automatically with Soda.
  • A GitHub Actions workflow builds and pushes Docker images, starts Airflow services, and tests DAG execution.
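In the project these checks are run by Soda, whose checks are written in SodaCL YAML (for example `row_count > 0` or `duplicate_count(video_id) = 0`). The plain-Python sketch below is not Soda's API; it only illustrates the kind of rules a data-quality stage might enforce on the loaded rows, with hypothetical check names and columns. Raising an exception when any check fails is what would make a corresponding Airflow task fail:

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def run_checks(rows: list[dict[str, Any]]) -> dict[str, bool]:
    """Evaluate simple, hypothetical data-quality rules; True means the check passed."""
    ids = [row.get("video_id") for row in rows]
    # Count surplus occurrences of every non-null video_id
    duplicates = sum(n - 1 for n in Counter(i for i in ids if i is not None).values())
    return {
        "has_rows": len(rows) > 0,
        "no_missing_video_id": None not in ids,
        "no_duplicate_video_id": duplicates == 0,
        # Null view counts are ignored here; only negative values fail
        "non_negative_view_count": all(
            row["view_count"] >= 0 for row in rows if row.get("view_count") is not None
        ),
    }


def failed_checks(rows: list[dict[str, Any]]) -> list[str]:
    """Names of the checks that did not pass, in evaluation order."""
    return [name for name, passed in run_checks(rows).items() if not passed]


def assert_quality(rows: list[dict[str, Any]]) -> None:
    """Raise if any check fails, so a task calling this would be marked as failed."""
    failed = failed_checks(rows)
    if failed:
        raise ValueError(f"Data quality checks failed: {', '.join(failed)}")
```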

Useful links for anyone interested in learning these tools:

  • YouTube API
  • Docker
  • Airflow
  • Tests
  • GitHub Actions

This project is inspired by the Data Engineering ELT Pipeline course by @MattTheDataEngineer, a great resource for mastering Airflow and Docker development. I 100% recommend it!

About

API integration with Apache Airflow, Docker, Data Quality tests, and CI/CD automation
