I’ve started building a personal data platform that collects and processes data from my YouTube channel, LinkedIn, GitHub, and other platforms to better understand engagement, growth, and audience behavior across channels. It’s also a great opportunity to apply modern data engineering tools and practices such as Apache Airflow, Docker, API integrations, functional and data quality testing, and CI/CD automation.
This ELT (or, I'd rather say, EtLT) pipeline is orchestrated with Airflow, containerized with Docker, and stores data in PostgreSQL. The process includes:
- Retrieve video metadata via the YouTube API.
- Store the raw data in a staging schema inside a dockerized PostgreSQL instance.
- Transform and load the data into reporting tables.
- Ensure data quality by applying unit tests and data quality checks.
- Run tests and build Docker images using GitHub Actions CI/CD workflows.
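To make the extract step concrete, here's a minimal sketch of pulling video metadata from the YouTube Data API v3 and dumping it to JSON. The environment variable names, file path, and requested fields are my assumptions for illustration, not necessarily what the project uses:

```python
import json
import os
import urllib.parse
import urllib.request

API_KEY = os.environ.get("YT_API_KEY", "")        # assumed env var name
CHANNEL_ID = os.environ.get("YT_CHANNEL_ID", "")  # assumed env var name

def fetch_video_metadata(max_results: int = 25) -> list:
    """Extract: pull recent video metadata from the YouTube Data API v3."""
    params = urllib.parse.urlencode({
        "key": API_KEY,
        "channelId": CHANNEL_ID,
        "part": "snippet",
        "order": "date",
        "maxResults": max_results,
        "type": "video",
    })
    url = f"https://www.googleapis.com/youtube/v3/search?{params}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["items"]

def save_raw_json(items: list, path: str = "/tmp/youtube_videos.json") -> None:
    """Persist the raw payload as-is so the load step stays decoupled."""
    with open(path, "w") as f:
        json.dump(items, f)
```

Writing the raw response to disk unchanged is what puts the small "t" in EtLT: the heavy transformation happens later, inside the database.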
Three Airflow DAGs are defined and triggered sequentially:
- produce_json — Extracts YouTube data and saves it as a JSON file.
- update_db — Loads and processes the data into staging and core schemas.
- data_quality — Runs Soda checks to validate data quality.
Testing and automation round out the project:
- Unit and integration tests ensure the pipelines behave as expected.
- Data quality is monitored automatically with Soda.
- A GitHub Actions workflow builds and pushes Docker images, starts the Airflow services, and tests DAG execution.
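As an example of the kind of unit test involved, here's a hypothetical pytest-style check on a transform helper that flattens a raw API item into a staging-table row. Both the function and the field names are assumptions for illustration, not the project's actual code:

```python
def flatten_video_item(item: dict) -> dict:
    """Transform helper: pick out the fields the staging table needs."""
    snippet = item["snippet"]
    return {
        "video_id": item["id"]["videoId"],
        "title": snippet["title"],
        "published_at": snippet["publishedAt"],
    }

def test_flatten_video_item():
    raw = {
        "id": {"videoId": "abc123"},
        "snippet": {"title": "My video", "publishedAt": "2024-01-01T00:00:00Z"},
    }
    assert flatten_video_item(raw) == {
        "video_id": "abc123",
        "title": "My video",
        "published_at": "2024-01-01T00:00:00Z",
    }
```

Keeping transforms in small pure functions like this is what makes them unit-testable without spinning up Airflow or PostgreSQL; the integration tests and Soda checks cover the rest.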
Along the way, a few Airflow topics deserved a closer look:
- Storing Airflow Variables in environment variables
- Timezones
- Declaring an Airflow DAG
- Crontab schedules
- Airflow's start_date
This project is inspired by the Data Engineering ELT Pipeline course by @MattTheDataEngineer — a great resource for mastering Airflow and Docker development. 100% recommended!