denue-challenge

This project extracts, cleans, and ingests data from the INEGI Directorio Estadístico Nacional de Unidades Económicas (DENUE) into BigQuery on a monthly schedule. The data pipeline is written in Python, orchestrated with Airflow, and deployed with Docker.

Getting Started

Installation

Clone the repository.

git clone https://github.com/xduarde/denue-challenge.git
cd denue-challenge

Usage

The data pipeline runs in a Docker container built from an Airflow image. The INEGI - DENUE data model is designed and built in a data warehouse (BigQuery).

The following command deploys the environment and initializes the pipeline:

docker compose up --build
  1. Deploy the Airflow container
  2. Configure GCP (BigQuery) connection
  3. Trigger the ingest_denue_data DAG
    • Validate datasets.
    • Extract data from the INEGI API by state.
    • Clean data.
    • Validate tables.
    • Insert data into the staging table.
    • Execute queries to populate the data model.
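
The extract, clean, and load steps listed above can be sketched in plain Python. This is only an illustration of the flow, not the repository's actual code: `clean_record`, `run_pipeline`, the `extract` and `load` callables, and the `id` field are all hypothetical names, and the real DAG talks to the INEGI API and BigQuery instead of the stand-ins used here.

```python
from typing import Callable, Iterable, List, Optional

Record = dict  # one economic unit, as produced by the (hypothetical) extract step


def clean_record(raw: Record) -> Optional[Record]:
    """Lowercase keys, strip whitespace, and drop rows that have no id."""
    cleaned = {str(k).strip().lower(): str(v).strip() for k, v in raw.items()}
    return cleaned if cleaned.get("id") else None


def run_pipeline(
    states: Iterable[str],
    extract: Callable[[str], Iterable[Record]],
    load: Callable[[List[Record]], None],
) -> int:
    """Extract by state, clean each record, and hand the batch to the loader.

    Returns the number of records passed to `load`.
    """
    total = 0
    for state in states:
        rows = [r for r in map(clean_record, extract(state)) if r is not None]
        load(rows)  # stand-in for "insert data into the staging table"
        total += len(rows)
    return total
```

Passing the extract and load steps in as callables keeps the cleaning logic testable without network access or GCP credentials, for example with a lambda that returns fixed rows and `list.extend` as the loader.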


To monitor the ingest_denue_data DAG, open the Airflow web server at:

http://localhost:8080/
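
Before opening the UI, it can help to confirm that the web server is up. The following is a minimal sketch, assuming an Airflow 2.x web server, which exposes a `/health` endpoint reporting the status of the metadatabase and the scheduler. The function names are hypothetical and the base URL is the one given above.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """Fetch the JSON health report from the Airflow web server (assumes Airflow 2.x)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def components_healthy(payload: dict) -> bool:
    """Return True only if both the metadatabase and the scheduler report 'healthy'."""
    return all(
        payload.get(name, {}).get("status") == "healthy"
        for name in ("metadatabase", "scheduler")
    )
```

With the containers running, `components_healthy(fetch_health())` should return `True` once the scheduler has started sending heartbeats.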

Extras:

Data Model

[Image: data model diagram]

License

Distributed under the MIT License. See LICENSE for more information.
