
denue-challenge

Project to extract, clean, and ingest data from INEGI - Directorio Estadístico Nacional de Unidades Económicas (DENUE) into BigQuery on a monthly basis. The data pipeline is developed in Python, orchestrated with Airflow, and deployed with Docker.

Getting Started

Installation

Clone the repository.

git clone https://github.com/xduarde/denue-challenge.git
cd denue-challenge

Usage

The data pipeline runs in a Docker container based on an Airflow image. The INEGI - DENUE data model is built and designed in a data warehouse (BigQuery).
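As an illustration of the warehouse side, a populate-model query could be parameterized like this (the project, dataset, table, and column names here are placeholders, not the repository's actual schema):

```python
# Sketch of building a "populate model" SQL statement for BigQuery.
# All identifiers below (project, dataset, tables, denue_id, name)
# are hypothetical placeholders, not this repo's real schema.

def build_populate_query(project, dataset, staging_table, target_table):
    """Return a MERGE statement that upserts staging rows into the model."""
    return f"""
    MERGE `{project}.{dataset}.{target_table}` AS t
    USING `{project}.{dataset}.{staging_table}` AS s
    ON t.denue_id = s.denue_id
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN INSERT (denue_id, name) VALUES (s.denue_id, s.name)
    """

sql = build_populate_query("my-project", "denue", "stg_units", "economic_units")
print(sql)
```

In practice this string would be submitted through the BigQuery client from an Airflow task; building the SQL as a pure function keeps it easy to unit-test.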

The following command deploys the environment and initializes the pipeline:

docker compose up --build
  1. Deploy the Airflow container
  2. Configure GCP (BigQuery) connection
  3. Trigger the ingest_denue_data DAG
    • Validate datasets.
    • Extract data from INEGI API by state.
    • Clean data.
    • Validate tables.
    • Insert data to staging table.
    • Execute queries to populate model.
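The step order above can be sketched as plain Python functions (a minimal illustration only; the real tasks run inside Airflow and talk to the INEGI API and BigQuery, both stubbed here, and all function names are assumptions):

```python
# Minimal sketch of the DAG's step order as ordinary Python functions.
# The INEGI API calls and BigQuery writes are stubbed out.

def validate_datasets():
    # Real task: check that the required BigQuery datasets exist.
    return True

def extract_by_state(states):
    # Real task: call the INEGI DENUE API once per state.
    return [{"state": s, "raw": f"payload-{s}"} for s in states]

def clean(records):
    # Real task: normalize fields before loading.
    return [{**r, "clean": True} for r in records]

def run_pipeline(states):
    if not validate_datasets():
        raise RuntimeError("required datasets are missing")
    records = clean(extract_by_state(states))
    # Remaining real steps: validate tables, insert into the staging
    # table, then execute the queries that populate the model.
    return records

print(run_pipeline(["01", "02"]))
```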


To monitor the ingest_denue_data DAG, access the Airflow web server at:

http://localhost:8080/

Extras:

Data Model


License

Distributed under the MIT License. See LICENSE for more information.