A monthly pipeline that extracts, cleans, and ingests data from the INEGI Directorio Estadístico Nacional de Unidades Económicas (DENUE) into BigQuery. The pipeline is developed in Python, orchestrated with Airflow, and deployed with Docker.
Clone the repository:

```shell
git clone https://github.com/xduarde/denue-challenge.git
cd denue-challenge
```

The data pipeline runs in a Docker container built from an Airflow image, and the INEGI DENUE data model is designed and built in a BigQuery data warehouse.
The following command deploys the environment and initializes the pipeline:

```shell
docker compose up --build
```

This will:

- Deploy the Airflow container
- Configure the GCP (BigQuery) connection
- Trigger the `ingest_denue_data` DAG
- Validate the BigQuery datasets
The `ingest_denue_data` pipeline then:

- Extracts data from the INEGI API, state by state
- Cleans the data
- Validates the target tables
- Inserts the data into a staging table
- Runs the queries that populate the data model
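The extract and clean steps above can be sketched as plain Python functions. This is a minimal illustration, not the project's actual code: the DENUE endpoint template, paging values, and column names below are assumptions (the real INEGI API requires a registered token).

```python
import json
import urllib.request

import pandas as pd

# Hypothetical DENUE endpoint template; path segments and paging are assumptions.
DENUE_URL = "https://www.inegi.org.mx/app/api/denue/v1/consulta/BuscarEntidad/todos/{state}/1/1000/{token}"


def extract_state(state_code: str, token: str) -> list[dict]:
    """Fetch raw DENUE records for one state from the INEGI API."""
    url = DENUE_URL.format(state=state_code, token=token)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def clean_records(records: list[dict]) -> pd.DataFrame:
    """Normalize column names, trim whitespace, and drop duplicate IDs."""
    df = pd.DataFrame(records)
    # snake_case column names
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # strip stray whitespace from every text column
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())
    # deduplicate on the business ID when present
    subset = ["id"] if "id" in df.columns else None
    return df.drop_duplicates(subset=subset).reset_index(drop=True)
```

The cleaned frame would then be written to the BigQuery staging table (for example with the `google-cloud-bigquery` client's `load_table_from_dataframe`) before the model-population queries run.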
To monitor the `ingest_denue_data` DAG, access the Airflow web server at:
Extras:
- BigQuery Dataplex to create and execute data quality tasks
- BigQuery BI Engine to accelerate the connection to Data Studio
- Data Studio to explore and visualize the data
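The Dataplex data quality tasks themselves are configured in GCP, but the kind of rules they encode can be sketched locally in plain pandas. This is a swapped-in, minimal illustration, not Dataplex itself; the rule set and column names (`id`, `cve_ent`) are hypothetical:

```python
import pandas as pd


def run_quality_checks(df: pd.DataFrame) -> dict[str, bool]:
    """Evaluate simple data quality rules; True means the rule passed."""
    return {
        "id_not_null": df["id"].notna().all(),
        "id_unique": df["id"].is_unique,
        # 32 federal entities in Mexico, coded 1..32
        "state_code_valid": df["cve_ent"].between(1, 32).all(),
    }


# Example: two valid rows and one broken one (null id, invalid state code)
sample = pd.DataFrame({
    "id": [1, 2, None],
    "cve_ent": [9, 15, 99],
})
results = run_quality_checks(sample)
```

In production these checks would run as Dataplex tasks against the BigQuery tables rather than against in-memory frames.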
Distributed under the MIT License. See LICENSE for more information.

