AMS Background Tasks is a set of tools designed to create and update the database of the Amazon Situation Room (AMS). The execution of these tools is managed by Airflow.
In short, Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows or DAGs. In Airflow, a DAG (Directed Acyclic Graph) is a collection of tasks that you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
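Although a real DAG is written against the Airflow API, the core idea (a set of tasks plus dependency edges, executed in an order that respects those edges) can be illustrated with Python's standard `graphlib` module. The task names and edges below are illustrative only, not the actual `ams-create-db` dependency graph:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the set of
# upstream tasks it depends on.
dag = {
    "create-db": {"check-recreate-db"},
    "update-biome": {"create-db"},
    "update-active-fires": {"create-db"},
    "finalize-classification": {"update-biome", "update-active-fires"},
}

# A valid execution order respects every dependency edge:
# check-recreate-db runs first, finalize-classification last.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow schedules tasks the same way, but can also run independent tasks (such as the two `update-*` tasks above) in parallel.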
The DAG ams-create-db is responsible for creating and updating the AMS database. This DAG consists of the following tasks:
- check-variables
- update-environment
- check-recreate-db
- create-db
- update-biome
- update-spatial-units
- update-active-fires
- update-amz-deter
- update-cer-deter
- download-risk-file
- update-ibama-risk
- prepare-classification
- classify-deter-by-land-use
- classify-fires-by-land-use
- finalize-classification
Each of these tasks is a Python command-line tool developed using the Click library.
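As an illustration of that pattern, a Click-based task entry point typically looks like the sketch below. The command name and its options are hypothetical, not one of the actual AMS tools:

```python
import click


@click.command()
@click.option("--biome", required=True, help="Biome to process, e.g. Amazônia.")
@click.option("--all-data", is_flag=True, help="Also update historical data.")
def update_biome(biome: str, all_data: bool) -> None:
    """Hypothetical sketch of a Click command-line task."""
    scope = "all data" if all_data else "daily data"
    click.echo(f"Updating {biome} ({scope})")


if __name__ == "__main__":
    update_biome()
```

Airflow then invokes each such command from the corresponding DAG task.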
To run the DAG ams-create-db, three external databases are required: one for DETER data (for each biome), another for active fires data, and an auxiliary database.
From the auxiliary database, the following tables are required:
- public.lm_bioma_250
- public.municipio_test
- public.lml_unidade_federacao_a
- cs_amz_5km
- cs_amz_5km_biome
- cs_amz_25km
- cs_amz_25km_biome
- cs_amz_150km
- cs_amz_150km_biome
- cs_cer_5km
- cs_cer_5km_biome
- cs_cer_25km
- cs_cer_25km_biome
- cs_cer_150km
- cs_cer_150km_biome
The cell tables (prefixed with cs_), except for the 5km ones, are created by the notebook update_auxiliary.ipynb, which uses data from the existing AMS Database. The tables cs_*_5km*, however, are created by the notebook import_cells_from_shapefile.ipynb, which allows for importing cells into the auxiliary database from a shapefile. If the cell tables are not defined in the auxiliary database, it is necessary to run these notebooks. The shapefile containing the 5km cells is attached to issue #26.
```shell
$ jupyter-notebook notebooks/update_auxiliary.ipynb
```

This DAG is designed to run from a "DagBag", which means that all the DAG files are placed inside the root folder. Assuming the Airflow environment is already running, it is necessary to set up the following Airflow configurations.
Set up the following connection IDs:
- AMS_AF_DB_URL (raw fires database, e.g. raw_fires_data)
- AMS_AUX_DB_URL (auxiliary database, e.g. auxiliary)
- AMS_AMZ_DETER_B_DB_URL (DETER Amazônia database, e.g. deter_amazonia_nb)
- AMS_CER_DETER_B_DB_URL (DETER Cerrado database, e.g. deter_amazonia_nb)
- AMS_DB_URL (AMS output database, e.g. ams_new)
- AMS_FTP_URL (FTP used to get the risk file provided by IBAMA)
Example of how to set up the connection fields:
- Connection Id: AMS_AF_DB_URL (Id used by DAG)
- Connection Type: Postgres
- Host: 192.168.1.9 (Database host or IP)
- Database: raw_fires_data (Database name)
- Login: ams (Database username)
- Password: ams (Database password)
- Port: 5432 (Database port)
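Alternatively, Airflow can resolve connections from `AIRFLOW_CONN_<CONN_ID>` environment variables in URI form. Using the example field values above (host, credentials, and database name are placeholders), the same connection could be declared as:

```shell
# Airflow resolves AIRFLOW_CONN_<CONN_ID> environment variables as
# connection URIs; the values below mirror the example fields above.
export AIRFLOW_CONN_AMS_AF_DB_URL='postgres://ams:ams@192.168.1.9:5432/raw_fires_data'
```

Connections defined this way do not appear in the Airflow UI, but are picked up by DAGs at runtime.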
Set up the following Airflow variables:
- AIRFLOW_UID: see "Setting the right Airflow user" in the Airflow documentation. Example: AIRFLOW_UID=1000
- AMS_FORCE_RECREATE_DB: the expected values are 0 or 1. When enabled, it forces the recreation of the AMS database. Example: AMS_FORCE_RECREATE_DB=1
- AMS_ALL_DATA_DB: the expected values are 0 or 1. When enabled, it updates all data, including historical data. Example: AMS_ALL_DATA_DB=1
- AMS_BIOMES: a list of biomes separated by semicolons. Example: AMS_BIOMES="Amazônia;Cerrado;"
- AMS_STAC_API_URL: the STAC API URL used to retrieve the risk image. Example: AMS_STAC_API_URL=https://terrabrasilis.dpi.inpe.br/stac-api/v1/
- AMS_STAC_COLLECTION: the STAC collection name. Example: AMS_STAC_COLLECTION=collection1
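Note that the example AMS_BIOMES value ends with a trailing semicolon, so any consumer of the variable should discard empty entries. A minimal sketch of such a parser (`parse_biomes` is a hypothetical helper, not part of the AMS code):

```python
def parse_biomes(raw: str) -> list[str]:
    """Split a semicolon-separated AMS_BIOMES value, dropping empty
    entries such as the one produced by a trailing semicolon."""
    return [b.strip() for b in raw.split(";") if b.strip()]


# Example value from the documentation above.
print(parse_biomes("Amazônia;Cerrado;"))  # ['Amazônia', 'Cerrado']
```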
Additionally, it is necessary to place the land use files in the land_use directory. The naming convention for the files is: {BIOMA}_land_use.tif (e.g., Amazônia_land_use.tif, Cerrado_land_use.tif, and so on).
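The naming convention above can be checked before triggering the DAG. The sketch below builds the expected path for each biome and reports missing files; the helper names and the `land_use` location are taken from this section, but the functions themselves are illustrative:

```python
from pathlib import Path

# Directory where the DAG expects the land use rasters.
LAND_USE_DIR = Path("land_use")


def land_use_file(biome: str) -> Path:
    """Build the expected raster path following {BIOMA}_land_use.tif."""
    return LAND_USE_DIR / f"{biome}_land_use.tif"


def missing_land_use_files(biomes: list[str]) -> list[str]:
    """Return the biomes whose land use raster is absent from land_use/."""
    return [b for b in biomes if not land_use_file(b).is_file()]


print(land_use_file("Amazônia").name)  # Amazônia_land_use.tif
```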