A Python application to validate and store published OC4IDS datasets.
You can open this repository in a dev container to get an environment complete with a Postgres database.
If you prefer to use Docker Compose, you can instead run:
docker compose -f docker-compose.dev.yml up -d
docker compose -f docker-compose.dev.yml exec app bash
docker compose -f docker-compose.dev.yml stop
To set up the database schema, run the migrations:
alembic upgrade head
If enabled, the pipeline will upload the files to a DigitalOcean Spaces bucket.
First create the bucket with DigitalOcean.
If doing this via the UI, take the following steps:
- Choose any region
- Enable CDN
- Choose any bucket name
- Click "Create a Spaces Bucket"
After the bucket is created, create an access key in DigitalOcean.
If doing this via the UI, take the following steps:
- Go to your bucket
- Go to settings
- Under "Access Keys" click "Create Access Key"
- Set the access scope to "Limited Access"
- Select your bucket from the list and set "Permissions" to "Read/Write/Delete"
- Choose any name
- Click "Create Access Key"
Securely store the access key ID and secret.
Once you have created the bucket and access key, set the following environment variables for the pipeline:
- ENABLE_UPLOAD: 1 to enable, 0 to disable
- BUCKET_REGION: e.g. fra1
- BUCKET_NAME: e.g. my-bucket
- BUCKET_ACCESS_KEY_ID: e.g. access-key-id
- BUCKET_ACCESS_KEY_SECRET: e.g. access-key-secret
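For example, the upload variables could be set in a shell session like this (the values are the illustrative ones above; substitute your own bucket details):

```shell
# Example values only; replace with your real bucket and credentials.
export ENABLE_UPLOAD=1
export BUCKET_REGION=fra1
export BUCKET_NAME=my-bucket
export BUCKET_ACCESS_KEY_ID=access-key-id
export BUCKET_ACCESS_KEY_SECRET=access-key-secret
```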
To make this easier, the project uses python-dotenv to load environment variables from a config file.
For local development, create a file called .env.local, which will be used by default.
You can change which file is loaded by setting the environment variable APP_ENV.
For example the tests set APP_ENV=test, which loads variables from .env.test.
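A minimal sketch of how that file selection could work (`env_file` is a hypothetical helper for illustration, not the pipeline's actual code):

```python
import os


def env_file() -> str:
    # Hypothetical helper: derive the dotenv file name from APP_ENV,
    # defaulting to the local development config file.
    env = os.environ.get("APP_ENV", "local")
    return f".env.{env}"
```

With APP_ENV=test this would resolve to .env.test; with APP_ENV unset it falls back to .env.local.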
To send failure notifications by email, the following environment variables must be set:
- NOTIFICATIONS_ENABLED: 1 to enable, 0 to disable
- NOTIFICATIONS_SMTP_HOST
- NOTIFICATIONS_SMTP_PORT
- NOTIFICATIONS_SMTP_SSL_ENABLED: 1 to enable, 0 to disable
- NOTIFICATIONS_SENDER_EMAIL
- NOTIFICATIONS_RECEIVER_EMAIL
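As a rough sketch, a failure notification built from these variables might look like the following (`send_failure_notification` and the message subject are assumptions for illustration, not the pipeline's actual code):

```python
import os
import smtplib
from email.message import EmailMessage


def send_failure_notification(error: str) -> None:
    # Do nothing unless notifications are explicitly enabled.
    if os.environ.get("NOTIFICATIONS_ENABLED") != "1":
        return
    msg = EmailMessage()
    msg["Subject"] = "Pipeline failure"  # Hypothetical subject line
    msg["From"] = os.environ["NOTIFICATIONS_SENDER_EMAIL"]
    msg["To"] = os.environ["NOTIFICATIONS_RECEIVER_EMAIL"]
    msg.set_content(error)
    host = os.environ["NOTIFICATIONS_SMTP_HOST"]
    port = int(os.environ["NOTIFICATIONS_SMTP_PORT"])
    # Use an implicit-TLS connection only when SSL is enabled.
    smtp_cls = (
        smtplib.SMTP_SSL
        if os.environ.get("NOTIFICATIONS_SMTP_SSL_ENABLED") == "1"
        else smtplib.SMTP
    )
    with smtp_cls(host, port) as server:
        server.send_message(msg)
```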
- TRANSFORM_MAX_FILE_SIZE: integer, in bytes. JSON files over this size will not be transformed to CSV and Excel.
oc4ids-datastore-pipeline
From inside the dev container or Docker container:
psql postgresql://postgres:postgres@localhost:5432/postgres
Connecting from outside:
- If using a dev container or Docker Compose locally the same command should work
- In GitHub Codespaces, we're not sure how to access the port
black oc4ids_datastore_pipeline/ tests/
isort oc4ids_datastore_pipeline/ tests/
flake8 oc4ids_datastore_pipeline/ tests/
mypy oc4ids_datastore_pipeline/ tests/
pytest
alembic revision --autogenerate -m "<MESSAGE HERE>"
To publish a new version, raise a PR to main updating the version in pyproject.toml. Once merged, create a git tag and GitHub release for the new version, named vX.Y.Z. This will trigger a Docker image to be built and pushed, tagged with both the version and latest.