A collection of notebooks for storing and querying OC4IDS data in a database.
Use the buttons below to open notebooks from the default branch:
| Notebook | Colab link | Description |
|---|---|---|
| Download, Check and Import | Use this notebook to, download data from OC4IDS publishers, check data using the OC4IDS Data Review Tool and import data and check results into the OC4IDS database. | |
| Delete Collections | Use this notebook to delete collections from the database. | |
| Data Feedback | Use this notebook to provide feedback on OC4IDS data. | |
| Indicator Coverage | Use this notebook to calculate the coverage of the OC4IDS Indicators. | |
| CoST IDS Coverage | Use this notebook to calculate the coverage of the CoST Infrastructure Data Standard (IDS), according to the CoST IDS to OC4IDS mapping. | |
| Publication quality criteria, checks and metrics | Use this notebook to check data against the OC4IDS publication criteria, to perform quality checks and to calculate quality metrics. | |
| Publisher status report | Use this notebook to report on the results of checking OC4IDS publications against publication quality criteria, checks and metrics |
To open a notebook from a different branch, use Colab's Github browser to choose the notebook and branch you want to open.
Alternatively, you can use the Open in Colab browser extension (Chrome, Firefox) to add a button that, when clicked when viewing a Jupyter notebook on github, will open that notebook in Colab.
The following diagram shows the relationships between the main tables in the database.
Some tables are omitted from the diagram: those containing reference data used in checks (oc4ids_schema and exchange_rates) and those used to store temporary data as part of the import process (temp_data and temp_checks).
erDiagram
check_results {
character_varying check_id PK
integer collection_id PK,FK
jsonb output
boolean result
character_varying run_id PK
}
collection {
timestamp_without_time_zone data_version
integer id PK
character_varying load_id UK
text source_id UK
}
collection_check {
integer collection_id FK
jsonb cove_output
integer id PK
}
field_counts {
integer array_count
integer collection_id PK,FK
integer distinct_projects
integer object_property
text path PK
ARRAY path_array
}
indicator_coverage {
numeric checks
integer collection_id PK,FK
jsonb fields
character_varying indicator PK
character_varying indicator_source PK
character_varying run_id PK
numeric successes
}
project_fields {
integer collection_id PK,FK
ARRAY paths
text project_id PK
}
projects {
integer collection_id FK
jsonb data
integer id PK
text project_id
}
run_collection {
integer collection_id FK,UK
integer id PK
character_varying run_id UK
}
check_results }o--|| collection : "collection_id"
collection_check }o--|| collection : "collection_id"
field_counts }o--|| collection : "collection_id"
indicator_coverage }o--|| collection : "collection_id"
project_fields }o--|| collection : "collection_id"
projects }o--|| collection : "collection_id"
run_collection }o--|| collection : "collection_id"
The following table lists all tables in the database. For information on the columns in each table, refer to the subheadings below the table.
| Table | Description |
|---|---|
| collection | A collection is an OC4IDS dataset. |
| collection_check | libcoveoc4ids CLI output for each collection, i.e. the results reported by the OC4IDS Data Review Tool. For more information, see https://github.com/open-contracting/lib-cove-oc4ids/ |
| projects | Projects in OC4IDS JSON format. |
| field_counts | Counts of the occurrences of fields in each collection. |
| project_fields | The fields present in each project. |
| check_results | Results of running checks on data quality. For more information, see the quality criteria, checks and metrics notebook in https://github.com/open-contracting/notebooks-oc4ids |
| run_collection | Relationships between collections and check runs. |
| indicator_coverage | Results of running checks on indicator coverage. That is, how often an indicator can be calcuated for a given collection. |
| oc4ids_schema | A copy of the OC4IDS schema in 'mapping-sheet' format. For more information, see https://ocdskit.readthedocs.io/en/latest/cli/schema.html#mapping-sheet |
| exchange_rates | USD-base exchange rates for currency conversion. |
| temp_data | A temporary table used when importing data. |
| temp_checks | A temporary table used for storing the libcoveoc4ids output when importing data. |
| Column | Type | Description |
|---|---|---|
| id | integer | A unique identifier for this collection. |
| source_id | text | The OC4IDS publisher from which the data was collected. |
| data_version | timestamp without time zone | A timestamp of when the data was imported. |
| load_id | character varying | An identifer for a group of collections loaded together. |
| Column | Type | Description |
|---|---|---|
| id | integer | A unique identifier for this output. |
| collection_id | integer | The collection to which this output relates. |
| cove_output | jsonb | The libcoveoc4ids output. Documentation: https://github.com/open-contracting/lib-cove-ocds?tab=readme-ov-file#output-json-format |
| Column | Type | Description |
|---|---|---|
| id | integer | A unique identifier for this project. |
| collection_id | integer | The collection to which this project belongs. |
| project_id | text | The id of the project from the JSON data. |
| data | jsonb | The OC4IDS JSON data that descibes the project. |
| Column | Type | Description |
|---|---|---|
| collection_id | integer | The collection to which this count relates. |
| path | text | The JSON Pointer of the field to which this count relates, excluding array indices. |
| path_array | text[] | The components of the JSON Pointer of the field to which this count relates. |
| object_property | integer | The number of occurrences of the field, across all array entries and all releases. |
| array_count | integer | If the field is an array, the cumulative length of its occurrences, across all array entries and all releases. |
| distinct_projects | integer | The number of distinct projects in which the path appears. |
| Column | Type | Description |
|---|---|---|
| collection_id | integer | The collection to which the project belongs. |
| project_id | text | The id of the project. |
| paths | text[] | The JSON Pointers of the fields present in the project. |
| Column | Type | Description |
|---|---|---|
| run_id | character varying | The run in which this result was calculated. |
| check_id | character varying | The check that this result relates to. |
| result | boolean | The result of the check. TRUE represents success, FALSE represents failure and NULL indicates that no result was calculated. |
| output | jsonb | The output of the check. |
| collection_id | integer | The collection to which this check result relates. |
| Column | Type | Description |
|---|---|---|
| run_id | character varying | A run of data quality checks. |
| collection_id | integer | A collection that was checked in the run. |
| id | integer | A unique identifier for the relationship. |
| Column | Type | Description |
|---|---|---|
| run_id | character varying | The run in which this coverage result was calculated. |
| collection_id | integer | The collection to which this coverage result relates. |
| indicator_source | character varying | The source from which the indicator was drawn. |
| indicator | character varying | The indicator to which this coverage result relates. |
| fields | jsonb | The fields needed to calculate the indicator. |
| successes | numeric | A count of objects for which the indicator can be calculated. |
| checks | numeric | The number of checks performed. |
| Column | Type | Description |
|---|---|---|
| section | text | The first part of the JSON path to the field in the data, e.g. budget. |
| path | text | The JSON path to the field in the data, e.g. budget/description. |
| title | text | The field’s title in the JSON schema. If the field has no title, defaults to the field’s name followed by “*”. |
| description | text | The field’s description in the JSON schema. URLs are removed (see the links column). |
| type | text | A comma-separated list of the field’s type in the JSON schema, excluding “null”. If the field has no type, defaults to “unknown”. |
| range | text | The field’s allowed number of occurrences: “0..1” if the field defines an optional literal value, “0..n” if the field defines an optional array, “1..1” if the field defines a required literal value, or “1..n” if the field defines a required array. |
| values | text | If the field’s schema sets: format - the format; pattern - the pattern; enum - “Enum: “ followed by the enum as a comma-separated list, excluding null; items/enum - “Enum: “ followed by the items/enum as a comma-separated list, excluding null. |
| links | text | The URLs extracted from the field’s description. |
| deprecated | text | The OC4IDS minor version in which the field (or its parent) was deprecated. |
| deprecationnotes | text | The explanation for the deprecation of the field. |
| Column | Type | Description |
|---|---|---|
| date | date | The date of the rate. |
| currency | character varying | The currency of the rate. |
| rate | numeric | The rate. |
| Column | Type | Description |
|---|---|---|
| data | jsonb | An OC4IDS dataset. |
| Column | Type | Description |
|---|---|---|
| cove_output | jsonb | The libcoveoc4ids output for a collection. |
-
Open the notebook in Colab.
-
Make your changes in Colab, following the style guide for SQL statements.
-
Format any SQL code you add or edit using pgFormatter.
In Colab:
-
Click Edit -> Clear all outputs.
-
Remove any database credentials you entered into the notebook.
-
Click File -> Save a copy in Github.
-
Uncheck 'Include a link to Colaboratory'
-
Select your branch, enter a commit message and click OK.
-
Request a review from a helpdesk analyst.
-
If the reviewer requests changes, make the changes then repeat this step.
Once approved, you can merge your changes yourself.
- Make your changes.
- Add or update PostgreSQL comments on new or changed tables and columns.
- Update any notebooks affected by your changes.
- Update the entity relationship diagram using mermerd.
- Update the table documentation, the
psqlmeta-commands\dt+(to list tables) and\d+ table_name(to list columns).
For small changes, you can review the raw diff in the Github review interface.
For larger changes, you can review and comment on a visual diff by clicking the
button. You need to authorize the app the first time you open it.