CKAN plugin for dataset syndication between CKAN instances
This plugin provides a mechanism for syndicating datasets to another CKAN instance. If a dataset has the syndicate flag set to True in its custom metadata, any updates to that dataset will be reflected in the syndicated version.
Resources in the syndicated dataset are stored as URLs pointing to the resources in the original dataset. You must have the API key of a user on the target CKAN instance. See the Config Settings section below for details.
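
For orientation, writing to the target instance amounts to an authenticated call to CKAN's Action API. The sketch below only builds such a request with the standard library and does not send it. `build_create_request` is a hypothetical helper, the URL and key are placeholders, and the plugin's own client code may be organized differently.

```python
import json
import urllib.request
from typing import Any


def build_create_request(
    ckan_url: str, api_key: str, dataset: dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a package_create call for the target CKAN instance."""
    return urllib.request.Request(
        url=f"{ckan_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(dataset).encode("utf-8"),
        # CKAN reads the API key from the Authorization header.
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_create_request(
    "https://data.example.com",
    "9efdd954-c643-444a-97a1-c9c374cef861",
    {"name": "air-quality", "owner_org": "test-org"},
)
```

Passing the result to `urllib.request.urlopen` would perform the call. Without a valid key for a user on the target instance, the write is rejected, which is why each profile requires one.
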
Other plugins can modify the data being syndicated or react to before/after syndication events by implementing the ISyndicate interface and subscribing to the corresponding signals. This is useful when schemas differ between CKAN instances.
Python 3.10+
To work over SSL, pyOpenSSL is required.
Compatibility with core CKAN versions:
| CKAN version | Compatibility |
|---|---|
| 2.9 and earlier | no |
| 2.10 | yes |
| 2.11 | yes |
To install ckanext-syndicate:
- Activate your CKAN virtual environment, for example:

      . /usr/lib/ckan/default/bin/activate

- Clone the source and install it on the virtualenv:

      git clone https://github.com/DataShades/ckanext-syndicate.git
      cd ckanext-syndicate
      pip install -e .

- Add `syndicate tables` to the `ckan.plugins` setting in your CKAN config file (by default the config file is located at `/etc/ckan/default/ckan.ini`).

- Apply database migrations:

      ckan db upgrade

- Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

      sudo service apache2 reload

Syndication creates and updates datasets on the remote portal. It is also possible to syndicate a dataset to multiple portals simultaneously. ckanext-syndicate makes no assumptions about how many syndication endpoints you have and performs each synchronization separately, as if you had configured the first endpoint, run syndication, changed the configuration, and run syndication again.
Internally, the set of config options related to a particular endpoint is called a
profile (`ckanext.syndicate.types.Profile`). Each profile has an ID, which is
part of the config option name: `ckanext.syndicate.profile.<PROFILE ID>.<OPTION>`. If
you want to syndicate a dataset to two different portals, `first` and
`another`, the configuration may look like:

    ckanext.syndicate.profile.first.ckan_url = https://data.example.com
    ckanext.syndicate.profile.another.ckan_url = https://another.example.com

Here is the full list of config options available for Profile. Don't forget
to replace PROFILE_ID with any identifier you like.

Note: In the options below, PREFIX = ckanext.syndicate.profile.PROFILE_ID.
| Option | Default | Example | Description |
|---|---|---|---|
| `PREFIX.ckan_url` | (required) | `https://data.example.com` | The URL of the target CKAN instance to which datasets will be syndicated. |
| `PREFIX.api_key` | (required) | `9efdd954-c643-444a-97a1-c9c374cef861` | The API key of the user on the target CKAN instance. |
| `PREFIX.organization` | `None` | `test-org` | The name of the organization on the target CKAN instance where syndicated datasets are created. |
| `PREFIX.flag` | `syndicate` | `syndicate_to_hdx` | The custom metadata flag used to mark datasets for syndication. |
| `PREFIX.field_id` | `syndicated_id` | `hdx_id` | The custom metadata field used to store the syndicated dataset ID on the original dataset. |
| `PREFIX.name_prefix` | `''` | `my-prefix` | A prefix added to the name of the syndicated dataset. |
| `PREFIX.replicate_organization` | `false` | `true` | Whether to replicate the original dataset’s organization on the target CKAN instance. |
| `PREFIX.update_organization` | `false` | `true` | Whether to update organization metadata (doesn't update extras) if it exists. |
| `PREFIX.refresh_package_name` | `false` | `true` | Whether to refresh the dataset name on the remote portal. |
| `PREFIX.author` | `None` | `ricardomm` | The username whose API key is used. If a dataset already exists on the target CKAN, it will only be updated if its creator matches this username. |
| `PREFIX.user_agent` | `None` | `My CKAN Syndicator/1.0` | Custom User-Agent string to use for HTTP requests to the target CKAN instance. |
| `PREFIX.upload_organization_image` | `true` | `false` | Whether to upload organization image when replicating organization. |
| `PREFIX.queue` | `default` | `syndication` | The name of the background jobs queue used for syndication tasks for this profile. |
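
To make the naming scheme concrete, the sketch below groups flat config options into per-profile dictionaries and fills in the defaults listed in the table. It is an illustration only: `collect_profiles` and `DEFAULTS` are hypothetical names, values are left as the strings found in the config file, and the plugin's own parsing may differ.

```python
from typing import Any

PREFIX = "ckanext.syndicate.profile."

# Defaults copied from the table above; ckan_url and api_key have none.
DEFAULTS: dict[str, Any] = {
    "organization": None,
    "flag": "syndicate",
    "field_id": "syndicated_id",
    "name_prefix": "",
    "replicate_organization": False,
    "update_organization": False,
    "refresh_package_name": False,
    "author": None,
    "user_agent": None,
    "upload_organization_image": True,
    "queue": "default",
}


def collect_profiles(config: dict[str, str]) -> dict[str, dict[str, Any]]:
    """Group `ckanext.syndicate.profile.<ID>.<OPTION>` keys by profile ID."""
    profiles: dict[str, dict[str, Any]] = {}
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue
        # Split "<ID>.<OPTION>" once: the ID comes first, the option name after it.
        profile_id, _, option = key[len(PREFIX):].partition(".")
        if not option:
            continue
        # Each profile starts from its own copy of the defaults.
        profiles.setdefault(profile_id, dict(DEFAULTS))[option] = value
    return profiles


config = {
    "ckanext.syndicate.profile.first.ckan_url": "https://data.example.com",
    "ckanext.syndicate.profile.another.ckan_url": "https://another.example.com",
    "ckanext.syndicate.profile.another.flag": "syndicate_to_hdx",
}
profiles = collect_profiles(config)
```

Here `profiles` holds two independent entries, `first` and `another`, each with its own URL. Only `another` overrides the default flag.
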
In addition, the following config options control the behavior of the syndication process in general:
| Option | Default | Description |
|---|---|---|
| `ckanext.syndicate.sync_on_changes` | `true` | Whether to automatically syndicate datasets whenever they are created, updated, or deleted. Disable this option if syndication should be triggered manually. |

Syndication can be configured for each individual portal. There are two types of customization: reactions to events and changes to workflow.
Reactions are useful when you need to perform a side effect right before or right after syndication. This can be achieved via blinker signals. ckanext-syndicate provides the following signals, which can be imported from `ckanext.syndicate.signals` (or subscribed to via `ISignal` starting from CKAN v2.10):

- `before_syndication`
- `after_syndication`
- `before_group_syndication`
- `after_group_syndication`

The `before_syndication` and `after_syndication` signals receive the local dataset's ID as the sender and an extra keyword argument
named `profile` (the current syndication profile). A basic subscription looks like this:
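
    from ckanext.syndicate.signals import after_syndication

    @after_syndication.connect
    def after_syndication_listener(package_id, **kwargs):
        # The sender (the local dataset's ID) arrives as the first argument.
        profile = kwargs.get("profile")
        if profile:
            # do_something is a placeholder for your own logic.
            do_something(package_id, profile)

Changes to the syndication workflow are made via the `ckanext.syndicate.interfaces.ISyndicate` interface. At the moment, it contains the following methods:
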
- `skip_syndication`: decide whether syndication must be performed for the given profile.
- `prepare_package_for_syndication`: update the package before it is sent to the remote portal. This is especially useful if the portal you are syndicating to uses a different metadata schema.
- `prepare_group_for_syndication`: update the group before it is sent to the remote portal.

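
When the target portal uses a different metadata schema, the body of `prepare_package_for_syndication` often comes down to renaming and dropping fields. The helper below is a hypothetical sketch of that transformation as a plain function. The name `remap_fields` and the field names are invented for illustration and are not part of the plugin.

```python
from typing import Any


def remap_fields(
    data_dict: dict[str, Any],
    rename: dict[str, str],
    drop: set[str],
) -> dict[str, Any]:
    """Return a copy of data_dict with fields renamed or dropped for the target schema."""
    result: dict[str, Any] = {}
    for key, value in data_dict.items():
        if key in drop:
            continue  # this field must not leave the local portal
        # Use the target-schema name when one is defined, else keep the key.
        result[rename.get(key, key)] = value
    return result


local = {"name": "air-quality", "notes": "Hourly readings", "internal_comment": "draft"}
remote = remap_fields(local, rename={"notes": "description"}, drop={"internal_comment"})
```

The function returns a new dictionary and leaves its input untouched, so the local dataset is not modified by accident while the remote payload is prepared.
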
Basic implementations look like this:
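
    from typing import Any

    import ckan.model as model
    import ckan.plugins as plugins
    from ckanext.syndicate.interfaces import ISyndicate
    from ckanext.syndicate.types import Profile

    class MyPlugin(plugins.Plugin):
        plugins.implements(ISyndicate, inherit=True)

        def skip_syndication(self, package: model.Package, profile: Profile) -> bool:
            # Returning True skips syndication; should_be_syndicated is a placeholder.
            if should_be_syndicated(package):
                return False
            return True

        def prepare_package_for_syndication(
            self, package_id: str, data_dict: dict[str, Any], profile: Profile
        ) -> dict[str, Any]:
            # Remove the field, if present, before the dataset is sent.
            data_dict.pop("sensitive_field", None)
            return data_dict

        def prepare_group_for_syndication(
            self, group_id: str, group: dict[str, Any], profile: Profile
        ) -> dict[str, Any]:
            group.pop("sensitive_field", None)
            return group

The default implementation of `skip_syndication` prevents syndication for:
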
- private datasets
- datasets with a falsy value in the field specified by the `ckanext.syndicate.profile.PROFILE_ID.flag` config option (`syndicate` by default)

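
The default rule can be summarized in a few lines. This is a sketch of the behavior described above, not the plugin's source. It works on a plain dataset dictionary, assumes the flag is available as a top-level key (which depends on how your schema stores custom metadata), and uses Python truthiness, so a string such as `"false"` would need converting first.

```python
from typing import Any


def should_skip(dataset: dict[str, Any], flag: str = "syndicate") -> bool:
    """Return True when the dataset must not be syndicated under the default rule."""
    if dataset.get("private"):
        return True  # private datasets are never syndicated
    # A missing or falsy flag value also prevents syndication.
    return not dataset.get(flag)


public_flagged = {"name": "air-quality", "private": False, "syndicate": True}
public_unflagged = {"name": "drafts", "private": False}
```

With these inputs, `should_skip(public_flagged)` is `False` and `should_skip(public_unflagged)` is `True`. A profile that sets a custom flag would pass its name as the second argument.
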
Mass or individual syndication can also be triggered from the command line:

    ckan syndicate sync [ID]

To check which syndication profiles will be applied to the given datasets in case of syndication:

    ckan syndicate check [ID]

Synchronization of an individual profile can also be triggered from the command line:

    ckan syndicate sync-profile [PROFILE_ID]
    ckan syndicate sync-profile [PROFILE_ID] -f # foreground

Install dev-requirements.txt:

    pip install -r dev-requirements.txt

To run the tests, do:

    pytest --ckan-ini=test.ini