ckanext-syndicate

CKAN plugin for dataset syndication between CKAN instances

This plugin provides a mechanism for syndicating datasets to another CKAN instance. If a dataset has the syndicate flag set to True in its custom metadata, any updates to that dataset will be reflected in the syndicated version.

Resources in the syndicated dataset are stored as URLs pointing to the resources in the original dataset. You must have the API key of a user on the target CKAN instance. See the Config Settings section below for details.

Other plugins can modify the data being syndicated or react to before/after syndication events by implementing the ISyndicate interface and subscribing to the corresponding signals. This is useful when schemas differ between CKAN instances.

Requirements

Python 3.10+

pyOpenSSL is required to work over SSL

Compatibility with core CKAN versions:

CKAN version	Compatibility
2.9 and earlier	no
2.10	yes
2.11	yes

Installation

To install ckanext-syndicate:

Activate your CKAN virtual environment, for example:

. /usr/lib/ckan/default/bin/activate

Clone the source and install it into the virtualenv:

git clone https://github.com/DataShades/ckanext-syndicate.git
cd ckanext-syndicate
pip install -e .

Add syndicate tables to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).
Apply database migrations:

ckan db upgrade

Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

sudo service apache2 reload

Config settings

Syndication creates and updates datasets on the remote portal. It is also possible to syndicate a dataset to multiple portals simultaneously. ckanext-syndicate makes no assumptions about how many syndication endpoints you have and performs each synchronization separately, as if you had configured the first endpoint, run syndication, then updated the configuration and run syndication again.

Internally, the set of config options related to a particular endpoint is called a profile (ckanext.syndicate.types.Profile). Each profile has an ID, which is part of the config option name: ckanext.syndicate.profile.<PROFILE ID>.<OPTION>. If you want to syndicate a dataset to two different portals, first and another, the configuration may look like this:

ckanext.syndicate.profile.first.ckan_url = https://data.example.com
ckanext.syndicate.profile.another.ckan_url = https://another.example.com
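To illustrate how option names map onto profiles, here is a minimal sketch that groups config keys by profile ID. It is not the extension's actual implementation: the function name group_profile_options, the plain dict standing in for the CKAN config, and the ckan.site_url value are assumptions made for this example.

```python
from collections import defaultdict

PROFILE_PREFIX = "ckanext.syndicate.profile."


def group_profile_options(config: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group `ckanext.syndicate.profile.<ID>.<OPTION>` keys by profile ID.

    Hypothetical helper for illustration only.
    """
    profiles: dict[str, dict[str, str]] = defaultdict(dict)
    for key, value in config.items():
        if not key.startswith(PROFILE_PREFIX):
            continue  # unrelated CKAN option
        # Everything after the prefix is "<PROFILE ID>.<OPTION>".
        profile_id, _, option = key[len(PROFILE_PREFIX):].partition(".")
        if profile_id and option:
            profiles[profile_id][option] = value
    return dict(profiles)


# A plain dict standing in for the CKAN config (assumption for this sketch).
config = {
    "ckan.site_url": "https://local.example.com",
    "ckanext.syndicate.profile.first.ckan_url": "https://data.example.com",
    "ckanext.syndicate.profile.first.organization": "test-org",
    "ckanext.syndicate.profile.another.ckan_url": "https://another.example.com",
}

profiles = group_profile_options(config)
```

Each key of the resulting dictionary is a profile ID (first and another here), and each value holds the options for that endpoint only, which mirrors the idea that every profile is synchronized independently.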

Here is the full list of config options available for Profile. Don't forget to replace PROFILE_ID with any identifier you like.

Note: In the options below, PREFIX = ckanext.syndicate.profile.PROFILE_ID.

Option	Default	Example	Description
`PREFIX.ckan_url`	(required)	`https://data.example.com`	The URL of the target CKAN instance to which datasets will be syndicated.
`PREFIX.api_key`	(required)	`9efdd954-c643-444a-97a1-c9c374cef861`	The API key of the user on the target CKAN instance.
`PREFIX.organization`	`None`	`test-org`	The name of the organization on the target CKAN instance where syndicated datasets are created.
`PREFIX.flag`	`syndicate`	`syndicate_to_hdx`	The custom metadata flag used to mark datasets for syndication.
`PREFIX.field_id`	`syndicated_id`	`hdx_id`	The custom metadata field used to store the syndicated dataset ID on the original dataset.
`PREFIX.name_prefix`	`''`	`my-prefix`	A prefix added to the name of the syndicated dataset.
`PREFIX.replicate_organization`	`false`	`true`	Whether to replicate the original dataset’s organization on the target CKAN instance.
`PREFIX.update_organization`	`false`	`true`	Whether to update the organization's metadata (extras are not updated) if the organization already exists on the target CKAN instance.
`PREFIX.refresh_package_name`	`false`	`true`	Whether to refresh the dataset name on the remote portal.
`PREFIX.author`	`None`	`ricardomm`	The username whose API key is used. If a dataset already exists on the target CKAN, it will only be updated if its creator matches this username.
`PREFIX.user_agent`	`None`	`My CKAN Syndicator/1.0`	Custom User-Agent string to use for HTTP requests to the target CKAN instance.
`PREFIX.upload_organization_image`	`true`	`false`	Whether to upload organization image when replicating organization.
`PREFIX.queue`	`default`	`syndication`	The name of the background jobs queue used for syndication tasks for this profile.

In addition, the following config options control the behavior of the syndication process in general:

Option	Default	Description
`ckanext.syndicate.sync_on_changes`	`true`	Whether to automatically syndicate datasets whenever they are created, updated, or deleted. Disable this option if syndication should be triggered manually.

Extending

Signals

Syndication can be configured for each individual portal. There are two types of customization: reactions to events and changes to workflow.

Reactions are useful when you need to perform a side effect right before or right after syndication. This can be achieved via blinker signals. ckanext-syndicate provides four signals that can be imported from ckanext.syndicate.signals (or subscribed to via ISignal starting from CKAN v2.10):

before_syndication
after_syndication
before_group_syndication
after_group_syndication

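The before_syndication and after_syndication signals receive the local dataset's ID as the sender and an extra keyword argument named profile (the current syndication profile). A basic subscription looks like this:

from ckanext.syndicate.signals import after_syndication


@after_syndication.connect
def after_syndication_listener(package_id, **kwargs):
    # The sender is the local dataset's ID; the profile arrives as a keyword argument.
    profile = kwargs.get("profile")
    if profile:
        do_something(package_id, profile)  # placeholder for your own side effect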

Interface

Changes to the syndication workflow are made via the ckanext.syndicate.interfaces.ISyndicate interface. At the moment, it contains the following methods:

skip_syndication - decide whether syndication must be performed for the given profile (return True to skip it).
prepare_package_for_syndication - update the package before it is sent to the remote portal. This is particularly useful if the portal you are syndicating to uses a different metadata schema.
prepare_group_for_syndication - update the group before it is sent to the remote portal.

Basic implementations look like this:

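from typing import Any

import ckan.model as model
import ckan.plugins as plugins

from ckanext.syndicate.interfaces import ISyndicate
from ckanext.syndicate.types import Profile


class MyPlugin(plugins.SingletonPlugin):
    plugins.implements(ISyndicate, inherit=True)

    def skip_syndication(self, package: model.Package, profile: Profile) -> bool:
        # Return True to skip syndication; should_be_syndicated is your own check.
        return not should_be_syndicated(package)

    def prepare_package_for_syndication(
        self, package_id: str, data_dict: dict[str, Any], profile: Profile
    ) -> dict[str, Any]:
        # Drop fields that must not reach the remote portal.
        data_dict.pop("sensitive_field", None)
        return data_dict

    def prepare_group_for_syndication(
        self, group_id: str, group: dict[str, Any], profile: Profile
    ) -> dict[str, Any]:
        # Same idea for groups: operate on the group dict that was passed in.
        group.pop("sensitive_field", None)
        return group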

The default implementation of skip_syndication prevents syndication for:

private datasets
datasets with a falsy value in the field specified by the ckanext.syndicate.profile.PROFILE_ID.flag config option (syndicate by default)
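The two rules above can be sketched in plain Python. This is only an approximation under stated assumptions, not the extension's code: the real method receives a model.Package, whereas this sketch uses a dict with the flag as a top-level key, and the helper names default_skip and is_truthy, as well as the list of strings treated as falsy, are hypothetical.

```python
from typing import Any

# Assumption: these string values are treated as "false" for the flag field.
FALSY_STRINGS = {"", "0", "false", "no", "off", "none"}


def is_truthy(value: Any) -> bool:
    """Interpret a flag value; common 'false' strings count as falsy."""
    if isinstance(value, str):
        return value.strip().lower() not in FALSY_STRINGS
    return bool(value)


def default_skip(package: dict[str, Any], flag: str = "syndicate") -> bool:
    """Return True when syndication should be skipped for this dataset."""
    if package.get("private"):
        return True  # private datasets are not syndicated
    # A missing or falsy flag also prevents syndication.
    return not is_truthy(package.get(flag))


public_flagged = {"name": "a", "private": False, "syndicate": "true"}
private_flagged = {"name": "b", "private": True, "syndicate": "true"}
public_unflagged = {"name": "c", "private": False}
```

With a profile whose flag option is set to syndicate_to_hdx, the same check would be called as default_skip(package, flag="syndicate_to_hdx").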

CLI commands

Mass or individual syndication can also be triggered from the command line:

ckan syndicate sync [ID]

To check which syndication profiles would be applied to the given datasets in case of syndication:

ckan syndicate check [ID]

Synchronization of an individual profile can also be triggered from the command line:

ckan syndicate sync-profile [PROFILE_ID]
ckan syndicate sync-profile [PROFILE_ID] -f # foreground

Tests

Install dev-requirements.txt:

pip install -r dev-requirements.txt

To run the tests, do:

pytest --ckan-ini=test.ini

License

AGPL
