Skip to content

CKAN 2.10 compatability#98

Open
blagojabozinovski wants to merge 10 commits intockan:masterfrom
keitaroinc:ckan-2.10
Open

CKAN 2.10 compatability#98
blagojabozinovski wants to merge 10 commits intockan:masterfrom
keitaroinc:ckan-2.10

Conversation

@blagojabozinovski
Copy link

Adds comparability with CKAN 2.10


_run_async_validation(resource_id)

if ckan_2_10:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional isn't really needed. It's fine to create the functions even if they won't be called.

Copy link
Author

@blagojabozinovski blagojabozinovski Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThrawnCA On point. I removed that condition.

@rantoniuk
Copy link

This PR had no conflicts at the time of review, but now has conflicts to be resolved. Is there any specific reason it wasn't merged at the time with the positive review it had?

@ThrawnCA
Copy link
Contributor

Is there any specific reason it wasn't merged at the time with the positive review it had?

I can only speak for myself and say I'm pretty sure I didn't have write access at the time.

It should be possible to create a new branch within this repo, pull in the Keitaro changes, and fix the conflict.

@duttonw
Copy link
Contributor

duttonw commented Dec 16, 2024

I'll get onto this, sorry i did not see this before i started to incorporate qld-gov-au 2.10 work (still wip)

@blagojabozinovski
Copy link
Author

This PR had no conflicts at the time of review, but now has conflicts to be resolved. Is there any specific reason it wasn't merged at the time with the positive review it had?
Same goes here. I don't have access

@duttonw
Copy link
Contributor

duttonw commented Dec 17, 2024

Hi @blagojabozinovski ,

I've tried to correct your branch based off where this was forked from v2.0.0. Sadly you have 38 failing tests

https://github.com/ckan/ckanext-validation/actions/runs/12367164030
only additions were cicd and test updates to latest pytest which do pass in 2.9.


commit 7a0a12e (HEAD -> keitaronic/ckan-2.10-test, origin/keitaronic/ckan-2.10-test)
Author: william dutton will.dutt@gmail.com
Date: Tue Dec 17 15:47:05 2024 +1000

fix: tests need to now be 'Pytest 5' compatible

commit da28f22
Author: william dutton will.dutt@gmail.com
Date: Tue Dec 17 15:39:01 2024 +1000

enable 2.9 and 2.10 cicd testing on keitaronic base branch

reviewing what you changed:

v2.0.0...keitaroinc:ckanext-validation:ckan-2.10

You went by pushing back the split that ckan 2.10 on dataset/resource creates/updates back onto its original function name and it does seem its not fully functional.


    def after_dataset_create(self, context, data_dict):
        self.after_create(context, data_dict)

    def before_resource_update(self, context, current_resource, updated_resource):
        self.before_update(context, current_resource, updated_resource)

    def after_dataset_update(self, context, data_dict):
        self.after_update(context, data_dict)

I'll try and work on the qld-gov-au fork which has 2.10 working with passing tests into upstream.

the biggest issue i'm needing to work through is logic resource_create, resource_update, being changed to package_patch, resource_show, and its logic placed elsewhere since the original validation was 'hacking' the core functions :'(

Code below:


@t.chained_action
def resource_create(up_func, context, data_dict):
    '''Appends a new resource to a datasets list of resources.

    This is duplicate of the CKAN core resource_create action, with just the
    addition of a synchronous data validation step.

    This is of course not ideal but it's the only way right now to hook
    reliably into the creation process without overcomplicating things.
    Hopefully future versions of CKAN will incorporate more flexible hook
    points that will allow a better approach.

    '''

    if settings.get_create_mode_from_config() != 'sync':
        return up_func(context, data_dict)

    model = context['model']

    package_id = t.get_or_bust(data_dict, 'package_id')
    if not data_dict.get('url'):
        data_dict['url'] = ''

    pkg_dict = t.get_action('package_show')(
        dict(context, return_type='dict'),
        {'id': package_id})

    t.check_access('resource_create', context, data_dict)

    for plugin in plugins.PluginImplementations(plugins.IResourceController):
        plugin.before_create(context, data_dict)

    if 'resources' not in pkg_dict:
        pkg_dict['resources'] = []

    upload = uploader.get_resource_uploader(data_dict)

    if 'mimetype' not in data_dict:
        if hasattr(upload, 'mimetype'):
            data_dict['mimetype'] = upload.mimetype

    if 'size' not in data_dict:
        if hasattr(upload, 'filesize'):
            data_dict['size'] = upload.filesize

    pkg_dict['resources'].append(data_dict)

    try:
        context['defer_commit'] = True
        context['use_cache'] = False
        t.get_action('package_update')(context, pkg_dict)
        context.pop('defer_commit')
    except t.ValidationError as e:
        try:
            raise t.ValidationError(e.error_dict['resources'][-1])
        except (KeyError, IndexError):
            raise t.ValidationError(e.error_dict)

    # Get out resource_id resource from model as it will not appear in
    # package_show until after commit
    resource_id = context['package'].resources[-1].id
    upload.upload(resource_id,
                  uploader.get_max_resource_size())

    # Custom code starts

    run_validation = True

    for plugin in plugins.PluginImplementations(IDataValidation):
        if not plugin.can_validate(context, data_dict):
            log.debug('Skipping validation for resource {}'.format(resource_id))
            run_validation = False

    if run_validation:
        is_local_upload = (
            hasattr(upload, 'filename')
            and upload.filename is not None
            and isinstance(upload, uploader.ResourceUpload))
        _run_sync_validation(
            resource_id, local_upload=is_local_upload, new_resource=True)

    # Custom code ends

    model.repo.commit()

    #  Run package show again to get out actual last_resource
    updated_pkg_dict = t.get_action('package_show')(
        context, {'id': package_id})
    resource = updated_pkg_dict['resources'][-1]

    #  Add the default views to the new resource
    t.get_action('resource_create_default_resource_views')(
        {'model': context['model'],
         'user': context['user'],
         'ignore_auth': True
         },
        {'resource': resource,
         'package': updated_pkg_dict
         })

    for plugin in plugins.PluginImplementations(plugins.IResourceController):
        plugin.after_create(context, resource)

    return resource


@t.chained_action
def resource_update(up_func, context, data_dict):
    '''Update a resource.

    This is duplicate of the CKAN core resource_update action, with just the
    addition of a synchronous data validation step.

    This is of course not ideal but it's the only way right now to hook
    reliably into the creation process without overcomplicating things.
    Hopefully future versions of CKAN will incorporate more flexible hook
    points that will allow a better approach.

    '''

    if settings.get_update_mode_from_config() != 'sync':
        return up_func(context, data_dict)

    model = context['model']
    id = t.get_or_bust(data_dict, "id")

    if not data_dict.get('url'):
        data_dict['url'] = ''

    resource = model.Resource.get(id)
    context["resource"] = resource
    old_resource_format = resource.format

    if not resource:
        log.debug('Could not find resource %s', id)
        raise t.ObjectNotFound(t._('Resource was not found.'))

    t.check_access('resource_update', context, data_dict)
    del context["resource"]

    package_id = resource.package.id
    pkg_dict = t.get_action('package_show')(dict(context, return_type='dict'),
                                            {'id': package_id})

    for n, p in enumerate(pkg_dict['resources']):
        if p['id'] == id:
            break
    else:
        log.error('Could not find resource %s after all', id)
        raise t.ObjectNotFound(t._('Resource was not found.'))

    # Persist the datastore_active extra if already present and not provided
    if ('datastore_active' in resource.extras
            and 'datastore_active' not in data_dict):
        data_dict['datastore_active'] = resource.extras['datastore_active']

    for plugin in plugins.PluginImplementations(plugins.IResourceController):
        plugin.before_update(context, pkg_dict['resources'][n], data_dict)

    upload = uploader.get_resource_uploader(data_dict)

    if 'mimetype' not in data_dict:
        if hasattr(upload, 'mimetype'):
            data_dict['mimetype'] = upload.mimetype

    if 'size' not in data_dict and 'url_type' in data_dict:
        if hasattr(upload, 'filesize'):
            data_dict['size'] = upload.filesize

    pkg_dict['resources'][n] = data_dict

    try:
        context['defer_commit'] = True
        context['use_cache'] = False
        updated_pkg_dict = t.get_action('package_update')(context, pkg_dict)
        context.pop('defer_commit')
    except t.ValidationError as e:
        try:
            raise t.ValidationError(e.error_dict['resources'][-1])
        except (KeyError, IndexError):
            raise t.ValidationError(e.error_dict)

    upload.upload(id, uploader.get_max_resource_size())

    # Custom code starts

    run_validation = True
    for plugin in plugins.PluginImplementations(IDataValidation):
        if not plugin.can_validate(context, data_dict):
            log.debug('Skipping validation for resource {}'.format(id))
            run_validation = False

    if run_validation:
        run_validation = not data_dict.pop('_skip_next_validation', None)

    if run_validation:
        is_local_upload = (
            hasattr(upload, 'filename')
            and upload.filename is not None
            and isinstance(upload, uploader.ResourceUpload))
        _run_sync_validation(
            id, local_upload=is_local_upload, new_resource=False)

    # Custom code ends

    model.repo.commit()

    resource = t.get_action('resource_show')(context, {'id': id})

    if old_resource_format != resource['format']:
        t.get_action('resource_create_default_resource_views')(
            {'model': context['model'], 'user': context['user'],
             'ignore_auth': True},
            {'package': updated_pkg_dict,
             'resource': resource})

    for plugin in plugins.PluginImplementations(plugins.IResourceController):
        plugin.after_update(context, resource)

    return resource
@tk.chained_action
def package_patch(original_action, context, data_dict):
    ''' Detect whether resources have been replaced, and if not,
    place a flag in the context accordingly if save flag is not set

    Note: controllers add default context where save is in request params
        'save': 'save' in request.params
    '''
    if 'save' not in context and 'resources' not in data_dict:
        context['save'] = True
    original_action(context, data_dict)


@tk.side_effect_free
@tk.chained_action
def resource_show(next_func, context, data_dict):
    """Throws away _success_validation flag, that we are using to prevent
    multiple validations of resource in different interface methods
    """
    if context.get('ignore_auth'):
        return next_func(context, data_dict)

    data_dict = next_func(context, data_dict)

    data_dict.pop('_success_validation', None)
    return data_dict

@rantoniuk
Copy link

@duttonw do you see a way to get the issues you mentioned fixed or do you need any contribution from the original contributor or the community?

@duttonw
Copy link
Contributor

duttonw commented Mar 27, 2025

Our www.data.qld.gov.au website is now using

   CKANExtValidation: &CKANExtValidation
      name: "ckanext-validation-{{ Environment }}"
      shortname: "ckanext-validation"
      description: "CKAN Extension for validating Data Packages"
      type: "git"
      url: "https://github.com/qld-gov-au/ckanext-validation.git"
      version: "0.0.8-qgov.15"

qld-gov-au/ckanext-validation@v0.0.8-qgov.13...0.0.8-qgov.15

This is 2.10 and 2.11 compatible and is using frictionless==5.18.0
It includes @JVickery-TBS IPipeValidation integration with xloader and similar to trigger datastore updates if validation passes.

It is majority passing on ckan master
https://github.com/qld-gov-au/ckanext-validation/actions/runs/12781896654/job/35630544240#step:6:1
Only these failing

  FAILED ckanext/validation/tests/test_interfaces.py::TestInterfaceAsync::test_can_validate_called_on_update_async
  FAILED ckanext/validation/tests/test_plugin.py::TestResourceControllerHooksUpdate::test_validation_run_on_upload
  FAILED ckanext/validation/tests/test_plugin.py::TestResourceControllerHooksUpdate::test_validation_run_on_url_change
  FAILED ckanext/validation/tests/test_plugin.py::TestResourceControllerHooksUpdate::test_validation_run_on_schema_change
  FAILED ckanext/validation/tests/test_plugin.py::TestResourceControllerHooksUpdate::test_validation_run_on_format_change
  FAILED ckanext/validation/tests/test_plugin.py::TestResourceControllerHooksUpdate::test_validation_run_on_validation_options_change
  FAILED ckanext/validation/tests/test_plugin.py::TestPackageControllerHooksUpdate::test_validation_runs_with_url

I'll try and get this into upstream in the coming weeks.

@rantoniuk
Copy link

@duttonw I'm guessing your work is going to be in a separate branch/PR that's going to replace this one?

@duttonw
Copy link
Contributor

duttonw commented Apr 15, 2025

Hi @rantoniuk ,

The work that was done on qld-gov-au ckanext-validation for tests don't at all agree to the tests upstream (was majority written by a vendor) I know that https://github.com/qld-gov-au/ckanext-validation works on 2.11 but I've not had the mental fortitude to do the alignment due to ensuring tests pass for at least 2.9 for both code bases being merged.

What has been done on the qld-gov-au fork has compartmentalised majority of the functions instead of leaving them as global as well as used valid 2.10+ structures. So the majority of the code that is run in production can be shuffled without much risk. (except for this last bit)

The tests on the other hand are .... a tough order. After I've got some thinking space (post etag/caching on ckan core) I'll try and get back to this.

For the time being use https://github.com/qld-gov-au/ckanext-validation 0.0.8-qgov.15 which is 2.2.0-qgov (but due to not being merged back from upstream is still on the 0.0.x naming)

Would you like me to publish the latest qld-gov-au forked version to pypi perhaps as a v2.3.0rc1 ?

based on following version spec: https://peps.python.org/pep-0440/

@duttonw
Copy link
Contributor

duttonw commented Apr 15, 2025

-_-
https://github.com/qld-gov-au/ckanext-validation
This branch is 458 commits ahead of, 193 commits behind ckan/ckanext-validation:master.

@rantoniuk
Copy link

https://github.com/qld-gov-au/ckanext-validation
This branch is 458 commits ahead of, 193 commits behind ckan/ckanext-validation:master.

Yeah, I checked that yesterday and that's what triggered my question 🥹

Trying to merge back origin/develop gives a bunch of conflicts, not so many in terms of number of files, but when I looked at the code diff, there is quite a logic conflicts that would need to be resolved.

	both added:      .github/dependabot.yml
	both added:      .github/workflows/test.yml
	both modified:   .gitignore
	both modified:   MANIFEST.in
	both modified:   README.md
	both added:      ckanext/validation/common.py
	both modified:   ckanext/validation/helpers.py
	both modified:   ckanext/validation/interfaces.py
	both modified:   ckanext/validation/jobs.py
	both added:      ckanext/validation/logic/action.py
	both modified:   ckanext/validation/model.py
	both modified:   ckanext/validation/plugin.py
	both modified:   ckanext/validation/settings.py
	both added:      ckanext/validation/templates/validation/snippets/validation_report_asset.html
	both added:      ckanext/validation/tests/fixtures.py
	both modified:   ckanext/validation/tests/helpers.py
	both modified:   ckanext/validation/tests/test_form.py
	both modified:   ckanext/validation/tests/test_helpers.py
	both modified:   ckanext/validation/tests/test_interfaces.py
	both modified:   ckanext/validation/tests/test_jobs.py
	both modified:   ckanext/validation/tests/test_logic.py
	both modified:   ckanext/validation/tests/test_plugin.py
	both modified:   ckanext/validation/tests/test_utils.py
	both modified:   ckanext/validation/tests/test_validators.py
	both modified:   ckanext/validation/utils.py
	deleted by us:   ckanext/validation/webassets/css/validation-report.css
	deleted by us:   ckanext/validation/webassets/resource.config
	both added:      ckanext/validation/webassets/webassets.yml
	both modified:   dev-requirements.txt
	both modified:   requirements.txt

Would you like me to publish the latest qld-gov-au forked version to pypi perhaps as a v2.3.0rc1 ?

Nah, I'm poking with comments here because we're observing and discussing internally how alive this plugin is at all in terms of long-term maintenance and keeping it aligned with ckan core updates support - so I'm trying to understand in overall what's the timeline and plan for aligning those two forks (if at all) - or if we should use other approaches for validation (like tabledesigner plugin or a totally new plugin written from scratch).

At the same time, I see the recent upstream's (this repo) develop commits pushed by you, so I guess that in the long run, if you plan to continue using this plugin, then you're also interested to sync with upstream and drop the fork to make your life easier?

@duttonw
Copy link
Contributor

duttonw commented Apr 15, 2025

Hi @rantoniuk ,

I've recently (last year) gained access to this plugin and was successful in moving from datashades to ckan org. Its also been uplifted to be pypi pipeline integrated.

This is one of the major plugins in our system https://www.data.qld.gov.au/api/3/action/status_show as we well as in canada ckan. Its not going away, only difficultly is that our version was dramatically uplifted and then was cherry picked back to master but in such a way they diverged yet still align for the majority.

Ckan Canada did the ipipe validation that when in combination with xloader, makes xloader wait for validation plugin to be green prior to updating the datastore saving blatting api datastores with bad columns etc.

I'd highly suggest you try the qld-gov-au version and see how it goes for you. We also have cucumber selenium tests on top of the unit testing. We also do full integration tests of the suite of plugins https://github.com/qld-gov-au/ckan-qld-infrastructure/blob/master/vars/instances-OpenData.var.yml tests like https://github.com/qld-gov-au/ckan-qld-infrastructure/blob/master/test/features/resource_validation.feature

And we have just updated dev data.qld.gov.au to 2.11 today and will be rolling forward to production in the coming 2-4 weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants