Skip to content

Conversation

@rbeucher
Copy link
Member

@rbeucher rbeucher commented Feb 17, 2025

The yamanifest conda package requires nchash
Based on discussions with @charles-turner-1 and @aidanheerdegen the package is not used any more and the dependency to nchash should be removed.

@rbeucher rbeucher changed the title Update pyproject.toml Remove nchash dependency Feb 17, 2025
@aidanheerdegen
Copy link
Member

It will require code changes. Sorry if that wasn't clear.

@aidanheerdegen
Copy link
Member

Just a comment, if you're running an opinionated python formatting tool it's best to do that as a separate commit so the functional bits can be viewed (and re-viewed) more easily.

@charles-turner-1
Copy link
Collaborator

Just a comment, if you're running an opinionated python formatting tool it's best to do that as a separate commit so the functional bits can be viewed (and re-viewed) more easily.

Crap - I've got ruff configured to format on save by default, I'll fix it up in vim.

I think most of these changes are sensible but I'll get the formatting changes out and then request a review @aidanheerdegen - I'm not sure I fully grok the intent of some of the remaining broken tests.

@charles-turner-1
Copy link
Collaborator

collected 16 items

test/test_manifest.py ..FF............                                                                                                                 [100%]

========================================================================== FAILURES ==========================================================================
____________________________________________________________________ test_manifest_netcdf ____________________________________________________________________

    def test_manifest_netcdf():

        with cd(os.path.join('test','testfiles')):

            mf1 = mf.Manifest('mf1.yaml')

            for filepath in glob.glob('*.nc'):
                mf1.add(filepath,['md5','sha1'])

            mf1.dump()

        with cd(os.path.join('test','testfiles_copy')):

            mf2 = mf.Manifest('mf2.yaml')

            for filepath in glob.glob('*.nc'):
                mf2.add(filepath,['md5','sha1'])

            mf2.dump()

        # Unequal because they contain different fullpaths
        assert(mf1.equals(mf2) == False)
        # Equal when paths ignored
        assert(mf1.equals(mf2,paths=False) == True)

        # Test with array of filepaths
        with cd(os.path.join('test','testfiles_copy')):

            mf1 = mf.Manifest('mf1.yaml')

            mf1.add(glob.glob('*.nc'),['binhash'])
            mf1.add(hashfn=['md5','sha1'])

>       assert(mf1.equals(mf2))
E       assert False
E        +  where False = equals(<yamanifest.manifest.Manifest object at 0x1083ceb10>)
E        +    where equals = <yamanifest.manifest.Manifest object at 0x10850b350>.equals

test/test_manifest.py:148: AssertionError
_____________________________________________________________ test_manifest_netcdf_changed_time ______________________________________________________________

    def test_manifest_netcdf_changed_time():

        with cd(os.path.join('test','testfiles_copy')):

            mf3 = mf.Manifest('mf3.yaml')

            for filepath in glob.glob('*.nc'):
                touch(filepath)
                mf3.add(filepath,['md5','sha1'])

            mf3.dump()

            mf2 = mf.Manifest('mf2.yaml')
            mf2.load()

>           assert(not mf3.equals(mf2))
E           assert not True
E            +  where True = equals(<yamanifest.manifest.Manifest object at 0x10842c550>)
E            +    where equals = <yamanifest.manifest.Manifest object at 0x10850bbd0>.equals

test/test_manifest.py:165: AssertionError

@aidanheerdegen
Copy link
Member

I'm wondering why the test break. test_manifest_netcdf breaks because seemingly mf1 and mf2 are equal, even though they have different fullpaths. The internal logic seems unchanged, so I'm not sure why it is returning True.

As for the second test if it doesn't have binhash in the list of hashes it won't detect a changed timestamp. I'd give a specific code recommendation but I've run out of time for the day!

@charles-turner-1
Copy link
Collaborator

Bit of a left-field idea - I've been rewriting manageable sized stuff in Rust recently as a learning exercise - nchash is a pretty small package - does it make any sense at all to use a Rust-reimplementation of nchash which would be completely dependency free & avoid having to clean out the tests?

Made a fair whack of progress with the reimplentation already

@aidanheerdegen
Copy link
Member

Made a fair whack of progress with the reimplentation already

nchash was an idea to try and leverage the metadata available from a netCDF file to identify if it had changed.

As a change detection hash though it is limited to only netCDF files, so was quite limited in that regarded. binhash is much more brute-force, but also works on any file with a deterministic upper-bound on runtime, which is good.

I'd just remove nchash and replace with binhash in the tests, as it also detects when the timestamp changes.

Copy link
Member

@aidanheerdegen aidanheerdegen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@charles-turner-1 charles-turner-1 merged commit dd8b552 into master Feb 27, 2025
5 checks passed
@charles-turner-1
Copy link
Collaborator

@aidanheerdegen Are you happy for me to trigger a new release with the removed dependency?

@aidanheerdegen
Copy link
Member

Sure, go for it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants