Skip to content

_frontend/widget: Allow showing the CAS digest of built artifacts#1994

Merged
gtristan merged 2 commits intoapache:masterfrom
Kekun:show-cas-digest
May 15, 2025
Merged

_frontend/widget: Allow showing the CAS digest of built artifacts#1994
gtristan merged 2 commits intoapache:masterfrom
Kekun:show-cas-digest

Conversation

@Kekun
Copy link

@Kekun Kekun commented Feb 14, 2025

This adds the artifact-cas-digest format string to the show command.

This is a first step implementing #1984.

  • Add tests, e.g. building an artifact and comparing the printed digest

@Kekun Kekun force-pushed the show-cas-digest branch 2 times, most recently from d2b2537 to 541cc9e Compare February 20, 2025 16:48
@Kekun Kekun changed the title _frontend/widget: Allow showing the CAS digest _frontend/widget: Allow showing the CAS digest of built artifacts Feb 28, 2025
@Kekun Kekun force-pushed the show-cas-digest branch from 1125a57 to 6bc0a07 Compare March 6, 2025 14:53
@Kekun
Copy link
Author

Kekun commented Mar 6, 2025

I added a test, it's probably not enough though…

@gtristan
Copy link
Contributor

gtristan commented Mar 6, 2025

I feel that cas-digest is too generic. This is actually the cas digest of the built artifact, we could (maybe not now, but in the future) have cas digests of other things (like the source code, or the buildtree if we have one)

I tend to agree, while the “CAS” is an implementation detail it’s fairly well known, I think I’d rather avoid exposing the word in APIs if we don’t already have it.

Also valid point about source caches.

let’s use another name, perhaps content-hash ? Maybe that doesn’t capture the implication that it is the artifact content… maybe output-hash or artifact-content-hash ?

Also, was this man page updated using the automated generation ? We should only update the command description in cli.py and use the python command to automatically update all the man pages and commit that as a separate commit… that command is documented… somewhere in our HACKING file(s)….

@Kekun
Copy link
Author

Kekun commented Mar 7, 2025

Isn't "hash" more technical than "digest" too (genuine question, I don't know)? What about just artifact-digest (or artifact-hash if you prefer)? Any reason to prefer "content"? I'll let you two decide the right words to use as you are the BuildStream specialists. :)

The man page what updated manually, I'll update it in a different commit then.

@gtristan
Copy link
Contributor

Regarding testing in this PR: Yes the added test is good to add, but not at all sufficient IMO.

What I would like to see here...

  • Take the tests/frontend/artifact_checkout.py test as an example of a test that:
    • Actually builds something
    • Pushing the artifact to a remote artifact cache
  • Make sure you create a new directory for the test project data
    • In the past we had many tests using the same directory, e.g. tests/frontend/project still has a bit of this mess going on
    • We made efforts to tear this apart ensuring that individual tests have their own static committed project directories, please take this new approach as it is easier to maintain, and modifying one test does not subtly affect adjacent tests
  • You can add this new test to tests/frontend/show.py, or to a new python file (do not sneak the new unrelated test additional test into artifact_checkout.py of course ;-) )
  • The test, like the artifact checkout test, can use a simple import element of a committed text file
    • This is expected to be 100% reproducible, so the CAS digest should always be the same
    • Record this CAS digest (whether we call this a "hash" or whatever in the final CLI UI, just take note of it)
  • The test body essentially does this:
    • Build the element
    • Run bst show --format ... to obtain the CAS digest
    • Assert that the CAS digest is exactly the one you previously recorded, i.e. hard coded CAS digest
  • Test multiple scenarios
    • Building with and without the remote cache enabled
    • Building with remote cache enabled, deleting the artifact locally, before bst show
      • In this scenario I think we expect to not have the CAS digest
    • Building with remote cache enabled, delete the artifact locally, then bst artifact pull the artifact, and bst show
      • In this scenario we expect to have exactly the hard coded CAS digest

I realize this is a lot of text, but it shouldn't be too difficult to do in practice, there are examples of all of this in the tests already.

@Kekun Kekun force-pushed the show-cas-digest branch 2 times, most recently from 2ee7cd4 to 435a56b Compare March 13, 2025 15:45
@pytest.mark.parametrize(
"target, expected_digests",
[
("target.bst", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, as I am asking for a separate project directory be created for this test to stand on its own and not mingle data with other tests... I really think it is sufficient to just have a project with less elements.

I'd be happy enough with one element, but it would be useful to test various different import elements:

  • A regular file with some simple example content e.g. "The perverted pink pony mounts the unwilling hippopotamus"
  • A regular file with execute permission
  • A symbolic link

The value here might be to check that we indeed get differing content hashes for individual files with different modes/types, however... while I didn't think of asking for this, and I'm highly confident that such content hashes are going to differ, it might be interesting to actually know.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm the hash changes when a file is executable or not. I tested it with a working test, then running chmod +x on an imported file, ran the tests and got a different digest, then chmod -x on the same file and the old digest matched anew.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right so lets include some different samples / hard coded digests in the tests if it's not too much trouble.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah it's trivial, I did it already (locally). 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my new comments, let's take care to:

  • Have one test which cares about what the digests are
  • This test is parametrized (and uses the ids especially since it produces these super long test names)
  • Not use parametrization for cases where the digest value itself doesn't matter.

This is just about only testing what is relevant to test, and keeping overall test time to a minimum.

result = cli.run(project=project, silent=True, args=["show", "--format", "%{name},%{artifact-cas-digest}", target])
result.assert_success()

digests = dict(line.split(",", 2) for line in result.output.splitlines())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is over engineered, you're testing a single element each time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because I carried it over from my previous test which also tested dependencies. 🙂 I'm considering adding dependencies to a test, just so we can check they were built and we can access their digest. Otherwise indeed it could and should be simplified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel ambivalent about this test, however I don't know that it is relevant, it seems irrelevant

  • We are not testing the user's ability to parse bst show output here
  • Are we testing that a cas digest of an element with dependencies has a specific digest ?
    • If so, we can add that single element in the same test with the rest of the other elements which have potentially different cas digests
    • Note that the dependencies of an artifact does not affect its digest, because what we are interested in is the digest of the content of the artifact, not the digest of the overall artifact envelope.

Perhaps, it is relevant to test that artifacts which have the same content but differing dependencies have exactly the same cas digest ? if only as a means of validating that we are reporting the correct digest here ?

@Kekun Kekun force-pushed the show-cas-digest branch from f6b0d0c to 2b0a67a Compare April 15, 2025 09:02
@gtristan
Copy link
Contributor

gtristan commented May 4, 2025

Overall, thanks for updating the tests, its a great update and sorry I didn't get back to it sooner.

There are some more comments which should be straight forward to address, and one comment which I find concerning, regarding whether we are actually reporting the correct digest of the content and not the digest of the artifact envelope: #1994 (comment)

You will also need to apply 51227ac to this branch in order to pass CI, since upstream setuptools decided to break the whole world. Scratch that, for this problem you will just need to rebase against buildstream master.

@Kekun Kekun force-pushed the show-cas-digest branch 2 times, most recently from f1124db to d2a6b91 Compare May 6, 2025 09:32
@gtristan
Copy link
Contributor

gtristan commented May 6, 2025

Interestingly, this is the first time bst show is requiring artifact data in order to work, the only tangentially similar thing is the %{state}, which has knowledge about whether an artifact is cached (or has build dependencies cached).

Asides from the testing we've already added in this PR around what happens with remote caches and locally deleted artifacts, this also raises the question of how this behaves with remote execution and options such as storage-service (https://docs.buildstream.build/master/using_config.html#cache-control).

It seems however fine that the digests will be empty unless a bst artifact pull command has completed for the artifact.

@Kekun Kekun force-pushed the show-cas-digest branch 2 times, most recently from 06512b9 to 5d5ffc2 Compare May 9, 2025 09:19
@Kekun
Copy link
Author

Kekun commented May 9, 2025

I believe I addressed all your comments, please let me know if I missed something.

@Kekun Kekun force-pushed the show-cas-digest branch 2 times, most recently from a981379 to eecc2a8 Compare May 13, 2025 09:10
Adrien Plazas added 2 commits May 14, 2025 12:51
This adds the artifact-cas-digest format string to the show command,
allowing to show the CAS digest of the built artifact.
@Kekun Kekun force-pushed the show-cas-digest branch from eecc2a8 to 0a55272 Compare May 14, 2025 10:52
@gtristan
Copy link
Contributor

Thanks for your relentless work on this !

@gtristan gtristan merged commit 4e1404b into apache:master May 15, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants