Conversation

@nicks (Contributor) commented Feb 21, 2023

- What I did
context: adjust the file write logic to avoid corrupt context meta.json files

- How I did it
Write to a temp file, then move it into place, so that if the process dies mid-write it doesn't corrupt the store.

Also improve error messaging so that if a file does get corrupted, the user has some hope of figuring out which file is broken.
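
For illustration, a minimal sketch of the temp-file-then-rename approach (not the PR's exact code; the helper name and file path below are made up, and fsync is omitted for brevity):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic is an illustrative helper: it writes data to a temp file in
// the same directory and then renames it over the target, so a crash mid-write
// leaves either the old file or the new file, never a half-written one.
func writeFileAtomic(filename string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(filename), ".tmp-"+filepath.Base(filename))
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filename)
}

func main() {
	// Hypothetical target path, for illustration only.
	if err := writeFileAtomic("meta.json", []byte(`{"Name":"example"}`), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}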

For background, see:
docker/for-win#13180
docker/for-win#12561

I have other fixes in DD 4.17 that will help with these issues as well.

- How to verify it

I've been using this sandbox to repro the issue:
https://github.com/nicks/contextstore-sandbox

I verified that after these changes, the test.sh script no longer leaves files corrupt.

- Description for the changelog
context: adjust the file write logic to avoid corrupt context meta.json files

- A picture of a cute animal (not mandatory but encouraged)
(image: PXL_20211217_145609846)

@thaJeztah (Member) commented:

Haven't compared the implementations, but wondering if using ioutils.AtomicWriteFile would make sense for this (although I'm not sure if it's already used elsewhere in the CLI):

// AtomicWriteFile atomically writes data to a file named by filename.
func AtomicWriteFile(filename string, data []byte, perm os.FileMode) error {
	f, err := NewAtomicFileWriter(filename, perm)
	// ...
@nicks (Contributor, Author) commented Feb 21, 2023

@thaJeztah oh, that's perfect! ya, that code uses a similar approach. will switch to using that.

Write to a tempfile then move, so that if the
process dies mid-write it doesn't corrupt the store.

Also improve error messaging so that if a file does
get corrupted, the user has some hope of figuring
out which file is broken.

For background, see:
docker/for-win#13180
docker/for-win#12561

For a repro case, see:
https://github.com/nicks/contextstore-sandbox

Signed-off-by: Nick Santos <nick.santos@docker.com>
@nicks (Contributor, Author) commented Feb 21, 2023

worked like a charm!

@thaJeztah (Member) commented:

> worked like a charm!

Ah, great! I don't think that implementation is the "be-all and end-all" canonical writer, but at least it could help us find the locations where we consider an "atomic write" to be important, in case we'd ever decide to improve it (or replace it with a more complete "atomic" writer; I know there are some projects for that, e.g. https://github.com/google/renameio).

We should probably still create a tracking ticket to think about concurrent access, as none of this code was written with that in mind (which could also be relevant if this is used as library code).
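
For comparison, a minimal sketch of what the renameio route could look like (assuming the v2 module path and the WriteFile helper from that project's docs; the file name and payload are made up):

package main

import (
	"log"

	"github.com/google/renameio/v2"
)

func main() {
	// renameio.WriteFile writes to a temp file in the target directory and
	// renames it into place, so readers never see a partially written file.
	if err := renameio.WriteFile("meta.json", []byte(`{"Name":"example"}`), 0o644); err != nil {
		log.Fatal(err)
	}
}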

@thaJeztah (Member) left a comment:

LGTM

Perhaps others could have a quick look as well at the (non) error wrapping and whether there are any concerns about that.

Comment on lines -80 to +83:

-	return Metadata{}, err
+	return Metadata{}, fmt.Errorf("parsing %s: %v", fileName, err)
@thaJeztah (Member) commented on the diff:

This change is masking the original error (%v). I don't THINK we use any of these errors as sentinel errors (had a cursory glance at call-sites for this code, and I think we only use the "not found" case as special), so we should probably be fine, but I'd 🤦 if I didn't comment and regretted it later.

I guess there's not much we can do in these cases other than report "something was wrong; here's some info", so we should be fine.

The only alternative would be to make this an errdefs.InvalidParameter() (if we consider this "input is invalid", and want to consistently return one of the errdefs types), but that's probably something we could still consider in a follow-up if we see a need.
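
For illustration, a minimal sketch of the difference between %v and %w here (not the PR's code; the file name and the parse error are made up):

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

func main() {
	fileName := "meta.json" // hypothetical path, for illustration only
	parseErr := json.Unmarshal([]byte("{"), &struct{}{})

	// %v flattens the original error into text, so errors.As can no longer
	// recover the underlying *json.SyntaxError from the wrapper.
	masked := fmt.Errorf("parsing %s: %v", fileName, parseErr)

	// %w keeps the original error in the chain, so callers can still
	// inspect it with errors.As / errors.Is.
	wrapped := fmt.Errorf("parsing %s: %w", fileName, parseErr)

	var syntaxErr *json.SyntaxError
	fmt.Println(errors.As(masked, &syntaxErr))  // false
	fmt.Println(errors.As(wrapped, &syntaxErr)) // true
}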

@nicks (Contributor, Author) commented Feb 22, 2023

re: atomic writes - I'm more worried about "this process got killed mid-write" than about "two processes try to write at the same time". I know there's also been discussion about making Go's internal file locking library public - golang/go#33974

re: masking the original error - ya, I'm open to better patterns for this. Mainly wanted to avoid users seeing a context-free JSON parse error.

@thaJeztah (Member) commented:

> I'm more worried about "this process got killed mid-write" than about "two processes try to write at the same time".

Agreed.

There may still be other paths that write files (docker login etc. updating the config.json); I've always hated that, and we should still look at it (but probably as part of a re-visioning of configuration as a whole).

> I know there's also been discussion about making Go's internal file locking library public - golang/go#33974

Yes, having something like that would be good. I had to LOL at the pretty standard "why don't you use <some fork of stdlib>" comment, which seems to be the standard response in many cases (sigh!).

> re: masking the original error - ya, I'm open to better patterns for this. Mainly wanted to avoid users seeing a context-free JSON parse error.

I think it's fine as-is. Just (lessons learned) thought it was better to leave a (possibly fine to ignore) comment than to regret not commenting in hindsight.

@thaJeztah (Member) commented:

@vvoland @cpuguy83 @neersighted ptal (just a couple more eyes never hurts)

@neersighted (Member) left a comment:

LGTM, I also have no concerns about concurrent access. Clobbering will happen, but in a way that is much less likely to break things than today.

@thaJeztah (Member) commented:

Thanks all for the extra set of eyes!

I'll bring this one in; I've also marked it for cherry-picking, as I think this change should be fairly safe, and it may help resolve some of these issues.
