Also add log when cleaning cache
lib/git-fastclone.rb
    # the cache directory entirely. This may cause the current clone to fail, but if the
    # underlying error from git is transient it will not affect future clones.
    puts '[WARN] Removing the fastclone cache.'
    FileUtils.remove_entry_secure(mirror, force: true)
I wonder if we should be more thoughtful here and only clear the cache directory in the case of certain failures that we know aren't recoverable, given that removing the cache can lead to thundering herd issues for us.
For example, in the recent issue where we had two branch names that matched case-insensitively, I don't think clearing the cache helped resolve the issue, at least not until the upstream ref had been deleted. We'd have to experiment to see whether clearing the cache was actually necessary to resolve the broken state once the bad ref was removed server side.
error: cannot lock ref 'refs/heads/<branch>': is at <sha1> but expected <sha2>
So right now we clear the cache only if we consider the failure a retriable error. If we converge the logic here with the logic at the end (the rescue block of the with_git_mirror method), then at least for the error above we won't clear the cache and will just fail hard? I think that's totally reasonable, as it is better than what we had before (clearing the cache blindly).
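The idea discussed above, clearing the cache only for failures considered retriable, can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the constant `RETRIABLE_ERROR_PATTERNS`, the helpers `retriable_error?` and `clear_cache_if_retriable`, and the specific patterns listed are all assumptions chosen for the example.

```ruby
require 'fileutils'

# Hypothetical list of git error messages that suggest a corrupted mirror.
# The patterns the real project matches may differ.
RETRIABLE_ERROR_PATTERNS = [
  /fatal: missing blob object/,
  /fatal: remote did not send all necessary objects/,
  /fatal: pack has bad object at offset/
].freeze

# Returns true when the git output matches one of the known patterns.
def retriable_error?(output)
  RETRIABLE_ERROR_PATTERNS.any? { |pattern| output.match?(pattern) }
end

# Removes the mirror only for retriable errors; returns whether it was removed.
def clear_cache_if_retriable(output, mirror)
  return false unless retriable_error?(output)

  warn '[WARN] Removing the fastclone cache.'
  FileUtils.remove_entry_secure(mirror, true)
  true
end
```

With a list like this, the `cannot lock ref` error quoted above would not match any pattern, so the mirror would be left in place and the clone would simply fail.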
Also use the same retriable logic around another git clone operation
@justinseanmartin Another look? In my manual test I had every call to fail_on_error/fail_pipe_on_error exercised, with submodules and with/without The output generated without -v is still more than the original, very silent operation. But
justinseanmartin
left a comment
Looks great! Few small things, but otherwise LGTM!
While cd will change the current directory, because it happens in a thread it should not impact the subsequent operations. This is also how it used to be before #53 landed.
Why this change
In our CI cluster, fastclone got stuck because of this bug: thoughtbot/terrapin#5
After consistently reproducing the hang, we verified that fastclone runs smoothly with this change.
Our codebase is so big that even git remote update --prune returns a huge output, causing this problem to manifest. When we interrupt the stuck process, we also see it stuck at the read stream function, as indicated by the above PR.
Also verbose mode does not redirect output from git operations inside Terrapin to stdout for some reason. Therefore we think this is a good fix.
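For illustration, one common way to avoid this class of hang is to read the child's output continuously while it runs, so that a large output never fills the pipe. The sketch below uses Ruby's standard `Open3.popen2e`. The helper name `run_and_drain` and its `verbose` flag are hypothetical, and this is not necessarily how this PR implements the fix.

```ruby
require 'open3'

# Runs a command and reads its merged stdout/stderr line by line while it
# executes, so the child never blocks on a full pipe buffer.
# Returns [captured_output, Process::Status].
def run_and_drain(*cmd, verbose: false)
  output = +''
  status = Open3.popen2e(*cmd) do |stdin, out_and_err, wait_thr|
    stdin.close # the command is not expected to read any input
    out_and_err.each_line do |line|
      puts line if verbose # echo git output when running in verbose mode
      output << line
    end
    wait_thr.value # wait for the process to exit and fetch its status
  end
  [output, status]
end
```

A call such as `run_and_drain('git', 'remote', 'update', '--prune', verbose: true)` would then both stream and capture the output, which also covers the verbose-mode concern mentioned above.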
Also add a log when cleaning the cache, because when lots of CI machines have their cache cleaned, we will throttle the network by trying to populate the cache at the same time. So we add a log to warn about this.