Replace Terrapin with BuildExecution to fix process deadlock when output to stderr is fills buffer by gyfelton · Pull Request #53 · square/git-fastclone

gyfelton · 2023-03-08T15:21:23Z

Why this change

In our CI cluster, fastclone gets stuck because of this bug: thoughtbot/terrapin#5
We reproduced the hang consistently and verified that fastclone runs smoothly after this change.
Our codebase is so big that even git remote update --prune produces enough output to make this problem manifest.
When we interrupt the stuck process, we also see it stuck at the read stream function, as indicated by the PR above.
Also, verbose mode does not redirect output from git operations inside Terrapin to stdout for some reason. Therefore we think this is a good fix.
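
For readers unfamiliar with this kind of hang: a child process blocks once it has written more to a pipe than the OS buffer holds and nobody is reading the other end. The sketch below is not the code from this PR; it is a minimal illustration, assuming a hypothetical helper named `run_merged`, of how `Open3.popen2e` avoids the problem by merging stderr into stdout and draining that single stream before waiting on the process.

```ruby
require 'open3'

# Hypothetical helper (not part of git-fastclone): run a command and drain
# stdout and stderr through one merged pipe, so the child can never block on
# a full stderr buffer while the parent waits on a different stream.
def run_merged(*cmd)
  output = +''
  status = nil
  Open3.popen2e(*cmd) do |stdin, out_err, wait_thr|
    stdin.close                                  # the child gets no input
    out_err.each_line { |line| output << line }  # read until the child closes the pipe
    status = wait_thr.value                      # safe to wait once the pipe is drained
  end
  [output, status]
end
```

Calling `run_merged('git', 'remote', 'update', '--prune')` would return the combined output and a `Process::Status`, however much git writes to stderr.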

Also add a log message when cleaning the cache: when lots of CI machines get their cache cleaned, we throttle the network by trying to populate the cache on all of them at the same time. The log line warns about this.

Fix specs (some should be failing, or we need to add some)
Update version file and Gemfile.lock

Also add log when cleaning cache

lib/git-fastclone.rb

justinseanmartin · 2023-03-08T16:44:15Z

lib/git-fastclone.rb

      # the cache directory entirely. This may cause the current clone to fail, but if the
      # underlying error from git is transient it will not affect future clones.
+      puts '[WARN] Removing the fastclone cache.'
      FileUtils.remove_entry_secure(mirror, force: true)


I wonder if we should be more thoughtful here and only clear the cache directory in the case of certain failures that we know aren't recoverable, given that removing the cache can lead to thundering herd issues for us.

For example, for the recent issue where we had two case insensitive matching branch names, I don't think clearing the cache helped resolve the issue, at least not until the upstream ref has been deleted. We'd have to experiment to see if clearing the cache was actually necessary to resolve the broken state once the bad ref was removed server side.

error: cannot lock ref 'refs/heads/<branch>': is at <sha1> but expected <sha2>

So right now we clear the cache only if we consider the error retriable. If we converge the logic here with the logic at the end (the rescue block of the with_git_mirror method), then at least for the error above we won't clear the cache and will just fail hard? I think that's totally reasonable, since it is still better than before (clearing the cache blindly).
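
The idea discussed above can be sketched as follows. This is an illustration only: the pattern list and the helper names `retriable_error?` and `clear_cache_if_retriable` are assumptions for this example, and the real list in git-fastclone may differ. The point is that an error such as `cannot lock ref` matches nothing, so the cache is left alone and the clone fails hard.

```ruby
require 'fileutils'

# Illustrative patterns only; not the list used by git-fastclone.
ILLUSTRATIVE_RETRIABLE_PATTERNS = [
  /fatal: bad object/,
  /fatal: remote did not send all necessary objects/
].freeze

# True when the git output matches one of the patterns we treat as a sign of
# a corrupted local mirror.
def retriable_error?(git_output)
  ILLUSTRATIVE_RETRIABLE_PATTERNS.any? { |pattern| git_output.match?(pattern) }
end

# Remove the mirror only for retriable errors. Returns true when the cache
# was cleared (the caller may retry) and false when it was left untouched.
def clear_cache_if_retriable(git_output, mirror)
  return false unless retriable_error?(git_output)

  puts '[WARN] Removing the fastclone cache.'
  FileUtils.remove_entry_secure(mirror, force: true)
  true
end
```

Keeping the decision in one predicate means every rescue block can share it, which limits how often a fleet of CI machines repopulates the cache at once.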

Also use the same retriable logic around another git clone operation

lib/git-fastclone.rb

spec/git_fastclone_runner_spec.rb

gyfelton · 2023-03-09T21:28:15Z

@justinseanmartin Another look? In my manual test I exercised every call to fail_on_error/fail_pipe_on_error, with submodules and with/without -v set, against git fastclone.

The output generated without -v is still more than the original, very silent operation. But
I also tested adding GIT_ALLOW_PROTOCOL=file at the beginning, and indeed it failed with an https protocol not supported error.
The last thing I need to test is probably the two rescue blocks changed/added: the plan is to manually raise an error with/without the matching output and verify it indeed clears the cache and retries.

Do above manual raising of error

Gemfile.lock

lib/git-fastclone.rb

justinseanmartin

Looks great! Few small things, but otherwise LGTM!

lib/git-fastclone.rb

This reverts commit fbff44e.

lib/git-fastclone.rb

While cd changes the current directory, because it happens in a thread it should not impact subsequent operations. This is also how it used to be before #53 landed.
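
A note for later readers: in Ruby the working directory belongs to the whole process, not to a thread, so `Dir.chdir` inside a thread can still be observed by other threads. The follow-up #60 passes `chdir` to `Open3.popen2e` instead. The snippet below is a minimal sketch of that approach, not the code from either PR; the helper name `run_in_dir` is an assumption for this example.

```ruby
require 'open3'

# Hypothetical helper: run a command with its working directory set only for
# the child process, leaving the parent's current directory untouched.
def run_in_dir(dir, *cmd)
  Open3.popen2e(*cmd, chdir: dir) do |stdin, out_err, wait_thr|
    stdin.close
    # Drain the merged stdout/stderr stream first, then collect the exit status.
    [out_err.read, wait_thr.value]
  end
end
```

With this shape, concurrent submodule updates can each target their own directory without calling `Dir.chdir` in the parent.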

Fix stuck at handling extra long stderr during clone or remote update

725380c

Also add log when cleaning cache

gyfelton requested a review from justinseanmartin March 8, 2023 15:21

gyfelton and others added 5 commits March 8, 2023 10:23

Update git_fastclone_runner_spec.rb

f0ada43

Update version.rb

cbaa7db

Update Gemfile.lock

f307043

Update git-fastclone.rb

3fdc53d

Update git-fastclone.rb

49a2be6

justinseanmartin reviewed Mar 8, 2023


gyfelton requested a review from justinseanmartin March 8, 2023 21:36

gyfelton added 2 commits March 9, 2023 16:18

Use build_execution wrapper for Open3

0a09e25

Also use the same retriable logic around another git clone operation

fixup

8958595

gyfelton commented Mar 9, 2023


gyfelton commented Mar 9, 2023


justinseanmartin reviewed Mar 10, 2023


gyfelton added 4 commits March 10, 2023 15:14

Remove Terrapin

f575e9a

address comments

ea6e7e5

Fix specs

0dc8b43

fixup

59e8e14

gyfelton changed the title ~~Fix stuck at handling extra long stderr during clone or remote update~~ Replace Terrapin with BuildExecution to fix process deadlock when output to stderr is fills buffer Mar 10, 2023

gyfelton added 2 commits March 10, 2023 17:24

rubocop fix

d4d01c9

exclude runner_execution from rubocop

fbff44e

justinseanmartin approved these changes Mar 10, 2023


gyfelton added 5 commits March 10, 2023 17:42

address comments

8f552aa

Revert "exclude runner_execution from rubocop"

0e69db4

This reverts commit fbff44e.

Fix wrong logic in retriable_error

280d328

Include coverage for verbose mode in rspec

6937364

Disable rubocop for runner_execution since it is cargo culted

4e5cca5

gyfelton commented Mar 13, 2023


fixup

6033b7d

gyfelton merged commit 0ac13db into master Mar 13, 2023

gyfelton deleted the gyfelton/fix-read-stream-stuck branch March 13, 2023 16:34

gyfelton mentioned this pull request Aug 31, 2023

Replace all places that can call Dir.chdir in thread with passing chdir to Open3.popen2e #60

Merged

tripeasy approved these changes Jan 4, 2025

gyfelton merged 20 commits into master from gyfelton/fix-read-stream-stuck
