Skip to content

feat: add GroupWorkPool for managing global bitswap concurrency#406

Merged
rvagg merged 5 commits intomainfrom
rvagg/bitswap-global-concurrency
Sep 8, 2023
Merged

feat: add GroupWorkPool for managing global bitswap concurrency#406
rvagg merged 5 commits intomainfrom
rvagg/bitswap-global-concurrency

Conversation

@rvagg
Copy link
Member

@rvagg rvagg commented Sep 4, 2023

  • Removes the per-retrieval preload concurrency handling
  • Adds a per-Lassie concurrency pool which has a maximum number of parallel workers and a maximum number of workers per "group"; each retrieval is a "group"
  • New default total bitswap concurrency is 32
  • New default per-retrieval concurrency is 12

@rvagg rvagg requested a review from hannahhoward September 4, 2023 05:48
@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch 5 times, most recently from d22123e to 48abb35 Compare September 4, 2023 05:54
@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2023

Codecov Report

Merging #406 (153e9b8) into main (36ebbef) will increase coverage by 0.59%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #406      +/-   ##
==========================================
+ Coverage   76.05%   76.65%   +0.59%     
==========================================
  Files          84       85       +1     
  Lines        6228     6275      +47     
==========================================
+ Hits         4737     4810      +73     
+ Misses       1242     1224      -18     
+ Partials      249      241       -8     
Files Changed Coverage Δ
cmd/lassie/daemon.go 56.79% <ø> (ø)
cmd/lassie/flags.go 54.34% <ø> (ø)
cmd/lassie/main.go 47.66% <100.00%> (+3.10%) ⬆️
pkg/lassie/lassie.go 76.15% <100.00%> (+2.20%) ⬆️
...ever/bitswaphelpers/groupworkpool/groupworkpool.go 100.00% <100.00%> (ø)
.../retriever/bitswaphelpers/preloadcachingstorage.go 71.02% <100.00%> (-1.71%) ⬇️
pkg/retriever/bitswapretriever.go 95.87% <100.00%> (-0.15%) ⬇️

... and 8 files with indirect coverage changes

📢 Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=filecoin-project).

@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch from 48abb35 to 636fff5 Compare September 4, 2023 06:05
@rvagg
Copy link
Member Author

rvagg commented Sep 4, 2023

Open to suggestions on defaults .. 32 max and 12 per retrieval is a bit arbitrary. Apparently Kubo maintains a 32 global parallelism somewhere; 12 is .. double the current 6, but 32 is not a multiple of 12 but 🤷.

32 & 8, 48 & 12, 32 & 32?

@rvagg
Copy link
Member Author

rvagg commented Sep 4, 2023

got flaky failures on windows:

=== RUN   TestHttpFetch/bitswap_block_timeout_from_missing_block
    http_fetch_test.go:1019: random seed: 1693807954030586700
    http_fetch_test.go:1099: Fetching http://127.0.0.1:61300/ipfs/bafybeidw7elt4j2lxwsuxlyk5jqh6lw6t3zooj3m4desdw6y52luvzqnxe
    http_fetch_test.go:1213: 
        	Error Trace:	D:/a/lassie/lassie/pkg/internal/itest/http_fetch_test.go:1213
        	            				D:/a/lassie/lassie/pkg/internal/itest/http_fetch_test.go:250
        	            				D:/a/lassie/lassie/pkg/internal/itest/http_fetch_test.go:1177
        	Error:      	Received unexpected error:
        	            	EOF
        	Test:       	TestHttpFetch/bitswap_block_timeout_from_missing_block
=== RUN   TestHttpFetch/same_content,_http_missing_block,_bitswap_completes
    http_fetch_test.go:[1019](https://github.com/filecoin-project/lassie/actions/runs/6069733198/job/16464569544?pr=406#step:14:1021): random seed: 1693807955546617800
    http_fetch_test.go:1099: Fetching http://127.0.0.1:61302/ipfs/bafybeihbhsbz22zhgsnqepgx6leb2syrbmyahqn2zm3xlae7w7ymq4rroa
    http_fetch_test.go:1159: 
        	Error Trace:	D:/a/lassie/lassie/pkg/internal/itest/http_fetch_test.go:1159
        	Error:      	Received unexpected error:
        	            	http: unexpected EOF reading trailer
        	Test:       	TestHttpFetch/same_content,_http_missing_block,_bitswap_completes

I could suspect the first one is because of #398 but the second one should have completed. Need to investigate that I think.

@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch 2 times, most recently from 6d2793c to 645bf1d Compare September 5, 2023 06:55
@rvagg
Copy link
Member Author

rvagg commented Sep 5, 2023

Experimenting with pulling in fix attempts from #398 into here, btw

@rvagg
Copy link
Member Author

rvagg commented Sep 5, 2023

Relevant failure to investigate: https://github.com/filecoin-project/lassie/actions/runs/6081427236/job/16497917067?pr=406

=== NAME  TestBitswapRetriever/with_preloader/successful_full_remote_fetch
    bitswapretriever_test.go:462: 
        	Error Trace:	D:/a/lassie/lassie/pkg/retriever/bitswapretriever_test.go:462
        	Error:      	did not receive result
        	Test:       	TestBitswapRetriever/with_preloader/successful_full_remote_fetch

@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch from 8484bad to 106a4b9 Compare September 5, 2023 11:29
@rvagg
Copy link
Member Author

rvagg commented Sep 5, 2023

At this stage, with the current form of #398 included, I'm willing to believe that flakies that are bitswap focused here are going to be the fault of the bitswap client itself. One expected difference here is that we're hitting the client with a little bit more parallelism than we currently are on main due to the increases.
After many CI re-runs and I'm not seeing much that's concerning.

@rvagg rvagg force-pushed the rvagg/trustless-utils branch 2 times, most recently from 78dc756 to cf96aa6 Compare September 8, 2023 05:05
Base automatically changed from rvagg/trustless-utils to main September 8, 2023 05:36
* new total bitswap concurrency is 32
* new per-retrieval concurrency is 12
* `daemon` has both `--bitswap-concurrency` and `--bitswap-concurrency-per-retrieval`
* `fetch` just has `--bitswap-concurrency` that sets both values to be the same
@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch from 89ce40f to 2c6819b Compare September 8, 2023 05:39
@rvagg
Copy link
Member Author

rvagg commented Sep 8, 2023

Argh! failures from bitswap flakies that I thought we'd addressed in #398

  • TestHttpFetch/bitswap_block_timeout_from_missing_block
  • TestHttpFetch/same_content,_http_missing_block,_bitswap_completes

@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch 5 times, most recently from bb4c0f3 to 2cea5ef Compare September 8, 2023 06:39
@rvagg rvagg force-pushed the rvagg/bitswap-global-concurrency branch from 2e5541c to 153e9b8 Compare September 8, 2023 07:24
@rvagg
Copy link
Member Author

rvagg commented Sep 8, 2023

Those 2 flakies have been fixed https://github.com/filecoin-project/lassie/compare/2c6819b8e32258bc02ef6afb4399173ce2205614..68b96aa49ee3f18a926e93e6842c7ce4ed8dd65b

turns out it was about provider timeouts, 500ms is too short on slower windows machines, bitswap doesn't quite get set up in time. So I've extended the timeout on one and removed it on another.

@rvagg rvagg merged commit a6efec9 into main Sep 8, 2023
@rvagg rvagg deleted the rvagg/bitswap-global-concurrency branch September 8, 2023 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants