WIP: Fix incomplete Minio downloads when downloading many files concurrently#697
WIP: Fix incomplete Minio downloads when downloading many files concurrently#697kyungeonchoi wants to merge 17 commits intomasterfrom
Conversation
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #697 +/- ##
==========================================
- Coverage 98.27% 93.92% -4.35%
==========================================
Files 29 27 -2
Lines 2085 2074 -11
==========================================
- Hits 2049 1948 -101
- Misses 36 126 +90
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| # Now just wait until all of our tasks complete | ||
| await asyncio.gather(*download_tasks) | ||
| MAX_INFLIGHT = 100 | ||
| if len(download_tasks) >= MAX_INFLIGHT: |
There was a problem hiding this comment.
don't we already control this via sempahore?
There was a problem hiding this comment.
Number of downloads is controlled by a semaphore in the minio_adapter.py, but these new lines control the number of concurrent asyncio tasks . Maybe it's okay to await for thousands of tasks but only downloading small number of tasks. At least this doesn't fix the problem.
| localsize = path.stat().st_size | ||
|
|
||
| # Ensure filesystem flush visibility | ||
| await asyncio.sleep(0.05) |
There was a problem hiding this comment.
this is a magic number and might not work in general. Would it be enough to download to the .part file, rename, then stat the new file?
|
Here is an example size difference which leads to a download failure |
Story
MinioAdapterclass (e.g.transferconfigfor aioboto3, concurrency setting (_file_transfer_sem) in thedownload_filefunction) but nothing fixes the issue.servicex transforms download <request-id>works fine.Updates