Prevent .tar file corruption by patching short reads #261
joost-j wants to merge 5 commits into fox-it:main
Codecov Report
Coverage diff (main vs. #261):
            main     #261     +/-
Coverage    44.75%   45.17%   +0.41%
Files       26       26
Lines       3546     3582     +36
Hits        1587     1618     +31
Misses      1959     1964     +5
acquire/outputs/tar.py (outdated)
    for _ in range(blocks):
        # Prevents "long reads" because it reads at max bufsize bytes at a time
        buf = fh.read(bufsize)
        if len(buf) < bufsize:
I think you can generalize this case instead of doing it twice. Keep track of how many bytes you actually wrote (i.e. using .tell()) and only pad once.
Don't know exactly what you mean by this, since it looks like we need to calculate the remainder for two different sizes (bufsize vs tarfile.BLOCKSIZE). But please do change / refactor the code if you can write this more concisely.
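One way to read the "pad once" suggestion is sketched below. This is an illustration only, not the code from this PR: `copy_padded` and its parameters are hypothetical names, and the sketch assumes a short read should be filled with NUL bytes so the member still occupies the size declared in its header. It counts the bytes actually copied and writes a single run of padding that covers both the missing data and the alignment to `tarfile.BLOCKSIZE`.

```python
import io
import tarfile


def copy_padded(src, dst, size, bufsize=16 * 1024):
    """Copy up to `size` bytes from src to dst, padding once at the end.

    Hypothetical helper: returns the number of bytes actually read from src.
    """
    remaining = size
    while remaining > 0:
        # Never ask for more than is still expected, so no "long reads".
        buf = src.read(min(bufsize, remaining))
        if not buf:
            break  # short read: the source ran out before `size` bytes
        dst.write(buf)
        remaining -= len(buf)

    copied = size - remaining
    # Round the declared size up to the next tar block boundary.
    padded_size = -(-size // tarfile.BLOCKSIZE) * tarfile.BLOCKSIZE
    # One write covers both the missing bytes and the block alignment.
    dst.write(b"\0" * (padded_size - copied))
    return copied


# A source that holds 600 bytes while 1000 bytes were declared.
src = io.BytesIO(b"a" * 600)
dst = io.BytesIO()
copied = copy_padded(src, dst, size=1000)
```

Because the padding is derived from the byte count rather than from per-chunk checks, the `bufsize` remainder and the `BLOCKSIZE` remainder no longer need separate handling in this sketch.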
acquire/outputs/tar.py (outdated)
    info = copy.copy(info)

    buf = info.tobuf(self.tar.format, self.tar.encoding, self.tar.errors)
You could make this even safer by truncating to the previous offset/tar member end if any exception occurs while writing.
Done, added a try/except around it to restore back to the original offset, plus a test case.
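The rollback idea discussed here can be sketched roughly as follows. This is not the PR's implementation: `write_member_with_rollback` and `failing_chunks` are hypothetical names, and the sketch assumes the output is a seekable, truncatable file object, which may not hold for every output type (a compressed stream, for example).

```python
import io


def write_member_with_rollback(out, header, payload_chunks):
    """Write header and payload; on any error, restore the previous end offset."""
    start = out.tell()  # end of the last complete member
    try:
        out.write(header)
        for chunk in payload_chunks:
            out.write(chunk)
    except Exception:
        # Drop the partially written member so the archive stays consistent.
        out.seek(start)
        out.truncate()
        raise


def failing_chunks():
    """Simulated source that fails after yielding part of its data."""
    yield b"partial"
    raise OSError("simulated short read")


out = io.BytesIO()
out.write(b"previous member")
try:
    write_member_with_rollback(out, b"HEADER", failing_chunks())
except OSError:
    pass  # the caller can log the failure and continue with the next file
```

After the failed write, `out` should contain only the bytes that were present before the call, so later members start at a clean offset.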
Any idea when this is getting fixed? It's affecting me as well. Let me know if there's anything I can do to help!
Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
Fixes an issue where a `.tar` output file would contain inconsistencies between the expected and actual file size of the included files.

In some cases, a file on disk can report a size of X bytes, but when X bytes are actually read from the file, fewer than X bytes are available (a short read). Acquire would report these issues as `OSError` in the resulting Acquisition log file, because the Python stdlib `tarfile.py` handles it that way. However, data may already have been written to the destination archive at that point.

Afterwards, Acquire continues to add new files to the archive. When trying to untar the file using `tar -xvf <FILE>`, this would show as a `tar: Skipping to next header` error, and finally the process exits with a nonzero exit code.

Included a test case which simulates a file that returns fewer bytes than its reported size, to test this case.
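The failure mode described above can be reproduced with the standard library alone. The sketch below is not the test case included in this PR: `add_with_short_read` and the member name `short.bin` are hypothetical. It declares a size in the `TarInfo` header, supplies a file object holding fewer bytes, and records how much was already written to the archive when `TarFile.addfile` raises.

```python
import io
import tarfile


def add_with_short_read(reported_size, actual_size):
    """Add a member whose source holds fewer bytes than its header declares.

    Returns (error, bytes_written), measured before the archive is closed.
    """
    out = io.BytesIO()
    info = tarfile.TarInfo(name="short.bin")
    info.size = reported_size  # what the file system claimed
    error = None
    tar = tarfile.open(fileobj=out, mode="w")
    try:
        tar.addfile(info, io.BytesIO(b"x" * actual_size))
    except OSError as exc:
        error = exc  # stdlib tarfile signals the short read this way
    written = len(out.getvalue())
    tar.close()
    return error, written


error, written = add_with_short_read(reported_size=1000, actual_size=600)
```

In this sketch the member header has already reached the archive when the exception is raised, which is the kind of leftover data that later confuses `tar -xvf`.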