Description
I've been working a bit with sparse files. I might be misunderstanding exactly how things work, but my mental model is that, in most filesystems, each region of a file is either fully allocated on disk or a hole in a sparse file (a long run of zeros that doesn't actually occupy any storage).
I'm using this tool to evaluate the performance of storing compressed disk images. As such, it'd be helpful to see the performance of the compressed chunks vs. the parts that are stored as sparse files.
Are sparse files actually just heavily compressed blocks in btrfs?
Testing
Here's a hacky Python script that writes 200 MiB of random data, leaves a 300 MiB hole (the sparse region), then writes another 200 MiB of different random data. That gives a file of 700 MiB in total.
import os

MB = 1024 * 1024  # FIXME: Rename to MiB.
OUTPUT_FILE = "test_data.bin"

def write_random(f, size):
    chunk = 4 * MB  # write in 4 MB chunks
    remaining = size
    while remaining > 0:
        to_write = min(chunk, remaining)
        f.write(os.urandom(to_write))
        remaining -= to_write

with open(OUTPUT_FILE, "wb") as f:
    # 1) First 200 MB random data
    write_random(f, 200 * MB)
    # 2) 300 MB sparse region (seek forward, no write)
    f.seek(300 * MB, os.SEEK_CUR)
    # 3) Another 200 MB of random data (different random bytes)
    write_random(f, 200 * MB)

print("File created:", OUTPUT_FILE)

The result currently is as follows:
compsize test_data.bin
Processed 1 file, 803 regular extents (803 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL      100%     400M         400M           400M
none       100%     400M         400M           400M
du -ah --apparent-size
700M ./test_data.bin
ll
# Shows 700 MiB.
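For comparison, the same apparent-vs-allocated gap can be read straight from `stat`. This is only a rough sketch: `size_report` is a hypothetical helper of mine, not part of compsize, and it assumes `st_blocks` counts 512-byte units, as it does on Linux. I'm also not sure how closely `st_blocks` tracks compsize's "Disk Usage" on btrfs once compression is involved.

```python
import os


def size_report(path):
    """Return the apparent size and the allocated size of a file, in bytes."""
    st = os.stat(path)
    return {
        "apparent": st.st_size,           # what `du --apparent-size` and `ls -l` show
        "allocated": st.st_blocks * 512,  # assumption: 512-byte units (Linux)
    }
```

Calling `size_report("test_data.bin")` on the file above should report an apparent size of 700 MiB; the allocated figure depends on the filesystem.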
Conclusion: Currently, the 300 MiB sparse region in the middle isn't shown in "Disk Usage" or in "Uncompressed". This isn't that bad, but it does make part of the file look like it's missing when reviewing, for example, a virtual machine image.
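To double-check that the missing 300 MiB really is a hole, the file can be walked with `SEEK_DATA` / `SEEK_HOLE`. Again a sketch under assumptions: `map_segments` is a hypothetical helper, it needs Linux and a Python that exposes `os.SEEK_DATA` and `os.SEEK_HOLE`, and on filesystems without hole support the whole file will simply come back as one data segment.

```python
import errno
import os


def map_segments(path):
    """Return (kind, start, length) tuples covering the file, kind is 'data' or 'hole'."""
    segments = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                # Next offset >= `offset` that contains data.
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                data_start = size  # ENXIO: only a hole remains until EOF
            if data_start > offset:
                segments.append(("hole", offset, data_start - offset))
            if data_start >= size:
                break
            # Next hole after the data; EOF counts as an implicit hole.
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            segments.append(("data", data_start, data_end - data_start))
            offset = data_end
    finally:
        os.close(fd)
    return segments
```

Summing the lengths of the `"hole"` entries from `map_segments("test_data.bin")` gives the amount of sparse space, which is the number I'd like to see reported alongside the compressed and uncompressed extents.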