Feature Request: Add a line item for "sparse" #55

@DeflateAwning

I've been working a bit with sparse files. I might be misunderstanding exactly how things work, but my mental model is that, in most filesystems, each part of a file either fully occupies storage or is sparse: a long run of zeros (a "hole") that doesn't actually occupy any storage.
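One way to check that mental model against a real file is to ask the kernel where the data and the holes are. Below is a minimal sketch, assuming Linux (or another system where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE`). `list_regions` is a hypothetical helper, not part of compsize. It walks the file with `lseek()` and returns `(kind, start, end)` tuples. Filesystems that don't track holes simply report the whole file as data, and region boundaries are rounded to the filesystem's block size.

```python
import errno
import os


def list_regions(path):
    """Return a list of (kind, start, end) tuples; kind is "data" or "hole".

    Hypothetical helper built on lseek() with SEEK_DATA / SEEK_HOLE.
    """
    regions = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next byte of real data at or after `offset`.
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                # ENXIO: no data after `offset`, so the rest of the file is a hole.
                if e.errno == errno.ENXIO:
                    regions.append(("hole", offset, size))
                    break
                raise
            if data_start > offset:
                regions.append(("hole", offset, data_start))
            # The data run ends where the next hole starts (EOF counts as a hole).
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            regions.append(("data", data_start, data_end))
            offset = data_end
    finally:
        os.close(fd)
    return regions
```

For example, `for kind, start, end in list_regions("test_data.bin"): print(kind, start, end)` should show a data, hole, data pattern for the test file generated below, provided the filesystem supports holes.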

I'm using this tool to evaluate the storage efficiency of compressed disk images. For that, it'd be helpful to see how the compressed chunks compare with the parts that are stored as sparse regions (holes).

Are sparse files actually just heavily compressed blocks in btrfs?

Testing

Here's a hacky Python script that writes 200 MiB of random data, then a 300 MiB sparse region, then another 200 MiB of different random data, for a 700 MiB file in total.

import os

MiB = 1024 * 1024
OUTPUT_FILE = "test_data.bin"

def write_random(f, size):
    chunk = 4 * MiB  # write in 4 MiB chunks
    remaining = size
    while remaining > 0:
        to_write = min(chunk, remaining)
        f.write(os.urandom(to_write))
        remaining -= to_write

with open(OUTPUT_FILE, "wb") as f:
    # 1) First 200 MiB of random data
    write_random(f, 200 * MiB)

    # 2) 300 MiB sparse region: seek forward without writing, leaving a hole
    f.seek(300 * MiB, os.SEEK_CUR)

    # 3) Another 200 MiB of random data (different random bytes)
    write_random(f, 200 * MiB)

print("File created:", OUTPUT_FILE)
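Holes read back as zeros, so a quick sanity check of the generated file is that the middle 300 MiB consists entirely of zero bytes. Below is a minimal sketch; `region_is_zero` is a hypothetical helper. Reading alone can't tell a hole apart from explicitly written zeros, which is why a filesystem-level view like compsize's is needed to account for sparse regions.

```python
MiB = 1024 * 1024


def region_is_zero(path, start, length, chunk=4 * MiB):
    """Return True if bytes [start, start + length) of `path` are all zero.

    Hypothetical helper: returns False if the range runs past end of file.
    """
    zeros = bytes(chunk)  # reference buffer of zero bytes
    with open(path, "rb") as f:
        f.seek(start)
        remaining = length
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                # Hit EOF before reading `length` bytes.
                return False
            if data != zeros[:len(data)]:
                return False
            remaining -= len(data)
    return True
```

With the script's layout, `region_is_zero("test_data.bin", 200 * MiB, 300 * MiB)` should return `True`.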

Currently, the results are:

compsize test_data.bin  
Processed 1 file, 803 regular extents (803 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL      100%      400M         400M         400M       
none       100%      400M         400M         400M       

du -ah --apparent-size 
700M    ./test_data.bin

ll
# Shows 700 MiB.
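The gap between the apparent size and the allocated size can also be read directly from `stat()`. Below is a rough sketch, assuming Linux, where `st_blocks` counts 512-byte units. `size_report` is a hypothetical helper that compares the two sizes. On btrfs with compression, the allocated figure may not match the real on-disk usage of compressed extents, so this only estimates the hole bytes and doesn't replace compsize's numbers.

```python
import os


def size_report(path):
    """Return (apparent, allocated, hole_estimate) in bytes for `path`.

    apparent:      st_size, the logical length (`du --apparent-size`, `ls -l`)
    allocated:     st_blocks * 512, the space reported as allocated (plain `du`)
    hole_estimate: apparent - allocated, clamped at 0, because preallocation
                   or block rounding can make allocated exceed apparent
    """
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512
    return apparent, allocated, max(apparent - allocated, 0)
```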

Conclusion: currently, the 300 MiB sparse region in the middle is counted in neither "Disk Usage" nor "Uncompressed" (nor "Referenced"), so compsize accounts for only 400 MiB of the 700 MiB file. This isn't too bad, but it makes the numbers confusing when reviewing, for example, a virtual machine image, because part of the file seems to be missing.
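Here's a rough sketch of what the requested line item could compute, assuming the hole bytes are simply the apparent size minus compsize's "Referenced" total. `sparse_line_item` and its row layout are hypothetical and do not reflect compsize's actual output format. Because holes occupy no disk space, the row reports 0 disk usage and 0%, and it puts the hole bytes under Uncompressed and Referenced so that the columns would add up to the file's apparent size. Whether holes should count as "referenced" is a design choice.

```python
MiB = 1024 * 1024


def sparse_line_item(apparent_size, referenced):
    """Compute a hypothetical "sparse" row for a compsize-like table.

    apparent_size: logical file size in bytes (what `du --apparent-size` shows)
    referenced:    bytes covered by extents (compsize's "Referenced" column)
    Returns (hole_bytes, formatted_row).
    """
    # Any part of the logical size that no extent covers is treated as holes.
    holes = max(apparent_size - referenced, 0)
    # Holes occupy no disk space, so Disk Usage is 0 and the ratio is 0%.
    disk_usage = 0
    perc = 0
    row = "{:<10} {:>4}% {:>8}M {:>12}M {:>12}M".format(
        "sparse", perc, disk_usage // MiB, holes // MiB, holes // MiB
    )
    return holes, row
```

With the numbers from this issue, `sparse_line_item(700 * MiB, 400 * MiB)` reports 300 MiB of holes.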
