Increasingly slow reads due to seek in FileSystemLine #10

@clphillips

I was seeing increasingly slow reads in my fork when reading large files. I was attempting to read 5000 lines at a time from a 1.4 GB file, with each line being 89 bytes. My results were:

read 5000 (1-5000) records in 2.89sec
read 5000 (5001-10000) records in 4.62sec
read 5000 (10001-15000) records in 8.73sec
read 5000 (15001-20000) records in 18.44sec

Turns out, issuing the fseek() for each line read is wicked slow, and it only gets slower as we get further down the file. This is because the call to fseek issues an offset from the beginning of the file, forcing the file pointer to iterate over the file from the beginning every time, which causes the degradation I show above.
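To make the two access patterns concrete, here is a minimal sketch. The issue names fseek(); this sketch instead assumes the file object is an `SplFileObject`, whose `seek()` takes a line number, rewinds, and reads forward to that line. The helper names `readBySeeking` and `readSequentially` are hypothetical and are not taken from the library or the fork.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical, not the library's code.

/**
 * Reads $count lines starting at zero-based line $start, seeking before
 * every single read. SplFileObject::seek() rewinds and reads forward to
 * the target line, so each call costs more the deeper the line is.
 */
function readBySeeking(SplFileObject $file, int $start, int $count): array
{
    $lines = [];
    for ($i = $start; $i < $start + $count; $i++) {
        $file->seek($i);
        $lines[] = rtrim((string) $file->current(), "\r\n");
    }
    return $lines;
}

/**
 * Reads the same range with a single seek to the first line, then lets
 * the file pointer advance from one line to the next.
 */
function readSequentially(SplFileObject $file, int $start, int $count): array
{
    $lines = [];
    $file->seek($start);
    for ($i = 0; $i < $count; $i++) {
        $lines[] = rtrim((string) $file->current(), "\r\n");
        $file->next();
    }
    return $lines;
}
```

Both helpers should return the same lines for a range that lies inside the file; only the amount of work per line differs. Neither one guards against reading past the end of the file.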

When I allowed PHP to perform sequential reads instead (see my fork), here were my results:

read 5000 (1-5000) records in 0.92sec
read 5000 (5001-10000) records in 0.92sec
read 5000 (10001-15000) records in 0.93sec
read 5000 (15001-20000) records in 0.92sec

So my suggestion is to modify FileSystemLine to allow sequential reading on fileObject. I did this by allowing the requested line number to be null, so it can still support reading a specific line. But 99.9999% of the time you'll probably just want to iterate through lines in order, so all that seeking is a complete waste.
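A rough sketch of the nullable line number idea follows. It is not the code from the fork: the class name `SequentialLineReader` and the method `readLine` are assumptions made for illustration, and it assumes fileObject is an `SplFileObject`.

```php
<?php
// Hypothetical sketch of the proposed behaviour; the class and method
// names are assumptions and do not come from the library or the fork.
final class SequentialLineReader
{
    private SplFileObject $fileObject;

    public function __construct(SplFileObject $fileObject)
    {
        $this->fileObject = $fileObject;
    }

    /**
     * Returns one line without its line ending, or null once the file
     * object reports it is no longer valid.
     * Pass a zero-based $line to jump to a specific line; pass null to
     * read whatever line comes next without seeking.
     */
    public function readLine(?int $line = null): ?string
    {
        if ($line !== null) {
            // Explicit request: pay the cost of a seek only in this case.
            $this->fileObject->seek($line);
        }
        if (!$this->fileObject->valid()) {
            return null;
        }
        $current = $this->fileObject->current();
        // Advance so the next null-line call continues from here.
        $this->fileObject->next();
        return $current === false ? null : rtrim((string) $current, "\r\n");
    }
}
```

With this shape, calling `readLine()` in a loop walks the file in order, while `readLine(5)` still jumps to a specific line and later calls continue from there.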
