Skip to content

Use non-blocking IO for non-wildcard norm queries #14

@RunDevelopment

Description

@RunDevelopment

Some innocent queries can generate thousands of non-wildcard norm queries.

E.g. #I #love #humans. #I is equivalent to 10 words, #love to 36, and #humans to 11 for a total of 3960 non-wildcard norm queries.

Each of those non-wildcard norm queries requires a BigHashMap lookup, so 1 disk access. The problem is that all of those are done sequentially. This isn't very fast even with IO caches.

The solution would be to use AIO (posix) to do these IO ops concurrently. We already do that with the phrase corpus.


Just to illustrate how slow sequential access is: We have to do between 100 and 1000 random accesses per regex query. A typical HDD spins with about 100 revolutions per second. So one (uncached) regex query will need 50 to 500 revolutions or 0.5s to 5s just for IO access. Doing everything asynchronously could reduce this to just a few revolutions for a speedup of at least 10x.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions