Fast Scan Resume

The current fault tolerance achieves its goal, but when it resumes a very large scan it spends a significant amount of time hashing documents only to determine that it has already seen them. We would like to provide a configuration option (via a method on the scanner's builder) to skip this work and pick up where the previous scan left off without wasting as much CPU.
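A minimal sketch of what the opt-in could look like on a builder. The names `ScanConfig`, `Builder`, `named` and `fastResume` are hypothetical and not part of the existing scanner API. The option defaults to off so that current hash-based behavior is unchanged unless a user asks for it.

```java
// Hypothetical names throughout: ScanConfig and its Builder are illustrative,
// not the project's existing scanner builder.
final class ScanConfig {
    private final String name;
    private final boolean fastResume;

    private ScanConfig(Builder b) {
        this.name = b.name;
        this.fastResume = b.fastResume;
    }

    String name() { return name; }

    /** True if a restarted scan should trust logged IDs instead of re-hashing documents. */
    boolean fastResume() { return fastResume; }

    static final class Builder {
        private String name = "scanner";
        private boolean fastResume = false; // off by default: keep existing hash-based resume

        Builder named(String name) {
            this.name = name;
            return this;
        }

        /** Opt in to skipping content hashing for documents an interrupted scan already reported. */
        Builder fastResume(boolean enabled) {
            this.fastResume = enabled;
            return this;
        }

        ScanConfig build() {
            return new ScanConfig(this);
        }
    }
}
```

Usage would be along the lines of `new ScanConfig.Builder().named("files").fastResume(true).build()`.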

One possible route is to log each scanned document's ID after we have reported its status during the initial scan, and then, on resume, load that log of IDs into a trie that can be used to check IDs directly without hashing document contents. (Hashing is still performed, and remains required, for subsequent scans.) Successfully completing a scan should clear the log, so that no trie is built when the previous scan finished normally.
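A minimal sketch of the log-plus-trie idea, assuming IDs are strings (for example file paths) that never contain a newline, so the log can store one ID per line. `IdTrie` and `ScanIdLog` are hypothetical helpers, not existing project classes. The trie is a plain character trie; `loadForResume` returns `null` when no log exists, which is the "previous scan completed" case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal character trie holding document IDs (hypothetical helper). */
final class IdTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a complete ID ends at this node
    }

    private final Node root = new Node();

    void add(String id) {
        Node n = root;
        for (int i = 0; i < id.length(); i++) {
            n = n.children.computeIfAbsent(id.charAt(i), c -> new Node());
        }
        n.terminal = true;
    }

    /** Exact-match lookup: a stored ID's prefix does not count as a hit. */
    boolean contains(String id) {
        Node n = root;
        for (int i = 0; i < id.length(); i++) {
            n = n.children.get(id.charAt(i));
            if (n == null) return false;
        }
        return n.terminal;
    }
}

/** Append-only log of IDs already reported in the current scan (hypothetical helper). */
final class ScanIdLog {
    private final Path file;

    ScanIdLog(Path file) { this.file = file; }

    /** Record an ID once its status has been reported (assumes no newline in the ID). */
    void record(String id) throws IOException {
        Files.writeString(file, id + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** On restart: build a trie from the log, or return null if there is nothing to resume. */
    IdTrie loadForResume() throws IOException {
        if (!Files.exists(file)) return null;
        IdTrie trie = new IdTrie();
        List<String> ids = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String id : ids) {
            if (!id.isEmpty()) trie.add(id);
        }
        return trie;
    }

    /** On successful completion: delete the log so the next scan starts fresh. */
    void clear() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

During a resumed scan, a document whose ID satisfies `trie.contains(id)` could be skipped without hashing; any other document falls through to the normal hash check. A character trie shares storage for common path prefixes, which is one reason it may suit path-like IDs better than a flat set, though a `HashSet<String>` would give the same answers.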

Fast Scan Resume #183

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Fast Scan Resume #183

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions