Skip to content

Pattern matching with Partial Suffix Arrays #83

@antonioaversa

Description

@antonioaversa

Implement text pattern matching with Partial Suffix Arrays, Burrows-Wheeler Transform and its sorted version.

Suffix Arrays are a space-efficient alternative to Suffix Trees, due to their Θ(n) with c=1 space used, when the size of an index can be considered constant w.r.t. the size of the input strings.

There are, however, cases where even the linear Space Complexity of a Suffix Array is considered too high, and a smaller data structure has to be stored, potentially at the cost of the algorithm runtime.

Text pattern matching with Suffix Arrays can also be done with partial structures: for example, only the elements at index i of the Suffix Array, such that i % k == 0, where k is a constant, can be stored; the other ones discarded or even not calculated.

Pattern matching with Partial Suffix Arrays can be implemented using the Last-First property of BWT and Sorted BWT. The property is used to iteratively identify the char preceding the found char, until a char at a position i is found, such that the i-th element of the Suffix Array is known (e.g. i % k == 0).

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions