Neighbor annotation

eduard valera i zorita edited this page Sep 18, 2017 · 9 revisions

Annotation data structure

Temporary data structure used in the annotation process

bytes 0-1: Exact number of different neighbors.
byte    2: Distance to closest neighbor.
bytes 3-M: bitfield of M bytes encoding the mutated positions to recompute the closest neighbors.

M is max(3,tau).

Special values:
bytes 0-1 set to 0xFFFF: "NO_INFO", the search returned no matches within distance 'd'.
bytes 0-1 all set to 0: Take the same information as the previous suffix (if any).

NO_INFO: the partial search of the k-mer did not produce any match, but the k-mer could still be matched by a subsequent query.
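
As an illustration of the special values above, here is a minimal Python sketch that reads the first three bytes of a temporary record. The function name `read_temp_header`, the status strings, and the little-endian byte order are assumptions made for this example, not taken from the implementation. Both special values (0xFFFF and 0) read the same in either byte order.

```python
NO_INFO = 0xFFFF       # bytes 0-1: search returned no matches within distance d
SAME_AS_PREVIOUS = 0   # bytes 0-1: reuse the information of the previous suffix


def read_temp_header(record: bytes, byteorder: str = "little"):
    """Return (status, neighbor_count, distance) for one temporary record.

    Hypothetical helper: the byte order of bytes 0-1 is an assumption.
    """
    if len(record) < 3:
        raise ValueError("a temporary record needs at least 3 bytes")
    count = int.from_bytes(record[0:2], byteorder)  # bytes 0-1
    distance = record[2]                            # byte 2
    if count == NO_INFO:
        return ("NO_INFO", None, None)
    if count == SAME_AS_PREVIOUS:
        return ("SAME_AS_PREVIOUS", None, None)
    return ("OK", count, distance)
```

For example, `read_temp_header(bytes([3, 0, 2]))` reports 3 neighbors with the closest at distance 2 under the little-endian assumption.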

If the number of mutations is greater than M, the alignment information is not stored. Once the computation has finished, the information is compressed into a single byte.

Compressed annotation data structure

bit  7   - Locus alignment info.
bit  6   - Locus alignment flag.
bits 5-4 - Distance to closest neighbor (00: 1, 01: 2, 10: 3, 11: 4).
bits 3-0 - Neighbor count.

Considerations:

  • If the locus has no neighbors (closest at distance > tau), Neighbor count is set to 0.
  • If Locus alignment flag is set, there is alignment information available for this k-mer.
  • The Locus alignment info bit belongs to the bitfield that records mismatched positions: it corresponds to position 0 of the current locus, position 1 of the previous locus, and so on.
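
The compressed byte can be unpacked with shifts and masks. The following Python sketch assumes the layout reads as bit 7 (alignment info), bit 6 (alignment flag), bits 5-4 (distance) and bits 3-0 (neighbor count); the function name `decode_annotation` and the dictionary keys are hypothetical.

```python
def decode_annotation(byte: int) -> dict:
    """Split one compressed annotation byte into its four fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0..255)")
    return {
        "align_info": (byte >> 7) & 1,          # bit 7
        "align_flag": (byte >> 6) & 1,          # bit 6
        # bits 5-4 store distance - 1 (00: 1, 01: 2, 10: 3, 11: 4)
        "distance": ((byte >> 4) & 0b11) + 1,
        "count_code": byte & 0x0F,              # bits 3-0, still encoded
    }
```

For instance, `decode_annotation(0x53)` (binary 0101 0011) yields alignment flag 1, alignment info 0, distance 2 and count code 3. A count code of 0 means the locus has no neighbors.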

IMPORTANT NOTE: Exact repeats are not considered in the annotation and must be manually checked by seeding during the mapping process.

How to read the neighbor count field:

Neighbor count:
  value          neighbors
  0x00..0x0A       0..10  
  0x0B            11..20
  0x0C            21..50
  0x0D            51..100
  0x0E           101..500
  0x0F              > 500
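
The table above can be expressed as a lookup list. This is a minimal Python sketch in which the names `COUNT_RANGES`, `count_range` and `encode_count` are hypothetical; `None` stands for the open upper bound of the last bucket.

```python
# Index = 4-bit code, value = (lowest, highest) neighbor count it represents.
COUNT_RANGES = [(n, n) for n in range(11)] + [
    (11, 20),     # 0x0B
    (21, 50),     # 0x0C
    (51, 100),    # 0x0D
    (101, 500),   # 0x0E
    (501, None),  # 0x0F: more than 500
]


def count_range(code: int):
    """Return the (lowest, highest) neighbor count for a 4-bit code."""
    if not 0 <= code <= 0x0F:
        raise ValueError("neighbor count code must fit in 4 bits")
    return COUNT_RANGES[code]


def encode_count(neighbors: int) -> int:
    """Return the 4-bit code whose range contains the given count."""
    if neighbors < 0:
        raise ValueError("neighbor count cannot be negative")
    for code, (_, highest) in enumerate(COUNT_RANGES):
        if highest is None or neighbors <= highest:
            return code
```

Codes 0x00 to 0x0A are exact, so `count_range(7)` returns `(7, 7)`, while `count_range(0x0C)` returns `(21, 50)`.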