
Output format of per_read.txt.gz #28

@nepena


Hello,

Thank you for this lightweight tool for STR detection! I have a documentation question about the format of the per_read.txt.gz output following superSTR processing.

```
$ zcat per_read.txt.gz | tail
@LH00469:269:22NMTNLT3:8:2298:51475:29439_CCTGCCAAAGTTGCTG_NNNNNNNN	AG:11:2:134:144:0.000000
@LH00469:269:22NMTNLT3:8:2298:29006:29536_CCTGCCAAAGTTGCTG_GATCGGAA	ACC:13:3:62:74:0.000000
@LH00469:269:22NMTNLT3:8:2298:27231:29568_CCTGCCAAAGTTGCTG_NNNNNNNN	ATC:12:3:124:135:0.000000
@LH00469:269:22NMTNLT3:8:2298:7324:29584_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAC:15:5:30:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:4005:29664_CCTGCCAAAGTTGCTG_NNNNNNNN	AACAGC:35:6:69:103:0.000000
@LH00469:269:22NMTNLT3:8:2298:17319:29664_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAAC:23:6:22:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:7223:29696_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAG:15:5:95:109:0.000000
@LH00469:269:22NMTNLT3:8:2298:9516:29696_CCTGCCAAAGTTGCTG_NNNNNNNN	AAGC:13:4:46:58:0.000000
@LH00469:269:22NMTNLT3:8:2298:49727:29712_CCTGCCAAAGTTGCTG_GATTCGCG	CCG:14:3:132:145:0.000000
Total	37884010
```

As I understand it, each line corresponds to a read that contains a motif: the left-hand column is the read name, and the next column is a string describing the identified motif. My question is: what does each of the colon-separated values in that string correspond to?
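For anyone scripting against this file, here is a minimal sketch for streaming the two columns. It assumes, based only on the sample above, that each record has exactly two whitespace-separated columns (read name, then motif string) and that the file ends with a `Total <count>` summary line. The helper names `parse_per_read_lines` and `iter_per_read` are hypothetical and are not part of superSTR.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def parse_per_read_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, motif_info) pairs from lines of a per_read file.

    Assumption: each record line holds exactly two whitespace-separated
    columns. Blank lines, lines with any other column count, and the
    trailing 'Total <count>' summary line are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or fields[0] == "Total":
            continue
        yield fields[0], fields[1]


def iter_per_read(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a gzip-compressed per_read.txt.gz file."""
    with gzip.open(path, "rt") as handle:
        yield from parse_per_read_lines(handle)
```

Usage: `for read_name, info in iter_per_read("per_read.txt.gz"): ...`. Note that malformed lines are skipped silently, so a count check against the `Total` value may be worth adding if completeness matters.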

My current interpretation, from reading multiparse.py, is that this column must contain six values, organized as:
0:1:2:3:4:5

0 = motif (e.g. AG)?
1 = position of motif in read?
2 = length of motif (e.g. 2-mer)?
3 = read length?
4 = max read length?
5 = information score?
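One way to test hypotheses like these is to parse the six fields and check them against the sample lines. In all nine sample records above, field 2 equals the length of the motif string, and field 1 equals field 4 minus field 3 plus 1. That is consistent with fields 3 and 4 being inclusive start/end coordinates and field 1 being the tract length, but it is only an observation from nine lines, not confirmed against multiparse.py or superSTR documentation. The class and function names below, and the field labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MotifInfo:
    # Field labels are hypothetical, inferred only from the sample lines.
    motif: str      # field 0, e.g. "AG"
    tract_len: int  # field 1 (guess: length of the repeat tract)
    motif_len: int  # field 2 (equals len(motif) in every sample line)
    start: int      # field 3 (guess: start coordinate within the read)
    end: int        # field 4 (guess: end coordinate, inclusive)
    score: float    # field 5 (meaning unconfirmed; 0.000000 in every sample)


def parse_motif_info(info: str) -> MotifInfo:
    """Split a colon-separated motif string into its six fields."""
    parts = info.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}: {info!r}")
    motif, f1, f2, f3, f4, f5 = parts
    return MotifInfo(motif, int(f1), int(f2), int(f3), int(f4), float(f5))


def fits_observed_pattern(m: MotifInfo) -> bool:
    """True if the record matches the pattern seen in every sample line."""
    return m.motif_len == len(m.motif) and m.tract_len == m.end - m.start + 1
```

Running `fits_observed_pattern` over a full file would show whether the pattern holds beyond these nine lines; a single counterexample would rule out this reading of fields 1, 3 and 4.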

I realize this may already be clear from multiparse.py, but I wanted to make sure I am interpreting the format correctly.

Thank you for taking the time to read this message.

Best,
Noah
