Output format of per_read.txt.gz #28

Hello, 

Thank you for this lightweight tool for STR detection! I have a documentation question about the format of the per_read.txt.gz output following superSTR processing. 

```
$ zcat per_read.txt.gz | tail
@LH00469:269:22NMTNLT3:8:2298:51475:29439_CCTGCCAAAGTTGCTG_NNNNNNNN	AG:11:2:134:144:0.000000
@LH00469:269:22NMTNLT3:8:2298:29006:29536_CCTGCCAAAGTTGCTG_GATCGGAA	ACC:13:3:62:74:0.000000
@LH00469:269:22NMTNLT3:8:2298:27231:29568_CCTGCCAAAGTTGCTG_NNNNNNNN	ATC:12:3:124:135:0.000000
@LH00469:269:22NMTNLT3:8:2298:7324:29584_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAC:15:5:30:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:4005:29664_CCTGCCAAAGTTGCTG_NNNNNNNN	AACAGC:35:6:69:103:0.000000
@LH00469:269:22NMTNLT3:8:2298:17319:29664_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAAC:23:6:22:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:7223:29696_CCTGCCAAAGTTGCTG_NNNNNNNN	AAAAG:15:5:95:109:0.000000
@LH00469:269:22NMTNLT3:8:2298:9516:29696_CCTGCCAAAGTTGCTG_NNNNNNNN	AAGC:13:4:46:58:0.000000
@LH00469:269:22NMTNLT3:8:2298:49727:29712_CCTGCCAAAGTTGCTG_GATTCGCG	CCG:14:3:132:145:0.000000
Total 37884010
```


As I understand it, each line corresponds to a read that contains a motif. The first tab-separated column is the read name, and the next column is a string with information about the identified motif. My question is: what does each of the values separated by ":" correspond to?
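For anyone who wants to inspect the file outside the shell, here is a minimal Python sketch of reading it under the layout described above. The function name `iter_per_read` is hypothetical and not part of superSTR. The sketch assumes that every data line is tab-separated with the read name first, and that summary lines such as `Total 37884010` contain no tab.

```python
import gzip


def iter_per_read(path):
    """Yield (read_name, annotations) pairs from a per_read.txt.gz-style file.

    Hypothetical helper: the layout is inferred from the sample output,
    not taken from the superSTR documentation.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Assumption: lines without a tab (blank lines, the trailing
            # "Total ..." summary) are not per-read records.
            if "\t" not in line:
                continue
            # The first column is the read name; any remaining columns are
            # kept as a list in case a read can carry more than one motif string.
            read_name, *annotations = line.split("\t")
            yield read_name, annotations
```

Calling `next(iter_per_read("per_read.txt.gz"))` would return the first read name together with its list of colon-separated strings.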

My current interpretation from reading multiparse.py is that this column needs to contain six values, organized as:
0:1:2:3:4:5

0 = motif (e.g. AG)?
1 = position of the motif in the read?
2 = length of the motif (e.g. 2 for a 2-mer)?
3 = read length?
4 = max read length?
5 = information score?
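
One way to test these guesses against the sample rows is to split the string and compare the values with each other. The sketch below is only a consistency check, not the documented format: `split_annotation` is a hypothetical name, and the types (five integers and a float after the motif) are assumed from the sample output.

```python
def split_annotation(annotation):
    """Split one colon-separated string into (motif, v1, v2, v3, v4, v5).

    Hypothetical helper; the meaning of v1..v5 is what this issue asks about.
    """
    motif, v1, v2, v3, v4, v5 = annotation.split(":")
    return motif, int(v1), int(v2), int(v3), int(v4), float(v5)


# Three rows copied from the zcat output above.
samples = [
    "AG:11:2:134:144:0.000000",
    "ACC:13:3:62:74:0.000000",
    "AACAGC:35:6:69:103:0.000000",
]

for sample in samples:
    motif, v1, v2, v3, v4, v5 = split_annotation(sample)
    # Check 1: does value 2 equal the motif length?
    # Check 2: does value 1 equal (value 4 - value 3 + 1)?
    print(motif, v2 == len(motif), v1 == v4 - v3 + 1)
```

Both checks print `True` for these three rows. That would be consistent with value 1 being the length of the repeat stretch and values 3 and 4 being its start and end within the read, but this is an observation from the sample only and would need confirmation from the maintainers or from multiparse.py.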

I realize this may be clear from reading multiparse.py, but I wanted to take a moment to make sure I understand it correctly.

Thank you for taking the time to read this message. 

Best,
Noah 
