Hello,
Thank you for this lightweight tool for STR detection! I have a documentation question about the format of the per_read.txt.gz output following superSTR processing.
$ zcat per_read.txt.gz | tail
@LH00469:269:22NMTNLT3:8:2298:51475:29439_CCTGCCAAAGTTGCTG_NNNNNNNN AG:11:2:134:144:0.000000
@LH00469:269:22NMTNLT3:8:2298:29006:29536_CCTGCCAAAGTTGCTG_GATCGGAA ACC:13:3:62:74:0.000000
@LH00469:269:22NMTNLT3:8:2298:27231:29568_CCTGCCAAAGTTGCTG_NNNNNNNN ATC:12:3:124:135:0.000000
@LH00469:269:22NMTNLT3:8:2298:7324:29584_CCTGCCAAAGTTGCTG_NNNNNNNN AAAAC:15:5:30:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:4005:29664_CCTGCCAAAGTTGCTG_NNNNNNNN AACAGC:35:6:69:103:0.000000
@LH00469:269:22NMTNLT3:8:2298:17319:29664_CCTGCCAAAGTTGCTG_NNNNNNNN AAAAAC:23:6:22:44:0.000000
@LH00469:269:22NMTNLT3:8:2298:7223:29696_CCTGCCAAAGTTGCTG_NNNNNNNN AAAAG:15:5:95:109:0.000000
@LH00469:269:22NMTNLT3:8:2298:9516:29696_CCTGCCAAAGTTGCTG_NNNNNNNN AAGC:13:4:46:58:0.000000
@LH00469:269:22NMTNLT3:8:2298:49727:29712_CCTGCCAAAGTTGCTG_GATTCGCG CCG:14:3:132:145:0.000000
Total 37884010
As I understand it, each line corresponds to a read that contains a motif. The first tab-separated column is the read name, and the next column is a string with information about the identified motif. My question is: what does each value separated by ":" correspond to?
My current interpretation from reading multiparse.py is that there need to be six values in this column, organized as:
0:1:2:3:4:5
0 = motif (e.g. AG)?
1 = position of motif in read?
2 = length of motif (e.g. 2-mer)?
3 = read length?
4 = max read length?
5 = information score?
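To make the question concrete, here is a minimal Python sketch of splitting one line of this output. It is not taken from superSTR or multiparse.py: the function name `parse_line`, the `MotifRecord` type, and the neutral field names `field1` to `field5` are all hypothetical, chosen so that no meaning is assumed for the values in question.

```python
from typing import NamedTuple, Optional


class MotifRecord(NamedTuple):
    """One per-read record; field1..field5 are deliberately unnamed guesses."""
    read_name: str
    motif: str
    field1: int
    field2: int
    field3: int
    field4: int
    field5: float


def parse_line(line: str) -> Optional[MotifRecord]:
    """Split a line into read name and six colon-separated values.

    Returns None for lines that do not fit this shape, such as the
    trailing "Total ..." line in the output above.
    """
    parts = line.strip().rsplit(None, 1)  # read name, then the info column
    if len(parts) != 2:
        return None
    read_name, info = parts
    values = info.split(":")
    if len(values) != 6:
        return None
    motif, f1, f2, f3, f4, f5 = values
    return MotifRecord(read_name, motif, int(f1), int(f2), int(f3), int(f4), float(f5))


if __name__ == "__main__":
    sample = (
        "@LH00469:269:22NMTNLT3:8:2298:51475:29439_CCTGCCAAAGTTGCTG_NNNNNNNN "
        "AG:11:2:134:144:0.000000"
    )
    rec = parse_line(sample)
    print(rec)
    # Relationships that can be checked directly on the rows shown above:
    print(rec.field2 == len(rec.motif))
    print(rec.field4 - rec.field3 + 1 == rec.field1)
```

In the nine rows shown above, value 2 equals the length of the motif string, and value 4 minus value 3 plus 1 equals value 1. Whether that reflects the intended meaning of the fields is part of what I am asking.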
I realize this may be clear from multiparse.py itself, but I wanted to confirm that I am reading it correctly.
Thank you for taking the time to read this message.
Best,
Noah