README.md
30 additions & 9 deletions
@@ -221,13 +221,14 @@ The two output formats are tables of comma-separated values with a header.
 | Start | Positive integer | Starting position of the feature (inclusive) |
 | End | Positive integer | Ending position of the feature (inclusive) |
 | Strand | `1` or `-1` | Whether the feature is located on the positive (5'->3') or negative (3'->5') strand |
-| CoveredSites | Positive integer | Number of sites in the feature that satisfy the minimum level of coverage |
-| GenomeBases | Comma-separated positive integers | Frequencies of the bases in the feature in the reference genome (order: A, C, G, T) |
-| SiteBasePairings | Comma-separated positive integers | Number of sites in which each genome-variant base pairings is found in the feature (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT) |
-| ReadBasePairings | Comma-separated positive integers | Frequencies of genome-variant base pairings in the feature (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT) |
+| TotalSites | Positive integer | Number of sites in the feature |
+| ObservedBases | Comma-separated positive integers | Number of observed sites in the feature for each reference-genome base (order: A, C, G, T). The four values sum to the total number of observed sites (those reported by the editing tool, e.g. Reditools3) |
+| QualifiedBases | Comma-separated positive integers | Number of sites in the feature for each reference-genome base (order: A, C, G, T) that satisfy the minimum level of coverage and editing. The four values sum to the total number of qualified sites |
+| SiteBasePairingsQualified | Comma-separated positive integers | Number of sites in which each genome-variant base pairing is found at reference level in the feature (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT), counting only sites that satisfy the minimum level of coverage and editing |
+| ReadBasePairingsQualified | Comma-separated positive integers | Number of sites in which each genome-variant base pairing is found at read level in the feature (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT), counting only sites that satisfy the minimum level of coverage and editing |
 
 > [!note]
-> The number of **CoveredSites** can be higher than the sum of **SiteBasePairings** because of the presence of ambiguous bases (e.g. N)
+> The sum of **QualifiedBases** can differ from the sum of the AA, CC, GG and TT counts in **SiteBasePairingsQualified**, because a site that is 100% edited does not fall into any of these four categories.
 
 An example of the feature output format is shown below, with some alterations to make the text line up in columns.
 
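The comma-separated count columns described in the table above can be unpacked as in the following minimal sketch. The helper `parse_counts` and the sample values are hypothetical and only illustrate the documented orderings; reading each pairing as genome base followed by variant base is an assumption drawn from the phrase "genome-variant base pairing".

```python
BASES = ("A", "C", "G", "T")
# Documented pairing order: AA, AC, AG, AT, CA, ..., TT
# (assumed here to mean genome base first, variant base second).
PAIRINGS = tuple(g + v for g in BASES for v in BASES)


def parse_counts(field, labels):
    """Split a comma-separated count field and label each value (hypothetical helper)."""
    values = [int(x) for x in field.split(",")]
    if len(values) != len(labels):
        raise ValueError(f"expected {len(labels)} values, got {len(values)}")
    return dict(zip(labels, values))


# Made-up field values, for illustration only.
qualified_bases = parse_counts("12,7,9,5", BASES)
site_pairings = parse_counts("10,0,2,0,0,7,0,0,0,0,9,0,0,1,0,4", PAIRINGS)

# Total qualified sites versus the sum of the AA, CC, GG and TT counts.
total_qualified = sum(qualified_bases.values())
same_base_sites = sum(site_pairings[b + b] for b in BASES)
print(total_qualified, same_base_sites)
```

As the note states, the two totals are not expected to match in general, so a mismatch between them is not by itself a sign of a malformed row.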
@@ -275,10 +276,11 @@ This hierarchical information is provided in the same manner in the aggregate fi
 | ParentType | String | Type of the parent of the feature under which the aggregation was done |
 | AggregateType | String | Type of the features that are aggregated |
 | AggregationMode | `all_isoforms`, `longest_isoform`, `chimaera`, `feature` or `all-sites` | Way in which the aggregation was performed |
-| CoveredSites | Positive integer | Number of sites in the aggregated features that satisfy the minimum level of coverage |
-| GenomeBases | Comma-separated positive integers | Frequencies of the bases in the aggregated features in the reference genome (order: A, C, G, T) |
-| SiteBasePairings | Comma-separated positive integers | Number of sites in which each genome-variant base pairings is found in the aggregated features (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT) |
-| ReadBasePairings | Comma-separated positive integers | Frequencies of genome-variant base pairings in the aggregated features (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT) |
+| TotalSites | Positive integer | Number of sites in the aggregated features |
+| ObservedBases | Comma-separated positive integers | Number of observed sites in the aggregated features for each reference-genome base (order: A, C, G, T). The four values sum to the total number of observed sites (those reported by the editing tool, e.g. Reditools3) |
+| QualifiedBases | Comma-separated positive integers | Number of sites in the aggregated features for each reference-genome base (order: A, C, G, T) that satisfy the minimum level of coverage and editing. The four values sum to the total number of qualified sites |
+| SiteBasePairingsQualified | Comma-separated positive integers | Number of sites in which each genome-variant base pairing is found at reference level in the aggregated features (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT), counting only sites that satisfy the minimum level of coverage and editing |
+| ReadBasePairingsQualified | Comma-separated positive integers | Number of sites in which each genome-variant base pairing is found at read level in the aggregated features (order: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT), counting only sites that satisfy the minimum level of coverage and editing |
 
 In the output of Pluviometer, **aggregation** is the sum of counts from several features of the same type at some feature level. For instance, exons can be aggregated at transcript level, gene level, chromosome level, and genome level.
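Since aggregation is described as a sum of counts, the count columns of an aggregate row can be pictured as the position-wise sum of the same columns over the contributing features. The sketch below assumes this reading; `aggregate_counts` and its inputs are hypothetical and do not reproduce Pluviometer's own code, nor the way each `AggregationMode` selects which features contribute.

```python
def aggregate_counts(fields):
    """Sum comma-separated count fields position by position (hypothetical helper)."""
    vectors = [[int(x) for x in field.split(",")] for field in fields]
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all count fields must have the same number of values")
    # zip(*vectors) groups the i-th value of every feature together.
    return [sum(column) for column in zip(*vectors)]


# Made-up ObservedBases fields (order: A, C, G, T) for three exons of one transcript.
exon_fields = ["3,1,0,2", "1,1,4,0", "0,2,2,2"]
transcript_counts = aggregate_counts(exon_fields)
print(",".join(str(n) for n in transcript_counts))
```

The same helper would apply unchanged to the 16-value pairing columns, because it only relies on every field having the same length.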
 Drip retains a line only if at least one metric value is neither NA nor zero (i.e., at least one edit has been detected somewhere). Lines containing only NA values, only 0.0 values, or a mix of both are removed by default.
+
 </details>
 
 
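The retention rule stated above for Drip can be expressed as a small predicate. This is a sketch of the rule as worded in the README, not Drip's implementation; the function name is hypothetical, and NA values are assumed to be represented as `None` or NaN once the row has been loaded.

```python
import math


def has_detected_edit(metrics):
    """Return True if at least one metric is neither NA (None/NaN) nor zero."""
    for value in metrics:
        if value is None:
            continue  # NA represented as None
        if isinstance(value, float) and math.isnan(value):
            continue  # NA represented as NaN
        if value != 0:
            return True
    return False


rows = {
    "only_na": [None, float("nan")],
    "only_zero": [0.0, 0.0],
    "na_and_zero": [None, 0.0],
    "one_edit": [None, 0.0, 0.25],
}
kept = [name for name, metrics in rows.items() if has_detected_edit(metrics)]
print(kept)
```

Only the row with a non-zero, non-NA metric passes the predicate, which matches the three removal cases listed in the text.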
@@ -355,3 +372,7 @@ Jacques Dainat (@Juke34)
 ## Contributing
 
 Contributions from the community are welcome! See the [Contributing guidelines](https://github.com/Juke34/rain/blob/main/CONTRIBUTING.md)
+
+## TODO
+
+Update pluviometer to write `NA` instead of `.` for Start, End and Strand, so that drip and barometer can read the coordinate columns as nullable integers, e.g. `dtype={"SeqID": str, "Start": "Int64", "End": "Int64", "Strand": str}`
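The motivation for this TODO can be illustrated with pandas, whose nullable `Int64` dtype accepts missing values but not the literal `.`. The snippet below is a sketch on made-up rows; only the column names and the `dtype` mapping come from the TODO, and the second call shows one possible way to read `.`-style output in the meantime.

```python
import io

import pandas as pd

DTYPES = {"SeqID": str, "Start": "Int64", "End": "Int64", "Strand": str}

# Made-up rows: the second one has no coordinates and uses NA, as proposed in the TODO.
with_na = "SeqID,Start,End,Strand\nchr1,100,250,1\nchr1,NA,NA,NA\n"
df_na = pd.read_csv(io.StringIO(with_na), dtype=DTYPES)

# Same rows using "." for missing values; declaring "." as an NA marker lets Int64 parse.
with_dot = "SeqID,Start,End,Strand\nchr1,100,250,1\nchr1,.,.,.\n"
df_dot = pd.read_csv(io.StringIO(with_dot), dtype=DTYPES, na_values=["."])

print(df_na.dtypes["Start"], df_na["Start"].isna().tolist())
```

Note that `na_values` applies to every column, so a `.` in Strand is also read as missing in the second call.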
Post-processing tool for pluviometer feature output. Analyzes RNA editing from feature TSV files, calculating editing metrics (espf and espr) for all 16 genome-variant base pair combinations across multiple samples. Combines the data into a unified matrix format.
Post-processing tool for pluviometer aggregate output. Similar to drip_features.py but operates on aggregate-level data, calculating editing metrics for aggregated genomic regions across samples.