Merge pull request #32 from Juke34/syncaline

eascarrunz · web-flow · commit daade524389e · 2025-06-19T16:25:05.000+02:00
Syncaline
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,89 @@
+# Contributing guidelines
+
+We thank you in advance :thumbsup: :tada: for taking the time to contribute, whether with *code* or with *ideas*, to the project.
+
+
+## Did you find a bug?
+
+* Ensure that the bug was not already reported by [searching under Issues].
+
+* If you're unable to find an (open) issue addressing the problem, [open a new one]. Be sure to prefix the issue title with **[BUG]** and to include:
+
+  - a *clear* description,
+  - as much relevant information as possible, and
+  - a *code sample* or an (executable) *test case* demonstrating the expected behaviour that is not occurring.
+
+## How to work on a new feature/bug
+
+Create an issue on GitHub, or pick an existing one.
+
+Assign yourself to that issue.
+
+Discussion of how to proceed takes place in the comment section of that issue.
+
+Some of the work might already have been done by somebody else, so discussing first avoids duplicated work and wasted time and effort. Another reason to discuss the issue beforehand is to tell the team about the changes: some features might impact different components, and we can plan accordingly.
+
+## How we work with Git
+
+All work should take place in a dedicated branch with a short descriptive name.
+
+Use comments in your code, and choose variable and function names that clearly show what you intend to implement.
+
+Once the feature is done you can request it to be merged back into `main` by making a Pull Request.
+
+Before making the pull request it is a good idea to rebase your branch onto `main`, so that any conflicts with the `main` branch are resolved before the PR is reviewed and we can have a clean merge.
+
+
+### General stuff about git and commit messages
+
+In general it is better to commit often. Small commits are easier to roll back and also make the code easier to review.
+
+Write helpful commit messages that describe the changes and, where possible, why they were necessary.
+
+Each commit should contain changes that are functionally connected and/or related. If, for example, you want to write _and_ in the first line of the commit message, that is an indicator that it should have been two commits.
+
+Learn how to select chunks of changed files to do multiple separate commits of unrelated things. This can be done with either `git add -p` or `git commit -p`.
+
+
+#### Helpful commit messages
+
+The commit messages may be seen as meta-comments on the code that are incredibly helpful for anyone who wants to know how this piece of software is working, including colleagues (current and future) and external users.
+
+Some tips about writing helpful commit messages:
+
+ 1. Separate subject (the first line of the message) from body with a blank line.
+ 2. Limit the subject line to 50 characters.
+ 3. Capitalize the subject line.
+ 4. Do not end the subject line with a period.
+ 5. Use the imperative mood in the subject line.
+ 6. Wrap the body at 72 characters.
+ 7. Use the body to explain what and why vs. how.
+
+For an in-depth explanation of the above points, please see [How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/).
+
+
+### How we do code reviews
+
+A code review is initiated when someone has made a Pull Request in the appropriate repo on GitHub.
+
+Work should not continue on the branch _unless_ it is a [Draft Pull Request](https://github.blog/2019-02-14-introducing-draft-pull-requests/). Once the PR is marked ready the review can start.
+
+The initiator of the PR should recruit a reviewer, who is then assigned reviewer duty on the branch.
+
+Other people may also look at and review the code.
+
+A reviewer's job is to:
+
+  * Write polite and friendly comments - remember that it can be tough to have other people criticizing your work, and a little kindness goes a long way. This does not mean we should not comment on things that need to be changed, of course.
+  * Read the code and make sure it is understandable
+  * Make sure that commit messages and commits are structured so that it is possible to understand why certain changes were made.
+
+Once the review is positive the Pull Request can be _merged_ into `main` and the feature branch deleted.
+
+
+----
+
+Thanks again.
+
+[searching under Issues]: https://github.com/Juke34/RAIN/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Abug%20%5BBUG%5D%20in%3Atitle
+[open a new one]: https://github.com/Juke34/RAIN/issues/new?title=%5BBUG%5D
diff --git a/README.md b/README.md
@@ -0,0 +1,205 @@
+![GitHub CI](https://github.com/Juke34/RAIN/actions/workflows/main.yml/badge.svg)
+
+# RAIN - RNA Alterations Investigation using Nextflow
+
+RAIN is a Nextflow workflow designed for epitranscriptomic analyses, enabling the detection of RNA modifications in comparison to a reference genome.
+Its primary goal is to distinguish true RNA editing events from genomic variants such as SNPs, with a particular emphasis on identifying A-to-I (Adenosine-to-Inosine) editing.
+
+<img src="doc/img/IRD.png" width="300" height="100" /> <img src="doc/img/MIVEGEC.png" width="150" height="100" />
+
+<img src="doc/img/baargin_flowchart.jpg" width="900" height="500" />
+
+## Table of Contents
+
+   * [Foreword](#foreword)
+   * [Flowchart](#flowchart)
+   * [Installation](#installation)
+      * [Nextflow](#nextflow)
+      * [Container platform](#container-platform)
+        * [Docker](#docker)
+        * [Singularity](#singularity)  
+   * [Usage and test](#usage)
+   * [Parameters](#parameters)
+   * [Output](#output)
+   * [Author](#author-and-contributors)
+   * [Contributing](#contributing)
+
+
+## Foreword
+
+...
+
+## Flowchart
+
+...
+
+## Installation
+
+The prerequisites to run the pipeline are:  
+
+  * [Nextflow](https://www.nextflow.io/)  >= 22.04.0
+  * [Docker](https://www.docker.com) or [Singularity](https://sylabs.io/singularity/)  
+
+### Nextflow 
+
+  * Via conda 
+
+    <details>
+      <summary>See here</summary>
+      
+      ```bash
+      conda create -n nextflow
+      conda activate nextflow
+      conda install bioconda::nextflow
+      ```  
+    </details>
+
+  * Manually
+    <details>
+      <summary>See here</summary>
+      Nextflow runs on most POSIX systems (Linux, macOS, etc) and can typically be installed by running these commands:
+
+      ```bash
+      # Make sure Java 11 or later is installed on your computer by using the command:
+      java -version
+      
+      # Install Nextflow by entering this command in your terminal (it creates a file named nextflow in the current directory):
+      curl -s https://get.nextflow.io | bash 
+      
+      # Add Nextflow binary to your user's PATH:
+      mv nextflow ~/bin/
+      # OR system-wide installation:
+      # sudo mv nextflow /usr/local/bin
+      ```
+    </details>
+
+### Container platform
+
+To run the workflow you will need a container platform: Docker or Singularity.
+
+#### Docker
+
+Please follow the instructions at the [Docker website](https://docs.docker.com/desktop/).
+
+#### Singularity
+
+Please follow the instructions at the [Singularity website](https://docs.sylabs.io/guides/latest/admin-guide/installation.html).
+
+## Usage
+
+### Help
+
+You can first check the available options and parameters by running:
+
+```bash
+nextflow run Juke34/RAIN -r v1.5.0 --help
+```
+
+### Profile
+
+To run the workflow you must select a profile according to the container platform you want to use:   
+- `singularity`, a profile using Singularity to run the containers
+- `docker`, a profile using Docker to run the containers
+
+The command will look like this:
+
+```bash
+nextflow run Juke34/RAIN -r vX.X.X -profile docker <rest of parameters>
+```
+
+Another profile is planned (/!\\ not yet implemented):
+
+- `slurm`, to add if your system uses the Slurm executor (the executor is local by default)
+
+The use of the `slurm` profile will give a command like this one:
+
+```bash
+nextflow run Juke34/RAIN -r vX.X.X -profile singularity,slurm <rest of parameters>
+```
+
+### Test
+
+With Nextflow and Docker available, you can run (where vX.X.X is the release version you wish to use):
+
+```bash
+nextflow run -profile docker,test Juke34/RAIN -r vX.X.X
+```
+
+Or via a clone of the repository: 
+
+```bash
+git clone https://github.com/Juke34/rain.git
+cd rain
+nextflow run -profile docker,test rain.nf
+```
+
+## Parameters
+
+```
+RAIN - RNA Alterations Investigation using Nextflow - v0.1
+
+        Usage example:
+    nextflow run rain.nf -profile docker --genome /path/to/genome.fa --annotation /path/to/annotation.gff3 --reads /path/to/reads_folder --output /path/to/output --aligner hisat2
+
+        Parameters:
+    --help                      Prints the help section
+
+        Input sequences:
+    --annotation                Path to the annotation file (GFF or GTF)
+    --reads                     Path to the reads file, folder or csv. If a folder is provided, all the files with proper extension in the folder will be used. You can provide remote files (comma separated list).
+                                    file extension expected : <.fastq.gz>, <.fq.gz>, <.fastq>, <.fq> or <.bam>. 
+                                                              for paired reads extra <_R1_001> or <_R2_001> is expected where <R> and <_001> are optional (e.g. <sample_id_1.fastq.gz>, <sample_id_R1.fastq.gz>, <sample_id_R1_001.fastq.gz>)
+                                    csv input expects 5 columns: sample, fastq_1, fastq_2, strandedness and read_type. 
+                                    fastq_2 is optional and can be empty. Strandedness and read_type expect the same values as the corresponding RAIN parameters; if a value is provided via a RAIN parameter, it will override the value in the csv file.
+                                    Example of csv file:
+                                        sample,fastq_1,fastq_2,strandedness,read_type
+                                        control1,path/to/data1.fastq.bam,,auto,short_single
+                                        control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired
+    --genome                    Path to the reference genome in FASTA format.
+    --read_type                 Type of reads among this list [short_paired, short_single, pacbio, ont] (no default)
+
+        Output:
+    --output                    Path to the output directory (default: result)
+
+       Optional input:
+    --aligner                   Aligner to use [default: hisat2]
+    --edit_site_tool            Tool used for detecting edited sites. Default: reditools3
+    --strandedness              Set the strandedness for all your input reads (default: null). In auto mode salmon will guess the library type for each fastq sample. [ 'U', 'IU', 'MU', 'OU', 'ISF', 'ISR', 'MSF', 'MSR', 'OSF', 'OSR', 'auto' ]
+    --edit_threshold            Minimal number of edited reads to count a site as edited (default: 1)
+    --aggregation_mode          Mode for aggregating editing counts mapped on genomic features. See documentation for details. Options are: "all" (default) or "cds_longest"
+    --clipoverlap               Clip overlapping sequences in read pairs to avoid double counting. (default: false)
+
+        Nextflow options:
+    -profile                    Change the Nextflow profile (both the engine and the executor); more details in the GitHub README [debug, test, itrop, singularity, local, docker]
+```
+
+## Output
+
+Here is a description of the typical output you will get from RAIN:
+
+```
+└── rain_results                                         # Output folder set using --outdir. Default: <alignment_results>
+    │
+    ├── AliNe                                            # Folder containing AliNe alignment pipeline result (see https://github.com/Juke34/AliNe)
+    │   ├── alignment                                    # bam alignment used by RAIN
+    │   ├── salmon_strandedness                          # strandedness collected by AliNe in case auto mode was in use for fastq files
+    │   └── ...      
+    │
+    ├── bam_indicies                                     # bam and indices bam.bai
+    │
+    ├── FastQC                                           # FastQC quality-control reports
+    │
+    ├── gatk_markduplicates                              # metrics and bam after markduplicates
+    │
+    ├── Reditools2/Reditools3/Jacusa/sapin/              # Editing output from corresponding tool
+    │
+    └── feature_edits                                    # Editing computed at different level (genomic features, chromosome, global)
+```
+
+## Author and contributors
+
+Eduardo Ascarrunz (@eascarrunz)
+Jacques Dainat  (@Juke34)
+
+## Contributing
+
+Contributions from the community are welcome! See the [Contributing guidelines](https://github.com/Juke34/rain/blob/main/CONTRIBUTING.md).
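
The csv input described in the README's Parameters section could be prepared roughly as follows. This is only a sketch: the file name `samplesheet.csv` and the sample paths are placeholders, and the column names follow the README help text, whereas the test file `data/chr21/chr21_small.csv` added in this PR uses `input_1`/`input_2` instead, so check which header the workflow actually expects.

```bash
#!/usr/bin/env bash
# Write a two-sample sheet using the five columns named in the help text.
# fastq_2 is left empty for the single-end sample.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness,read_type
control1,path/to/data1.fastq.gz,,auto,short_single
control2,path/to/data2_R1.fastq.gz,path/to/data2_R2.fastq.gz,auto,short_paired
EOF

# Sanity check: every row should have exactly five comma-separated fields.
awk -F',' 'NF != 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
           END { exit bad + 0 }' samplesheet.csv && echo "samplesheet looks consistent"

# The sheet would then be passed to --reads (not run here; needs Nextflow and Docker):
# nextflow run Juke34/RAIN -r vX.X.X -profile docker --reads samplesheet.csv <rest of parameters>
```

Checking the field count before launching catches a missing or extra comma early, before the workflow starts pulling containers.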
diff --git a/build_images.sh b/build_images.sh
@@ -28,11 +28,9 @@ do
     echo ██████████████████▓▒░   Building ${imgname}   ░▒▓██████████████████
     
     # Reditools2 does not compile on arm64, force using amd64 compilation
-    if [[ $dir =~ "reditools2" ]];then
-        if [[ "$arch" == arm* || "$arch" == "aarch64" ]]; then
-            echo "Reditools2 does not compile on arm64, force using amd64 compilation"
-            docker_arch_option=" --platform linux/amd64"
-        fi
+    if [[ "$arch" == arm* || "$arch" == "aarch64" ]]; then
+        echo "Reditools2 does not compile on arm64, force using amd64 compilation"
+        docker_arch_option=" --platform linux/amd64"
     fi
 
     docker build ${docker_arch_option} -t ${imgname} .
diff --git a/data/chr21/chr21_small.bam b/data/chr21/chr21_small.bam
diff --git a/data/chr21/chr21_small.csv b/data/chr21/chr21_small.csv
@@ -0,0 +1,3 @@
+sample,input_1,input_2,strandedness,read_type
+test1,/Users/jacquesdainat/git/Juke34/rain/data/chr21/chr21_small_R1.fastq.gz,,auto,short_single
+test2,/Users/jacquesdainat/git/Juke34/rain/data/chr21/chr21_small_R2.fastq.gz,,ISR,short_single
diff --git a/modules/aline.nf b/modules/aline.nf
@@ -34,6 +34,7 @@ process AliNe {
                 read_type,
                 aligner,
                 library_type,
+                "--data_type rna",
                 "--outdir $task.workDir/AliNe",
         ].join(" ")
         // Copy command to shell script in work dir for reference/debugging.
diff --git a/modules/reditools3.nf b/modules/reditools3.nf
@@ -14,19 +14,19 @@ process reditools3 {
     script:
         // Set the strand orientation parameter from the library type parameter
         // Terms explained in https://salmon.readthedocs.io/en/latest/library_type.html
-        if (meta.libtype in ["ISR", "SR"]) {
+        if (meta.strandedness in ["ISR", "SR"]) {
             // First-strand oriented
             strand_orientation = "2"
         } else if (meta.libtype in ["ISF", "SF"]) {
             // Second-strand oriented
             strand_orientation = "1"
-        } else if (meta.libtype in ["IU", "U"]) {
+        } else if (meta.strandedness in ["IU", "U"]) {
             // Unstranded
             strand_orientation = "0"
         } else {
             // Unsupported: Pass the library type string so that it's reported in
             // the reditools error message
-            strand_orientation = meta.libtype
+            strand_orientation = "0"
         }
         base_name = bam.BaseName
 
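
The branch logic in this hunk amounts to a small lookup from salmon library-type codes to the strand-orientation value set in `modules/reditools3.nf`. The sketch below restates it in shell purely as an illustration; the function name `strand_orientation` is hypothetical, and it assumes every branch reads the same strandedness value, whereas the `ISF`/`SF` branch shown above still reads `meta.libtype`.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the if/else chain in the hunk above.
strand_orientation() {
    case "$1" in
        ISR|SR) echo 2 ;;   # first-strand oriented
        ISF|SF) echo 1 ;;   # second-strand oriented
        IU|U)   echo 0 ;;   # unstranded
        *)      echo 0 ;;   # any other value: after this change, falls back to "0"
    esac
}

# Print the mapping for a few library-type codes.
for code in ISR ISF IU auto; do
    printf '%s -> %s\n' "$code" "$(strand_orientation "$code")"
done
```

Note that with the new fallback, an unsupported code is no longer passed through to the tool, so it is treated the same as an unstranded library.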
diff --git a/nextflow.config b/nextflow.config
@@ -55,10 +55,10 @@ profiles {
     test {
         params.aline_profiles = "${baseDir}/config/resources/base_aline.config" 
         params.aligner        = "STAR" 
-        params.reads          = "${baseDir}/data/chr21/chr21_small_R1.fastq.gz "
+        params.reads          = "${baseDir}/data/chr21/chr21_small_R1.fastq.gz"
         params.genome         = "${baseDir}/data/chr21/chr21_small.fasta.gz"
         params.annotation     = "${baseDir}/data/chr21/chr21_small_filtered.gff3.gz"
-        params.library_type   = "ISR" 
+        params.strandedness   = "ISR" 
         params.read_type      = "short_single"
     }
     test2 {
@@ -67,7 +67,7 @@ profiles {
         params.reads          = "${baseDir}/data/chr21/"
         params.genome         = "${baseDir}/data/chr21/chr21_small.fasta.gz" 
         params.annotation     = "${baseDir}/data/chr21/chr21_small_filtered.gff3.gz"
-        params.library_type   = "ISR" 
+        params.strandedness   = "ISR" 
         params.read_type      = "short_paired"
     }
 }
diff --git a/rain.nf b/rain.nf
