From 65e54f5c37ecc2bdedc04f23260ed97461276d4d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Fri, 13 Mar 2026 03:51:34 +0000 Subject: [PATCH 1/2] =?UTF-8?q?chore:=20=F0=9F=A4=96=20sync=20copilot=20in?= =?UTF-8?q?structions=20-=202026-03-13?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .github/copilot-instructions.md | 164 ++++++++++++++++++++++++++++++++ 1 file changed, 164 insertions(+) create mode 100644 .github/copilot-instructions.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..ab79ac9 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,164 @@ +# Copilot Instructions for CCBR Repositories + +## Reviewer guidance (what to look for in PRs) + +- Reviewers must validate enforcement rules: no secrets, container specified, and reproducibility pins. +- If code is AI-generated, reviewers must ensure the author documents what was changed and why, and that the PR is labeled `generated-by-AI`. +- Reviewers should verify license headers and ownership metadata (for example, `CODEOWNERS`) are present. +- Reviewers must read the code and verify that it adheres to the project's coding standards, guidelines, and software engineering best practices. + +## CI & enforcement suggestions (automatable) + +1. **PR template**: include optional AI-assistance disclosure fields (model used, high-level prompt intent, manual review confirmation). +2. **Pre-merge check (GitHub Action)**: verify `.github/copilot-instructions.md` is present in the repository and that new pipeline files include a `# CRAFT:` header. +3. **Lint jobs**: `ruff` for Python, `shellcheck` for shell, `lintr` for R, and `nf-core lint` or Snakemake lint checks where applicable. +4. **Secrets scan**: run `TruffleHog` or `Gitleaks` on PRs to detect accidental credentials. +5. 
**AI usage label**: if AI usage is declared, an Action should add the `generated-by-AI` label (create this label if it does not exist); the PR body should end with the italicized Markdown line: _Generated using AI_, and any associated commit messages should end with the plain footer line: `Generated using AI`. + +_Sample GH Action check (concept): if AI usage is declared, require an AI-assistance disclosure field in the PR body._ + +## Security & compliance (mandatory) + +- Developers must not send PHI or sensitive NIH internal identifiers to unapproved external AI services; use synthetic examples. +- Repository content must only be sent to model providers approved by NCI/NIH policy (for example, Copilot for Business or approved internal proxies). +- For AI-assisted actions, teams must keep an auditable record including: user, repository, action, timestamp, model name, and endpoint. +- If using a server wrapper (Option C), logs must include the minimum metadata above and follow institutional retention policy. +- If policy forbids external model use for internal code, teams must use approved local/internal LLM workflows. + +## Operational notes (practical) + +- `copilot-instructions.md` should remain concise and prescriptive; keep only high-value rules and edge-case examples. + +- Developers should include the CRAFT block in edited files when requesting substantial generated code to improve context quality. +- Copilot must ask the user for permission before deleting any file unless the file was created by Copilot for a temporary run or test. +- Copilot must not edit any files outside of the current open workspace. + +## Code authoring guidance + +- Code must not include hard-coded secrets, credentials, or sensitive absolute paths on disk. +- Code should be designed for modularity, reusability, and maintainability. It should ideally be platform-agnostic, with special support for running on the Biowulf HPC. 
+- Use pre-commit to enforce code style and linting during the commit process. + +### Pipelines + +- Authors must review existing CCBR pipelines first: . +- New pipelines should follow established CCBR conventions for folder layout, rule/process naming, config structure, and test patterns. +- Pipelines must define container images and pin tool/image versions for reproducibility. +- Contributions should include a test dataset and a documented example command. + +#### Snakemake + +- In general, new pipelines should be created with Nextflow rather than Snakemake, unless there is a compelling reason to use Snakemake. +- Generate new pipelines from the CCBR_SnakemakeTemplate repo: +- For Snakemake, run `snakemake --lint` and a dry-run before PR submission. + +#### Nextflow + +- Generate new pipelines from the CCBR_NextflowTemplate repo: +- For Nextflow pipelines, authors must follow nf-core patterns and references: . +- Nextflow code must use DSL2 only (DSL1 is not allowed). +- For Nextflow, run `nf-core lint` (or equivalent checks) before PR submission. +- Where possible, reuse modules and subworkflows from CCBR/nf-modules or nf-core/modules. +- New modules and subworkflows should be tested with `nf-test`. + +### Python scripts and packages + +- Python scripts must include module and function/class docstrings. +- Where a standard CLI framework is adopted, Python CLIs should use `click` or `typer` for consistency with existing components. +- Scripts must support `--help` and document required/optional arguments. +- Python code must follow [PEP 8](https://peps.python.org/pep-0008/), use `snake_case`, and include type hints for public functions. +- Scripts must raise descriptive errors on failure and emit warnings when applicable. Prefer raising an exception to printing an error message or returning an error code. +- Python code should pass `ruff` checks. +- Each script must include a documented example usage in comments or README. 
+- Tests should be written with `pytest`. Other testing frameworks may be used if justified. +- Do not catch bare exceptions. The exception type must always be specified. +- Only include one return statement at the end of a function. + +### R scripts and packages + +- R scripts must include function and class documentation via roxygen2. +- CLIs must be defined using the `argparse` package. +- CLIs must support `--help` and document required/optional arguments. +- R code should pass `lintr` and `air`. +- Tests should be written with `testthat`. +- Packages should pass `devtools::check()`. +- R code should adhere to the [tidyverse style guide](https://style.tidyverse.org/). +- Only include one return statement at the end of a function, if a return statement is used at all. Explicit returns are preferred but not required for R functions. + +## AI-generated commit messages (Conventional Commits) + +- Commit messages must follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) (as enforced in `CONTRIBUTING.md`). +- Generate messages from staged changes only (`git diff --staged`); do not include unrelated work. +- Commits should be atomic: one logical change per commit. +- If mixed changes are present, split into multiple logical commits; the number of commits does not need to equal the number of files changed. +- Subject format must be: `type(optional-scope): short imperative summary` (<=72 chars), e.g., `fix(profile): update release table parser`. +- Add a body only when needed to explain **why** and notable impact; never include secrets, tokens, PHI, or large diffs. +- For AI-assisted commits, add this final italicized footer line in the commit message body: _commit message is ai-generated_ + +Suggested prompt for AI tools: + +```text +Create a Conventional Commit message from this staged diff. +Rules: +1) Use one of: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert. +2) Keep subject <= 72 chars, imperative mood, no trailing period. 
+3) Include optional scope when clear. +4) Add a short body only if needed (why/impact), wrapped at ~72 chars. +5) Output only the final commit message. +``` + +## Pull Requests + +When opening a pull request, use the repository's pull request template (usually `.github/PULL_REQUEST_TEMPLATE.md`). +Different repos have different PR templates depending on their needs. +Ensure that the pull request follows the repository's PR template and includes all required information. +Do not allow the developer to proceed with opening a PR until all sections of the template are filled out. +Before a PR can be moved from draft to "ready for review", all of the relevant checklist items must be checked, and any +irrelevant checklist items should be crossed out. + +When new features, bug fixes, or other behavioral changes are introduced to the code, +unit tests must be added or updated to cover the new or changed functionality. + +If there are any API or other user-facing changes, the documentation must be updated both inline via docstrings and in the long-form docs in the `docs/` or `vignettes/` directory. + +When a repo contains a build workflow (i.e. a workflow file in `.github/workflows` starting with `build` or named `R-CMD-check`), +the build workflow must pass before the PR can be approved. + +### Changelog + +The changelog for the repository should be maintained in a `CHANGELOG.md` file +(or `NEWS.md` for R packages) at the root of the repository. Each pull request +that introduces user-facing changes must include a concise entry with the PR +number and the author's username tagged. Developer-only changes (i.e. updates to CI +workflows, development notes, etc.) should never be included in the changelog. +Example: + +``` +## development version + +- Fix bug in `detect_absolute_paths()` to ignore comments. (#123, @username) +``` + +## Onboarding checklist for new developers + +- [ ] Read `.github/CONTRIBUTING.md` and `.github/copilot-instructions.md`. 
+- [ ] Configure VSCode workspace to open `copilot-instructions.md` by default (so Copilot Chat sees it). +- [ ] Install pre-commit and run `pre-commit install`. + +## Appendix: VSCode snippet (drop into `.vscode/snippets/craft.code-snippets`) + +```json +{ + "Insert CRAFT prompt": { + "prefix": "craft", + "body": [ + "/* C: Context: Repo=${workspaceFolderBasename}; bioinformatics pipelines; NIH HPC (Biowulf/Helix); containers: quay.io/ccbr */", + "/* R: Rules: no PHI, no secrets, containerize, pin versions, follow style */", + "/* F: Flow: inputs/ -> results/, conf/, tests/ */", + "/* T: Tests: provide a one-line TEST_CMD and expected output */", + "", + "A: $1" + ], + "description": "Insert CRAFT prompt and place cursor at Actions" + } +} +``` From 9fc008657a5e34cb57e30e06aa5b8b1d1e9b1bb3 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 13 Mar 2026 03:53:34 +0000 Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .tests/lint_workdir/ref/dummy | 1 - config/samples.tsv.fulltest | 2 +- docker/bowtie1/environment.txt | 2 +- docker/circRNA_finder/environment.txt | 2 +- docker/cutadapt_fqfilter/environment.yml | 8 +- docker/dcc/environment.yml | 6 +- docker/star_ucsc_cufflinks/environment.yml | 42 +- docs/dryrun_example.txt | 2 +- resources/NCLscan.config.template | 5 +- ...ruSeq_and_nextera_adapters.consolidated.fa | 2 +- resources/argparse.bash | 1 - resources/cluster.json.highmem | 2 +- resources/collapse_bed_by_names.py | 44 +- resources/dockers/ccbr_clear/Dockerfile | 4 +- resources/merge_dataframes.R | 22 +- workflow/envs/clear.yaml | 8 +- workflow/rules/preprocessing.smk | 4 +- .../Create_circExplorer_BSJ_count_matrix.py | 75 +- .../Create_circExplorer_count_matrix.py | 152 +-- workflow/scripts/Create_ciri_count_matrix.py | 106 +- workflow/scripts/_add_geneid2genepred.py | 57 +- .../_append_splice_site_flanks_to_BSJs.py 
| 106 +- .../scripts/_bam_filter_BSJ_for_HQonly.py | 190 ++-- workflow/scripts/_bam_get_alignment_stats.py | 73 +- workflow/scripts/_bamtobed2readendsbed.py | 58 +- workflow/scripts/_bedintersect_to_rid2jid.py | 67 +- workflow/scripts/_bedpe2bed.py | 30 +- .../scripts/_circExplorer_BSJ_get_strand.py | 53 +- workflow/scripts/_collapse_find_circ.py | 37 +- workflow/scripts/_compare_lists.py | 81 +- .../_create_circExplorer_BSJ_bam_pe.py | 804 +++++++++------ .../_create_circExplorer_BSJ_bam_se.py | 743 ++++++++------ .../_create_circExplorer_BSJ_hqonly_pe.py | 825 ++++++++++------ .../_extract_circExplorer_linear_reads.py | 681 ++++++++----- ...filter_linear_spliced_readids_w_rid2jid.py | 160 +-- workflow/scripts/_make_master_counts_table.py | 85 +- .../_merge_circExplorer_found_counts.py | 65 +- .../scripts/_merge_per_sample_counts_table.py | 922 +++++++++++------- .../scripts/_multifasta2separatefastas.sh | 2 +- workflow/scripts/_process_bamtobed.py | 109 ++- workflow/scripts/annotate_clear_quant.py | 50 +- workflow/scripts/apply_junction_filters.py | 135 ++- workflow/scripts/bam_get_max_readlen.py | 12 +- workflow/scripts/bam_split_by_regions.py | 185 ++-- workflow/scripts/bam_to_bigwig.sh | 2 +- ...xplorer_get_annotated_counts_per_sample.py | 324 ++++-- .../scripts/create_circExplorer_linear_bam.py | 804 ++++++++------- ...te_circExplorer_per_sample_counts_table.py | 63 +- .../create_dcc_per_sample_counts_table.py | 126 ++- ...reate_mapsplice_per_sample_counts_table.py | 324 ++++-- .../create_nclscan_per_sample_counts_table.py | 213 +++- workflow/scripts/filter_bam.py | 40 +- workflow/scripts/filter_bam_by_readids.py | 66 +- workflow/scripts/filter_bam_for_BSJs.py | 323 +++--- .../scripts/filter_bam_for_linear_reads.py | 81 +- .../scripts/filter_bam_for_splice_reads.py | 140 +-- workflow/scripts/filter_ciriout.py | 223 +++-- workflow/scripts/filter_dcc.py | 228 +++-- workflow/scripts/filter_junction.py | 8 +- workflow/scripts/filter_junction_human.py | 8 +- 
workflow/scripts/fix_gtfs.py | 156 +-- workflow/scripts/fix_refseq_gtf.py | 319 +++--- workflow/scripts/gather_cluster_stats.sh | 2 +- workflow/scripts/get_index_rl.py | 15 +- workflow/scripts/junctions2readids.py | 58 +- workflow/scripts/make_star_index.sh | 2 +- workflow/scripts/merge_ReadsPerGene_counts.R | 8 +- .../merge_counts_tables_2_counts_matrix.py | 269 +++-- workflow/scripts/reformat_hg38_2_hg19.py | 102 +- workflow/scripts/transcript2gene.py | 39 +- ...e_BSJ_reads_and_split_BSJ_bam_by_strand.py | 688 ++++++------- 71 files changed, 6578 insertions(+), 4073 deletions(-) diff --git a/.tests/lint_workdir/ref/dummy b/.tests/lint_workdir/ref/dummy index 8b13789..e69de29 100644 --- a/.tests/lint_workdir/ref/dummy +++ b/.tests/lint_workdir/ref/dummy @@ -1 +0,0 @@ - diff --git a/config/samples.tsv.fulltest b/config/samples.tsv.fulltest index 1a883ec..8f5913f 100644 --- a/config/samples.tsv.fulltest +++ b/config/samples.tsv.fulltest @@ -1,3 +1,3 @@ sampleName path_to_R1_fastq path_to_R2_fastq GI1_N /data/Ziegelbauer_lab/circRNADetection/rawdata/ccbr983/fastq2/5_GI112118_norm_S4_R1_001.fastq.gz /data/Ziegelbauer_lab/circRNADetection/rawdata/ccbr983/fastq2/5_GI112118_norm_S4_R2_001.fastq.gz -GI1_T /data/Ziegelbauer_lab/circRNADetection/rawdata/ccbr983/fastq2/6_GI112118_tum_S5_R1_001.fastq.gz \ No newline at end of file +GI1_T /data/Ziegelbauer_lab/circRNADetection/rawdata/ccbr983/fastq2/6_GI112118_tum_S5_R1_001.fastq.gz diff --git a/docker/bowtie1/environment.txt b/docker/bowtie1/environment.txt index 14ff580..1edfbc3 100644 --- a/docker/bowtie1/environment.txt +++ b/docker/bowtie1/environment.txt @@ -1 +1 @@ -bowtie=1.3.1 \ No newline at end of file +bowtie=1.3.1 diff --git a/docker/circRNA_finder/environment.txt b/docker/circRNA_finder/environment.txt index fd233e3..2514f91 100644 --- a/docker/circRNA_finder/environment.txt +++ b/docker/circRNA_finder/environment.txt @@ -1,2 +1,2 @@ samtools -STAR \ No newline at end of file +STAR diff --git 
a/docker/cutadapt_fqfilter/environment.yml b/docker/cutadapt_fqfilter/environment.yml index 4bc48f7..c73a55f 100644 --- a/docker/cutadapt_fqfilter/environment.yml +++ b/docker/cutadapt_fqfilter/environment.yml @@ -1,6 +1,6 @@ channels: - - conda-forge - - bioconda + - conda-forge + - bioconda dependencies: - - cutadapt - - fastq-filter \ No newline at end of file + - cutadapt + - fastq-filter diff --git a/docker/dcc/environment.yml b/docker/dcc/environment.yml index 18e2f93..5b0daba 100644 --- a/docker/dcc/environment.yml +++ b/docker/dcc/environment.yml @@ -1,5 +1,5 @@ channels: - - conda-forge - - bioconda + - conda-forge + - bioconda dependencies: - - bioconda::dcc=0.5.0 \ No newline at end of file + - bioconda::dcc=0.5.0 diff --git a/docker/star_ucsc_cufflinks/environment.yml b/docker/star_ucsc_cufflinks/environment.yml index 07edb22..23006e4 100644 --- a/docker/star_ucsc_cufflinks/environment.yml +++ b/docker/star_ucsc_cufflinks/environment.yml @@ -1,23 +1,23 @@ channels: - - conda-forge - - bioconda + - conda-forge + - bioconda dependencies: - - argparse - - bedtools=2.29.0 - - blat=35 - - bowtie2=2.5.1 - - bwa=0.7.17 - - cufflinks=2.2.1 - - gffread - - HTSeq - - novoalign=3.07.00 - - numpy - - pandas - - pysam - - python=3.6 - - sambamba=0.8.2 - - samtools=1.16.1 - - star=2.7.6a - - ucsc-bedgraphtobigwig - - ucsc-bedsort - - ucsc-gtftogenepred \ No newline at end of file + - argparse + - bedtools=2.29.0 + - blat=35 + - bowtie2=2.5.1 + - bwa=0.7.17 + - cufflinks=2.2.1 + - gffread + - HTSeq + - novoalign=3.07.00 + - numpy + - pandas + - pysam + - python=3.6 + - sambamba=0.8.2 + - samtools=1.16.1 + - star=2.7.6a + - ucsc-bedgraphtobigwig + - ucsc-bedsort + - ucsc-gtftogenepred diff --git a/docs/dryrun_example.txt b/docs/dryrun_example.txt index 7f14f1b..6a56ed5 100644 --- a/docs/dryrun_example.txt +++ b/docs/dryrun_example.txt @@ -502,4 +502,4 @@ Job counts: 2 star1p 2 star2p 20 -This was a dry-run (flag -n). 
The order of jobs does not reflect the order of execution. \ No newline at end of file +This was a dry-run (flag -n). The order of jobs does not reflect the order of execution. diff --git a/resources/NCLscan.config.template b/resources/NCLscan.config.template index 0f53d48..179682f 100644 --- a/resources/NCLscan.config.template +++ b/resources/NCLscan.config.template @@ -68,7 +68,7 @@ SeqOut_bin = {NCLscan_bin}/SeqOut ### Advanced parameters ### ########################### -## The following two parameters indicate the maximal read length (L) and fragment size of the used paired-end RNA-seq data (FASTQ files), where fragment size = 2L + insert size. +## The following two parameters indicate the maximal read length (L) and fragment size of the used paired-end RNA-seq data (FASTQ files), where fragment size = 2L + insert size. ## If L > 151, the users should change these two parameters to (L, 2L + insert size). max_read_len = 151 max_fragment_size = 500 @@ -96,6 +96,3 @@ bwa-mem-t = 56 ## NOTE: The memory usage of each blat process would be up to 4 GB! 
## mp_blat_process = 56 - - - diff --git a/resources/TruSeq_and_nextera_adapters.consolidated.fa b/resources/TruSeq_and_nextera_adapters.consolidated.fa index de67830..8fb4b76 100755 --- a/resources/TruSeq_and_nextera_adapters.consolidated.fa +++ b/resources/TruSeq_and_nextera_adapters.consolidated.fa @@ -91,4 +91,4 @@ TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT >Barcode_Index25_F ACTGAT >Barcode_Index25_R -ATCAGT \ No newline at end of file +ATCAGT diff --git a/resources/argparse.bash b/resources/argparse.bash index ed1029b..25f935e 100755 --- a/resources/argparse.bash +++ b/resources/argparse.bash @@ -79,4 +79,3 @@ echo "INFILE: \${INFILE}" echo "OUTFILE: \${OUTFILE}" FOO fi - diff --git a/resources/cluster.json.highmem b/resources/cluster.json.highmem index b6d24f2..9a88f74 100644 --- a/resources/cluster.json.highmem +++ b/resources/cluster.json.highmem @@ -36,5 +36,5 @@ "threads": "56", "time": "48:00:00", "partition": "largemem" - } + } } diff --git a/resources/collapse_bed_by_names.py b/resources/collapse_bed_by_names.py index 5c23e3b..716098f 100644 --- a/resources/collapse_bed_by_names.py +++ b/resources/collapse_bed_by_names.py @@ -2,9 +2,10 @@ import sys import textwrap -usage_txt=textwrap.dedent("""\ +usage_txt = textwrap.dedent( + """\ Description: - The script collapses bed entries, ie, if the bed file has repeated + The script collapses bed entries, ie, if the bed file has repeated regions but with different names, then they are all collaped into a single bed entry and the names are reported as a comma separated list in the 4th column @@ -13,29 +14,32 @@ @Parameters: 1. : BED6 file that needs to be collapsed by name 2. 
: BED6 collaped output file -""".format(__file__)) +""".format( + __file__ + ) +) -if len(sys.argv)!=3: - exit(usage_txt) +if len(sys.argv) != 3: + exit(usage_txt) with open(sys.argv[1]) as f: - inputBedLines=f.readlines() + inputBedLines = f.readlines() -names=dict() +names = dict() for l in inputBedLines: - l=l.strip().split("\t") - tmp=[l[0],l[1],l[2],l[5]] - region_id="##".join(tmp) - if not region_id in names: - names[region_id]=list() - names[region_id].append(l[3]) + l = l.strip().split("\t") + tmp = [l[0], l[1], l[2], l[5]] + region_id = "##".join(tmp) + if not region_id in names: + names[region_id] = list() + names[region_id].append(l[3]) -outbed = open(sys.argv[2],'w') -for region_id,name in names.items(): - tmp=region_id.split("##") - namelist=",".join(name) - tmp.insert(3,namelist) - tmp.insert(4,"0") - outbed.write("\t".join(tmp)+"\n") +outbed = open(sys.argv[2], "w") +for region_id, name in names.items(): + tmp = region_id.split("##") + namelist = ",".join(name) + tmp.insert(3, namelist) + tmp.insert(4, "0") + outbed.write("\t".join(tmp) + "\n") outbed.close() diff --git a/resources/dockers/ccbr_clear/Dockerfile b/resources/dockers/ccbr_clear/Dockerfile index 809e588..975e122 100755 --- a/resources/dockers/ccbr_clear/Dockerfile +++ b/resources/dockers/ccbr_clear/Dockerfile @@ -18,9 +18,9 @@ ENV PATH="/opt2:$PATH" # Circexplorer2 --> bowtie ADD bowtie-1.1.2.tar.gz /opt2 ENV PATH="/opt2/bowtie-1.1.2:$PATH" -# Circexplorer2 --> UCSC bedtools tophat +# Circexplorer2 --> UCSC bedtools tophat RUN apt-get install -y bedtools -# Circexplorer2 --> UCSC tophat +# Circexplorer2 --> UCSC tophat ADD tophat-2.1.0.Linux_x86_64.tar.gz /opt2 ENV PATH="/opt2/tophat-2.1.0.Linux_x86_64:$PATH" # Circexplorer2 --> UCSC boostlibraries cufflinks diff --git a/resources/merge_dataframes.R b/resources/merge_dataframes.R index 105658c..512adff 100644 --- a/resources/merge_dataframes.R +++ b/resources/merge_dataframes.R @@ -1,27 +1,27 @@ #!/usr/bin/env Rscript --vanilla # 
suppressPackageStartupMessages(library("argparse")) -# +# # # create parser object # parser <- ArgumentParser() -# -# # specify our desired options -# # by default ArgumentParser will add an help option +# +# # specify our desired options +# # by default ArgumentParser will add an help option # parser$add_argument("-v", "--verbose", action="store_true", default=TRUE, # help="Print extra output [default]") -# parser$add_argument("--df1", +# parser$add_argument("--df1", # dest="df1", help="dataframe1") -# parser$add_argument("--df1_colname", +# parser$add_argument("--df1_colname", # help="dataframe1 columnname to merge by") -# parser$add_argument("--df2", +# parser$add_argument("--df2", # dest="df2", help="dataframe2") -# parser$add_argument("--df2_colname", +# parser$add_argument("--df2_colname", # help="dataframe2 columnname to merge by") -# parser$add_argument("--out", +# parser$add_argument("--out", # dest="out", help="out filename") -# +# # # get command line options, if help option encountered print help and exit, -# # otherwise if options not found on command line then set defaults, +# # otherwise if options not found on command line then set defaults, # args <- parser$parse_args() setwd("~/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources") diff --git a/workflow/envs/clear.yaml b/workflow/envs/clear.yaml index a41d6ae..cd0f57b 100644 --- a/workflow/envs/clear.yaml +++ b/workflow/envs/clear.yaml @@ -25,7 +25,7 @@ dependencies: - wheel=0.36.2=pyhd3deb0d_0 - zlib=1.2.11=h516909a_1010 - pip: - - clear==1.0.1 - - pybedtools==0.8.1 - - pysam==0.16.0.1 - - six==1.15.0 \ No newline at end of file + - clear==1.0.1 + - pybedtools==0.8.1 + - pysam==0.16.0.1 + - six==1.15.0 diff --git a/workflow/rules/preprocessing.smk b/workflow/rules/preprocessing.smk index 1abeaed..8be3335 100644 --- a/workflow/rules/preprocessing.smk +++ b/workflow/rules/preprocessing.smk @@ -54,7 +54,7 @@ rule cutadapt: -j {threads} \\ -o {params.tmpdir}/${{of1bn}} -p 
{params.tmpdir}/${{of2bn}} \\ {input.R1} {input.R2} - + # filter for average read quality fastq-filter \\ -q {params.cutadapt_q} \\ @@ -73,7 +73,7 @@ rule cutadapt: -j {threads} \\ -o {params.tmpdir}/${{of1bn}} \\ {input.R1} - + touch {output.of2} # filter for average read quality diff --git a/workflow/scripts/Create_circExplorer_BSJ_count_matrix.py b/workflow/scripts/Create_circExplorer_BSJ_count_matrix.py index ddd6895..1b96e70 100755 --- a/workflow/scripts/Create_circExplorer_BSJ_count_matrix.py +++ b/workflow/scripts/Create_circExplorer_BSJ_count_matrix.py @@ -11,21 +11,21 @@ import os import matplotlib.pyplot as plt import sys -lookupfile=sys.argv[1] -hostID=sys.argv[2] + +lookupfile = sys.argv[1] +hostID = sys.argv[2] # In[27]: def readthefile(f): - sampleName=f.name.replace(".back_spliced_junction.bed","") - x=pandas.read_csv(f,sep="\t",header=None) - x.columns=["chr","start","end","name_count","score","strand"] - x['id'] = x["chr"]+":"+x["start"].map(str)+"-"+x["end"].map(str) - x[['name',sampleName]] = x.name_count.str.split("/",expand=True) - x=x.loc[:,["id",sampleName]] - x.set_index(["id"],inplace=True) - return(x) - + sampleName = f.name.replace(".back_spliced_junction.bed", "") + x = pandas.read_csv(f, sep="\t", header=None) + x.columns = ["chr", "start", "end", "name_count", "score", "strand"] + x["id"] = x["chr"] + ":" + x["start"].map(str) + "-" + x["end"].map(str) + x[["name", sampleName]] = x.name_count.str.split("/", expand=True) + x = x.loc[:, ["id", sampleName]] + x.set_index(["id"], inplace=True) + return x # In[2]: @@ -38,26 +38,31 @@ def atof(text): retval = text return retval + def natural_keys(text): - ''' + """ alist.sort(key=natural_keys) sorts in human order http://nedbatchelder.com/blog/200712/human_sorting.html (See Toothy's implementation in the comments) float regex comes from https://stackoverflow.com/a/12643073/190597 - ''' - return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', str(text)) ] + """ + return 
[ + atof(c) for c in re.split(r"[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)", str(text)) + ] # In[3]: -outfilename1="circExplorer_BSJ_count_matrix.txt" -outfilename="circExplorer_BSJ_count_matrix_with_annotations.txt" - -files_circExplorer=list(Path(os.getcwd()).rglob("*.back_spliced_junction.bed")) -files_circExplorer=list(filter(lambda x:"_only.back" not in str(x),files_circExplorer)) -files_circExplorer=list(filter(lambda x: os.stat(x).st_size !=0, files_circExplorer)) +outfilename1 = "circExplorer_BSJ_count_matrix.txt" +outfilename = "circExplorer_BSJ_count_matrix_with_annotations.txt" + +files_circExplorer = list(Path(os.getcwd()).rglob("*.back_spliced_junction.bed")) +files_circExplorer = list( + filter(lambda x: "_only.back" not in str(x), files_circExplorer) +) +files_circExplorer = list(filter(lambda x: os.stat(x).st_size != 0, files_circExplorer)) files_circExplorer.sort(key=natural_keys) -if len(files_circExplorer)==0: - for f in [outfilename1,outfilename]: +if len(files_circExplorer) == 0: + for f in [outfilename1, outfilename]: if os.path.exists(f): os.remove(f) os.mknod(f) @@ -67,33 +72,35 @@ def natural_keys(text): # In[35]: -circE_count_matrix=readthefile(files_circExplorer[0]) +circE_count_matrix = readthefile(files_circExplorer[0]) print(circE_count_matrix.head()) # In[36]: -for j in range(1,len(files_circExplorer)): - x=readthefile(files_circExplorer[j]) - circE_count_matrix=pandas.concat([circE_count_matrix,x],axis=1,join="outer",sort=False) -circE_count_matrix=circE_count_matrix.sort_index() +for j in range(1, len(files_circExplorer)): + x = readthefile(files_circExplorer[j]) + circE_count_matrix = pandas.concat( + [circE_count_matrix, x], axis=1, join="outer", sort=False + ) +circE_count_matrix = circE_count_matrix.sort_index() print(circE_count_matrix.head()) # In[37]: -circE_count_matrix.fillna(0,inplace=True) +circE_count_matrix.fillna(0, inplace=True) circE_count_matrix.head() -circE_count_matrix.to_csv(outfilename1,sep="\t",header=True) 
+circE_count_matrix.to_csv(outfilename1, sep="\t", header=True) # In[38]: -annotations=pandas.read_csv(lookupfile,sep="\t",header=0) -annotations.set_index([hostID],inplace=True) +annotations = pandas.read_csv(lookupfile, sep="\t", header=0) +annotations.set_index([hostID], inplace=True) annotations.head() @@ -106,8 +113,8 @@ def natural_keys(text): # In[39]: -x=circE_count_matrix.join(annotations) -x.to_csv(outfilename,sep="\t",header=True) +x = circE_count_matrix.join(annotations) +x.to_csv(outfilename, sep="\t", header=True) # In[14]: @@ -115,5 +122,3 @@ def natural_keys(text): print(circE_count_matrix.shape) print(x.shape) - - diff --git a/workflow/scripts/Create_circExplorer_count_matrix.py b/workflow/scripts/Create_circExplorer_count_matrix.py index 62fa2f7..1116823 100755 --- a/workflow/scripts/Create_circExplorer_count_matrix.py +++ b/workflow/scripts/Create_circExplorer_count_matrix.py @@ -11,114 +11,123 @@ import os import matplotlib.pyplot as plt import sys -#get_ipython().run_line_magic('matplotlib', 'inline') -lookupfile=sys.argv[1] -hostID=sys.argv[2] +# get_ipython().run_line_magic('matplotlib', 'inline') + +lookupfile = sys.argv[1] +hostID = sys.argv[2] # In[2]: def atof(text): - try: - retval = float(text) - except ValueError: - retval = text - return retval + try: + retval = float(text) + except ValueError: + retval = text + return retval + def natural_keys(text): - ''' - alist.sort(key=natural_keys) sorts in human order - http://nedbatchelder.com/blog/200712/human_sorting.html - (See Toothy's implementation in the comments) - float regex comes from https://stackoverflow.com/a/12643073/190597 - ''' - return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', str(text)) ] + """ + alist.sort(key=natural_keys) sorts in human order + http://nedbatchelder.com/blog/200712/human_sorting.html + (See Toothy's implementation in the comments) + float regex comes from https://stackoverflow.com/a/12643073/190597 + """ + return [ + atof(c) for 
c in re.split(r"[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)", str(text)) + ] # In[3]: -outfilename1="circExplorer_count_matrix.txt" -outfilename="circExplorer_count_matrix_with_annotations.txt" +outfilename1 = "circExplorer_count_matrix.txt" +outfilename = "circExplorer_count_matrix_with_annotations.txt" -files_circExplorer=list(Path(os.getcwd()).rglob("*.circularRNA_known.txt")) -files_circExplorer=list(filter(lambda x: False if str(x).find("low_conf")!=-1 else True, files_circExplorer)) -files_circExplorer=list(filter(lambda x: os.stat(x).st_size !=0, files_circExplorer)) +files_circExplorer = list(Path(os.getcwd()).rglob("*.circularRNA_known.txt")) +files_circExplorer = list( + filter( + lambda x: False if str(x).find("low_conf") != -1 else True, files_circExplorer + ) +) +files_circExplorer = list(filter(lambda x: os.stat(x).st_size != 0, files_circExplorer)) files_circExplorer.sort(key=natural_keys) print(files_circExplorer) -if len(files_circExplorer)==0: - for f in [outfilename1,outfilename]: - if os.path.exists(f): - os.remove(f) - os.mknod(f) - exit() +if len(files_circExplorer) == 0: + for f in [outfilename1, outfilename]: + if os.path.exists(f): + os.remove(f) + os.mknod(f) + exit() # In[12]: -f=files_circExplorer[0] -sampleName=f.name.replace(".circularRNA_known.txt","") -print("Reading file:",f) -print("Sample Name:",sampleName) -x=pandas.read_csv(f,sep="\t",header=None,usecols=[0,1,2,12]) -x[hostID]=x[0].astype(str)+":"+x[1].astype(str)+"-"+x[2].astype(str) -x[sampleName+"_circE"]=x[12].astype(str) -x.drop([0,1,2,12],inplace=True,axis=1) -x.set_index([hostID],inplace=True) -circE_count_matrix=x +f = files_circExplorer[0] +sampleName = f.name.replace(".circularRNA_known.txt", "") +print("Reading file:", f) +print("Sample Name:", sampleName) +x = pandas.read_csv(f, sep="\t", header=None, usecols=[0, 1, 2, 12]) +x[hostID] = x[0].astype(str) + ":" + x[1].astype(str) + "-" + x[2].astype(str) +x[sampleName + "_circE"] = x[12].astype(str) +x.drop([0, 1, 2, 12], 
inplace=True, axis=1) +x.set_index([hostID], inplace=True) +circE_count_matrix = x # In[8]: - -print(circE_count_matrix.head(),circE_count_matrix.tail()) +print(circE_count_matrix.head(), circE_count_matrix.tail()) print(circE_count_matrix.shape) # In[13]: -for i in range(1,len(files_circExplorer)): - f=files_circExplorer[i] - print("Currently reading file:"+str(f)) - x=pandas.read_csv(f,sep="\t",header=None,usecols=[0,1,2,12]) - print("Head of this file looks like this:") - print(x.head()) - sampleName=f.name.replace(".circularRNA_known.txt","") - # x=pandas.read_csv(f,sep="\t",header=None,usecols=[0,1,2,12]) - print("SampleName is:"+sampleName) - x[hostID]=x[0].astype(str)+":"+x[1].astype(str)+"-"+x[2].astype(str) - x[sampleName+"_circE"]=x[12].astype(str) - print(x.head()) - x.drop([0,1,2,12],inplace=True,axis=1) - x.set_index([hostID],inplace=True) - print(x.head()) - print("Before concat") - print(circE_count_matrix.head()) - - -# In[14]: - - - circE_count_matrix = circE_count_matrix.loc[~circE_count_matrix.index.duplicated(keep='first')] - x = x.loc[~x.index.duplicated(keep='first')] - circE_count_matrix=pandas.concat([circE_count_matrix,x],axis=1,join="outer",sort=False) - print("After concat") - print(circE_count_matrix.head()) +for i in range(1, len(files_circExplorer)): + f = files_circExplorer[i] + print("Currently reading file:" + str(f)) + x = pandas.read_csv(f, sep="\t", header=None, usecols=[0, 1, 2, 12]) + print("Head of this file looks like this:") + print(x.head()) + sampleName = f.name.replace(".circularRNA_known.txt", "") + # x=pandas.read_csv(f,sep="\t",header=None,usecols=[0,1,2,12]) + print("SampleName is:" + sampleName) + x[hostID] = x[0].astype(str) + ":" + x[1].astype(str) + "-" + x[2].astype(str) + x[sampleName + "_circE"] = x[12].astype(str) + print(x.head()) + x.drop([0, 1, 2, 12], inplace=True, axis=1) + x.set_index([hostID], inplace=True) + print(x.head()) + print("Before concat") + print(circE_count_matrix.head()) + + # In[14]: + + 
circE_count_matrix = circE_count_matrix.loc[ + ~circE_count_matrix.index.duplicated(keep="first") + ] + x = x.loc[~x.index.duplicated(keep="first")] + circE_count_matrix = pandas.concat( + [circE_count_matrix, x], axis=1, join="outer", sort=False + ) + print("After concat") + print(circE_count_matrix.head()) # In[9]: -circE_count_matrix.fillna(0,inplace=True) +circE_count_matrix.fillna(0, inplace=True) print(circE_count_matrix.head()) -circE_count_matrix.to_csv(outfilename1,sep="\t",header=True) +circE_count_matrix.to_csv(outfilename1, sep="\t", header=True) # In[10]: -annotations=pandas.read_csv(lookupfile,sep="\t",header=0) -annotations.set_index([hostID],inplace=True) +annotations = pandas.read_csv(lookupfile, sep="\t", header=0) +annotations.set_index([hostID], inplace=True) annotations.head() @@ -131,8 +140,8 @@ def natural_keys(text): # In[12]: -x=circE_count_matrix.join(annotations) -x.to_csv(outfilename,sep="\t",header=True) +x = circE_count_matrix.join(annotations) +x.to_csv(outfilename, sep="\t", header=True) # In[14]: @@ -140,4 +149,3 @@ def natural_keys(text): print(circE_count_matrix.shape) print(x.shape) - diff --git a/workflow/scripts/Create_ciri_count_matrix.py b/workflow/scripts/Create_ciri_count_matrix.py index d38d21e..ebebc06 100755 --- a/workflow/scripts/Create_ciri_count_matrix.py +++ b/workflow/scripts/Create_ciri_count_matrix.py @@ -11,10 +11,11 @@ import os import matplotlib.pyplot as plt import sys -#get_ipython().run_line_magic('matplotlib', 'inline') -lookupfile=sys.argv[1] -hostID=sys.argv[2] +# get_ipython().run_line_magic('matplotlib', 'inline') + +lookupfile = sys.argv[1] +hostID = sys.argv[2] # In[2]: @@ -25,41 +26,55 @@ def atof(text): retval = text return retval + def natural_keys(text): - ''' + """ alist.sort(key=natural_keys) sorts in human order http://nedbatchelder.com/blog/200712/human_sorting.html (See Toothy's implementation in the comments) float regex comes from https://stackoverflow.com/a/12643073/190597 - ''' - return [ 
atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', str(text)) ] + """ + return [ + atof(c) for c in re.split(r"[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)", str(text)) + ] # In[3]: -#files_circExplorer=list(Path(os.getcwd()).rglob("*_human_only.circularRNA_known.txt")) -files_ciri=list(Path(os.getcwd()).rglob("*.ciri.out")) -#filter out files in the "old" folder -#files_circExplorer=list(filter(lambda x: not re.search('/old/', str(x)), files_circExplorer)) -files_ciri=list(filter(lambda x: not re.search('/old/', str(x)), files_ciri)) -#files_circExplorer.sort(key=natural_keys) +# files_circExplorer=list(Path(os.getcwd()).rglob("*_human_only.circularRNA_known.txt")) +files_ciri = list(Path(os.getcwd()).rglob("*.ciri.out")) +# filter out files in the "old" folder +# files_circExplorer=list(filter(lambda x: not re.search('/old/', str(x)), files_circExplorer)) +files_ciri = list(filter(lambda x: not re.search("/old/", str(x)), files_ciri)) +# files_circExplorer.sort(key=natural_keys) files_ciri.sort(key=natural_keys) # In[4]: -f=files_ciri[0] -sampleName=f.name.replace(".ciri.out","") -x=pandas.read_csv(f,sep="\t",header=0,usecols=["chr","circRNA_start","circRNA_end","#junction_reads"]) +f = files_ciri[0] +sampleName = f.name.replace(".ciri.out", "") +x = pandas.read_csv( + f, + sep="\t", + header=0, + usecols=["chr", "circRNA_start", "circRNA_end", "#junction_reads"], +) print(x.head()) -x["circRNA_start"]=x["circRNA_start"].astype(int)-1 -x[hostID]=x["chr"].astype(str)+":"+x["circRNA_start"].astype(str)+"-"+x["circRNA_end"].astype(str) -x[sampleName+"_ciri"]=x["#junction_reads"].astype(str) -x.drop(["chr","circRNA_start","circRNA_end","#junction_reads"],inplace=True,axis=1) -x.set_index([hostID],inplace=True) -ciri_count_matrix=x +x["circRNA_start"] = x["circRNA_start"].astype(int) - 1 +x[hostID] = ( + x["chr"].astype(str) + + ":" + + x["circRNA_start"].astype(str) + + "-" + + x["circRNA_end"].astype(str) +) +x[sampleName + "_ciri"] = 
x["#junction_reads"].astype(str) +x.drop(["chr", "circRNA_start", "circRNA_end", "#junction_reads"], inplace=True, axis=1) +x.set_index([hostID], inplace=True) +ciri_count_matrix = x print(ciri_count_matrix.head()) @@ -67,43 +82,54 @@ def natural_keys(text): for f in files_ciri[1:]: - sampleName=f.name.replace(".ciri.out","") - print(f,sampleName) - x=pandas.read_csv(f,sep="\t",header=0,usecols=["chr","circRNA_start","circRNA_end","#junction_reads"]) - x["circRNA_start"]=x["circRNA_start"].astype(int)-1 - x[hostID]=x["chr"].astype(str)+":"+x["circRNA_start"].astype(str)+"-"+x["circRNA_end"].astype(str) - x[sampleName+"_ciri"]=x["#junction_reads"].astype(str) - x.drop(["chr","circRNA_start","circRNA_end","#junction_reads"],inplace=True,axis=1) - x.set_index([hostID],inplace=True) - ciri_count_matrix=pandas.concat([ciri_count_matrix,x],axis=1,join="outer",sort=False) + sampleName = f.name.replace(".ciri.out", "") + print(f, sampleName) + x = pandas.read_csv( + f, + sep="\t", + header=0, + usecols=["chr", "circRNA_start", "circRNA_end", "#junction_reads"], + ) + x["circRNA_start"] = x["circRNA_start"].astype(int) - 1 + x[hostID] = ( + x["chr"].astype(str) + + ":" + + x["circRNA_start"].astype(str) + + "-" + + x["circRNA_end"].astype(str) + ) + x[sampleName + "_ciri"] = x["#junction_reads"].astype(str) + x.drop( + ["chr", "circRNA_start", "circRNA_end", "#junction_reads"], inplace=True, axis=1 + ) + x.set_index([hostID], inplace=True) + ciri_count_matrix = pandas.concat( + [ciri_count_matrix, x], axis=1, join="outer", sort=False + ) ciri_count_matrix.head() # In[6]: -ciri_count_matrix.fillna(0,inplace=True) +ciri_count_matrix.fillna(0, inplace=True) ciri_count_matrix.head() -ciri_count_matrix.to_csv("ciri_count_matrix.txt",sep="\t",header=True) +ciri_count_matrix.to_csv("ciri_count_matrix.txt", sep="\t", header=True) # In[7]: -annotations=pandas.read_csv(lookupfile,sep="\t",header=0) -annotations.set_index([hostID],inplace=True) +annotations = 
pandas.read_csv(lookupfile, sep="\t", header=0) +annotations.set_index([hostID], inplace=True) annotations.head() # In[8]: -x=ciri_count_matrix.join(annotations) -x.to_csv("ciri_count_matrix_with_annotations.txt",sep="\t",header=True) +x = ciri_count_matrix.join(annotations) +x.to_csv("ciri_count_matrix_with_annotations.txt", sep="\t", header=True) # In[ ]: - - - - diff --git a/workflow/scripts/_add_geneid2genepred.py b/workflow/scripts/_add_geneid2genepred.py index 6f21698..6f7bf6d 100755 --- a/workflow/scripts/_add_geneid2genepred.py +++ b/workflow/scripts/_add_geneid2genepred.py @@ -1,34 +1,35 @@ import sys -def get_id(s,whatid): - s=s.split() - for i,j in enumerate(s): - if j==whatid: - r=s[i+1] - r=r.replace('"','') - r=r.replace(';','') - return r - -gtffile=sys.argv[1] -transcript2gene=dict() +def get_id(s, whatid): + s = s.split() + for i, j in enumerate(s): + if j == whatid: + r = s[i + 1] + r = r.replace('"', "") + r = r.replace(";", "") + return r + + +gtffile = sys.argv[1] +transcript2gene = dict() for i in open(gtffile).readlines(): - if i.startswith("#"): - continue - i=i.strip().split("\t") - if i[2]!="transcript": - continue - gid=get_id(i[8],"gene_id") - tid=get_id(i[8],"transcript_id") -# print("%s\t%s"%(tid,gid)) - transcript2gene[tid]=gid + if i.startswith("#"): + continue + i = i.strip().split("\t") + if i[2] != "transcript": + continue + gid = get_id(i[8], "gene_id") + tid = get_id(i[8], "transcript_id") + # print("%s\t%s"%(tid,gid)) + transcript2gene[tid] = gid for i in open(sys.argv[2]).readlines(): - j=i.strip().split("\t") - x=[] - tid=j.pop(0) - gid=transcript2gene[tid] - x.append(gid) - x.append(tid) - x.extend(j) - print("\t".join(x)) + j = i.strip().split("\t") + x = [] + tid = j.pop(0) + gid = transcript2gene[tid] + x.append(gid) + x.append(tid) + x.extend(j) + print("\t".join(x)) diff --git a/workflow/scripts/_append_splice_site_flanks_to_BSJs.py b/workflow/scripts/_append_splice_site_flanks_to_BSJs.py index dd077ba..7dab224 100755 
--- a/workflow/scripts/_append_splice_site_flanks_to_BSJs.py +++ b/workflow/scripts/_append_splice_site_flanks_to_BSJs.py @@ -6,41 +6,45 @@ class BSJ: - def __init__(self,linestr): - l=linestr.strip().split("\t") - self.chrom=l[0] - self.start=l[1] - self.end=l[2] - self.name=l[3] - self.score=l[4] - self.strand=l[5] - self.bitids=l[6] - self.rids=l[7] - self.splice_site_flank_5="" #donor - self.splice_site_flank_3="" #acceptor - + def __init__(self, linestr): + l = linestr.strip().split("\t") + self.chrom = l[0] + self.start = l[1] + self.end = l[2] + self.name = l[3] + self.score = l[4] + self.strand = l[5] + self.bitids = l[6] + self.rids = l[7] + self.splice_site_flank_5 = "" # donor + self.splice_site_flank_3 = "" # acceptor + def get_jid(self): - jid=self.chrom+"##"+str(self.start)+"##"+str(self.end) + jid = self.chrom + "##" + str(self.start) + "##" + str(self.end) return jid - - def add_flanks(self,sequences): - if self.strand == '+': + + def add_flanks(self, sequences): + if self.strand == "+": coord = int(self.end) - self.splice_site_flank_5 = sequences[self.chrom][coord:coord+2] + self.splice_site_flank_5 = sequences[self.chrom][coord : coord + 2] coord = int(self.start) - self.splice_site_flank_3 = sequences[self.chrom][coord-2:coord] - elif self.strand == '-': + self.splice_site_flank_3 = sequences[self.chrom][coord - 2 : coord] + elif self.strand == "-": coord = int(self.end) - myseq = HTSeq.Sequence(bytes(sequences[self.chrom][coord:coord+2],'utf-8'),"myseq") - revcomp = myseq.get_reverse_complement().seq.decode('utf-8') + myseq = HTSeq.Sequence( + bytes(sequences[self.chrom][coord : coord + 2], "utf-8"), "myseq" + ) + revcomp = myseq.get_reverse_complement().seq.decode("utf-8") self.splice_site_flank_3 = revcomp coord = int(self.start) - myseq = HTSeq.Sequence(bytes(sequences[self.chrom][coord-2:coord],'utf-8'),"myseq") - revcomp = myseq.get_reverse_complement().seq.decode('utf-8') + myseq = HTSeq.Sequence( + bytes(sequences[self.chrom][coord - 2 : 
coord], "utf-8"), "myseq" + ) + revcomp = myseq.get_reverse_complement().seq.decode("utf-8") self.splice_site_flank_5 = revcomp - def write_out_BSJ(self,outbed): - t=[] + def write_out_BSJ(self, outbed): + t = [] t.append(self.chrom) t.append(str(self.start)) t.append(str(self.end)) @@ -49,8 +53,9 @@ def write_out_BSJ(self,outbed): t.append(self.strand) t.append(self.bitids) t.append(self.rids) - t.append("##".join([self.splice_site_flank_5,self.splice_site_flank_3])) - outbed.write("\t".join(t)+"\n") + t.append("##".join([self.splice_site_flank_5, self.splice_site_flank_3])) + outbed.write("\t".join(t) + "\n") + def main(): # debug = True @@ -58,32 +63,51 @@ def main(): parser = argparse.ArgumentParser( description="Append the BSJ Donor##Acceptor column to BSJ bed file. Input BSJ bed file is output from _create_circExplorer_BSJ_bam_pe or _create_circExplorer_BSJ_bam_se scripts." ) - parser.add_argument("--reffa",dest="reffa",required=True,type=argparse.FileType('r'),default=sys.stdin, - help="reference fasta file") - parser.add_argument("--inbsjbedgz",dest="inbsjbedgz",required=True,type=str, - help="BSJ BED in gzip format") - parser.add_argument("--outbsjbedgz",dest="outbsjbedgz",required=True,type=str, - help="BSJ BED in gzip format") + parser.add_argument( + "--reffa", + dest="reffa", + required=True, + type=argparse.FileType("r"), + default=sys.stdin, + help="reference fasta file", + ) + parser.add_argument( + "--inbsjbedgz", + dest="inbsjbedgz", + required=True, + type=str, + help="BSJ BED in gzip format", + ) + parser.add_argument( + "--outbsjbedgz", + dest="outbsjbedgz", + required=True, + type=str, + help="BSJ BED in gzip format", + ) args = parser.parse_args() print("Reading...reference sequences...") - sequences = dict((s[1], s[0]) for s in HTSeq.FastaReader(args.reffa, raw_iterator=True)) - print("Done reading...%d sequences!"%(len(sequences))) + sequences = dict( + (s[1], s[0]) for s in HTSeq.FastaReader(args.reffa, raw_iterator=True) + ) + print("Done 
reading...%d sequences!" % (len(sequences))) print("Reading/Writing...BSJs...") bsjs = dict() - with gzip.open(args.outbsjbedgz,'wt') as bsjfile: - with gzip.open(args.inbsjbedgz,'rt') as tfile: + with gzip.open(args.outbsjbedgz, "wt") as bsjfile: + with gzip.open(args.inbsjbedgz, "rt") as tfile: for l in tfile: bsj = BSJ(l) bsj.add_flanks(sequences) bsj.write_out_BSJ(bsjfile) - bsjs[bsj.get_jid()]=1 + bsjs[bsj.get_jid()] = 1 tfile.close() bsjfile.close() - print("Done reading/writing...%d BSJs!"%(len(bsjs))) + print("Done reading/writing...%d BSJs!" % (len(bsjs))) print("Finished!") + if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_bam_filter_BSJ_for_HQonly.py b/workflow/scripts/_bam_filter_BSJ_for_HQonly.py index 8315b0b..7a7ae6c 100755 --- a/workflow/scripts/_bam_filter_BSJ_for_HQonly.py +++ b/workflow/scripts/_bam_filter_BSJ_for_HQonly.py @@ -3,43 +3,47 @@ import pysam import os -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = 
"virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 + regions[region_name]["sequences"][s] = 1 return regions -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] + +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) -def _get_regionname_from_seqname(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: + +def _get_regionname_from_seqname(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: return k else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) + def main(): # debug = True @@ -49,76 +53,128 @@ def main(): This RG is used to extract reads from inbam and save them. """ ) - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="BSJ bam with RG set") - parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True, - help='final all sample counts matrix') # get coordinates of the circRNA + parser.add_argument( + "-i", + "--inbam", + dest="inbam", + required=True, + type=str, + help="BSJ bam with RG set", + ) + parser.add_argument( + "-t", + "--sample_counts_table", + dest="countstable", + type=str, + required=True, + help="final all sample counts matrix", + ) # get coordinates of the circRNA # parser.add_argument("-o","--outbam",dest="outbam",required=True,type=argparse.FileType('w'), # help="Output bam file ... 
both strands") - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name') - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=str, - help="Output bam file ... both strands") - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list') - args = parser.parse_args() - - indf = pd.read_csv(args.countstable,sep="\t",header=0,compression='gzip') - indf = indf.loc[indf['HQ']=="Y"] + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name", + ) + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=str, + help="Output bam file ... both strands", + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) + parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value", + ) + parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", + ) + parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... 
comma-separated list", + ) + args = parser.parse_args() + + indf = pd.read_csv(args.countstable, sep="\t", header=0, compression="gzip") + indf = indf.loc[indf["HQ"] == "Y"] RGlist = dict() - for index,row in indf.iterrows(): - jid = row['chrom']+"##"+str(row['start'])+"##"+str(row['end']) - RGlist[jid]=1 - print("Number of RGs: ",len(RGlist)) + for index, row in indf.iterrows(): + jid = row["chrom"] + "##" + str(row["start"]) + "##" + str(row["end"]) + RGlist[jid] = 1 + print("Number of RGs: ", len(RGlist)) samfile = pysam.AlignmentFile(args.inbam, "rb") samheader = samfile.header.to_dict() sequences = list() - for v in samheader['SQ']: - sequences.append(v['SN']) - seqname2regionname=dict() - hosts=set() - viruses=set() - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + seqname2regionname = dict() + hosts = set() + viruses = set() + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) - outbam = pysam.AlignmentFile(args.outbam, "wb", template=samfile) outputbams = dict() outdir = os.path.dirname(args.outbam) for h in hosts: - outbamname = os.path.join(outdir,args.samplename+"."+h+".HQ_only.BSJ.bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + outdir, args.samplename + "." 
+ h + ".HQ_only.BSJ.bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) for v in viruses: - outbamname = os.path.join(outdir,args.samplename+"."+v+".HQ_only.BSJ.bam") - outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header = samheader) - + outbamname = os.path.join( + outdir, args.samplename + "." + v + ".HQ_only.BSJ.bam" + ) + outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header=samheader) for read in samfile.fetch(): rg = read.get_tag("RG") rg = rg.split("##") - rg = rg[:len(rg)-1] + rg = rg[: len(rg) - 1] rg = "##".join(rg) if rg in RGlist: - regionname=_get_regionname_from_seqname(regions,read.reference_name) + regionname = _get_regionname_from_seqname(regions, read.reference_name) if regionname in hosts: outputbams[regionname].write(read) if regionname in viruses: @@ -126,7 +182,7 @@ def main(): outbam.write(read) samfile.close() outbam.close() - for k,v in outputbams.items(): + for k, v in outputbams.items(): v.close() diff --git a/workflow/scripts/_bam_get_alignment_stats.py b/workflow/scripts/_bam_get_alignment_stats.py index 61d0720..edd9ae6 100755 --- a/workflow/scripts/_bam_get_alignment_stats.py +++ b/workflow/scripts/_bam_get_alignment_stats.py @@ -1,52 +1,71 @@ #!/usr/bin/env python3 import argparse import pysam - + + def read_regions(regionsfile): - infile=open(regionsfile,'r') - regions=dict() + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() - sequence_names=l[1].split() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=dict() - return regions + regions[region_name]["sequences"][s] = dict() + return regions def main(): - parser = argparse.ArgumentParser(description='Find BAM alignment stats for each region.') - 
parser.add_argument('--inbam', dest='inbam', type=str, required=True, - help='Input BAM file') - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') + parser = argparse.ArgumentParser( + description="Find BAM alignment stats for each region." + ) + parser.add_argument( + "--inbam", dest="inbam", type=str, required=True, help="Input BAM file" + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) # parser.add_argument("--out",dest="outjson",required=True,type=str, # help="Output stats in JSON format") - parser.add_argument('-p',"--pe",dest="pe",required=False,action='store_true', default=False, - help="set this if BAM is paired end") - args = parser.parse_args() + parser.add_argument( + "-p", + "--pe", + dest="pe", + required=False, + action="store_true", + default=False, + help="set this if BAM is paired end", + ) + args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") regions = read_regions(regionsfile=args.regions) region_names = regions.keys() for read in samfile.fetch(): - if args.pe and ( read.reference_id != read.next_reference_id ): continue # only works for PE ... for SE read.next_reference_id is -1 - if args.pe and ( not read.is_proper_pair ): continue - if read.is_secondary or read.is_supplementary or read.is_unmapped : continue + if args.pe and (read.reference_id != read.next_reference_id): + continue # only works for PE ... 
for SE read.next_reference_id is -1 + if args.pe and (not read.is_proper_pair): + continue + if read.is_secondary or read.is_supplementary or read.is_unmapped: + continue rid = read.query_name refname = samfile.get_reference_name(read.reference_id) for region in region_names: - if refname in regions[region]['sequences']: - regions[region]['sequences'][refname][rid]=1 + if refname in regions[region]["sequences"]: + regions[region]["sequences"][refname][rid] = 1 break samfile.close() for region in regions: - counts=0 - for refname in regions[region]['sequences'].keys(): - counts += len(regions[region]['sequences'][refname]) - print("%d\t%s"%(counts,region)) + counts = 0 + for refname in regions[region]["sequences"].keys(): + counts += len(regions[region]["sequences"][refname]) + print("%d\t%s" % (counts, region)) if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_bamtobed2readendsbed.py b/workflow/scripts/_bamtobed2readendsbed.py index 4113f3d..28e5b87 100755 --- a/workflow/scripts/_bamtobed2readendsbed.py +++ b/workflow/scripts/_bamtobed2readendsbed.py @@ -2,23 +2,30 @@ import argparse + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( - ) + parser = argparse.ArgumentParser() # INPUTs - parser.add_argument("-i","--inbed",dest="inbed",required=True,type=str, - help="Input bamtobed bed file") - parser.add_argument('-o',"--outbed",dest="outbed",required=True,type=str, - help="Output bed file") + parser.add_argument( + "-i", + "--inbed", + dest="inbed", + required=True, + type=str, + help="Input bamtobed bed file", + ) + parser.add_argument( + "-o", "--outbed", dest="outbed", required=True, type=str, help="Output bed file" + ) args = parser.parse_args() - outbed = open(args.outbed,'w') - with open(args.inbed,'r') as inbed: + outbed = open(args.outbed, "w") + with open(args.inbed, "r") as inbed: for l in inbed: - l=l.strip().split("\t") - l1=[] - l2=[] + l = l.strip().split("\t") + l1 = [] + 
l2 = [] l1.append(l[0]) l2.append(l[0]) l1.append(l[1]) @@ -26,31 +33,32 @@ def main(): l2.append(l[2]) l2.append(l[2]) if "/" in l[3]: - x=l[3].split("/") - readname=x[0] - if x[1]=="1": - strand=l[5] + x = l[3].split("/") + readname = x[0] + if x[1] == "1": + strand = l[5] else: - if l[5]=="-": - strand="+" - elif l[5]=="+": - strand="-" + if l[5] == "-": + strand = "+" + elif l[5] == "+": + strand = "-" else: - strand=l[5] + strand = l[5] else: - strand=l[5] - readname=l[3] - readname+="##"+strand + strand = l[5] + readname = l[3] + readname += "##" + strand l1.append(readname) l2.append(readname) l1.append(".") l2.append(".") l1.append(strand) l2.append(strand) - outbed.write("\t".join(l1)+"\n") - outbed.write("\t".join(l2)+"\n") + outbed.write("\t".join(l1) + "\n") + outbed.write("\t".join(l2) + "\n") inbed.close() outbed.close() + if __name__ == "__main__": main() diff --git a/workflow/scripts/_bedintersect_to_rid2jid.py b/workflow/scripts/_bedintersect_to_rid2jid.py index 231073c..38d5830 100755 --- a/workflow/scripts/_bedintersect_to_rid2jid.py +++ b/workflow/scripts/_bedintersect_to_rid2jid.py @@ -2,38 +2,73 @@ import sys import gzip + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( + parser = argparse.ArgumentParser() + parser.add_argument( + "-i", + "--bedinteresection", + dest="bedint", + required=True, + type=argparse.FileType("r"), + default=sys.stdin, + help="Input BED intersection file", + ) + parser.add_argument( + "-o", + "--rid2jid", + dest="outtsv", + required=True, + type=str, + help="Output tsv... gziped", + ) + parser.add_argument( + "-m", + "--maxdist", + dest="maxdist", + required=True, + type=int, + help="Max dist from BSJ coordinate", ) - parser.add_argument("-i","--bedinteresection",dest="bedint",required=True,type=argparse.FileType('r'),default=sys.stdin, - help="Input BED intersection file") - parser.add_argument("-o","--rid2jid",dest="outtsv",required=True,type=str, - help="Output tsv... 
gziped") - parser.add_argument("-m","--maxdist",dest="maxdist",required=True,type=int, - help="Max dist from BSJ coordinate") args = parser.parse_args() # outfile = open(args.outtsv,'w') # for l in args.bedint.readlines(): - with gzip.open(args.outtsv,'wt') as outfile: + with gzip.open(args.outtsv, "wt") as outfile: for l in args.bedint: - l=l.strip().split("\t") + l = l.strip().split("\t") # print(l) # print(" abs(int(l[2])-int(l[10])) <= args.maxdist :", abs(int(l[2])-int(l[10])),(abs(int(l[2])-int(l[10])) <= args.maxdist )) # print(" abs(int(l[1])-int(l[9])) <= args.maxdist :", abs(int(l[1])-int(l[9])),(abs(int(l[1])-int(l[9])) <= args.maxdist)) # print(" abs(int(l[2])-int(l[9])) <= args.maxdist : ", abs(int(l[2])-int(l[9])),(abs(int(l[2])-int(l[9])) <= args.maxdist)) # print(" abs(int(l[1])-int(l[10])) <= args.maxdist :", abs(int(l[1])-int(l[10])),(abs(int(l[1])-int(l[10])) <= args.maxdist)) - if ( abs(int(l[2])-int(l[11])) <= args.maxdist ) or ( abs(int(l[1])-int(l[10])) <= args.maxdist ) or ( abs(int(l[2])-int(l[10])) <= args.maxdist ) or ( abs(int(l[1])-int(l[11])) <= args.maxdist ): - jid=l[0]+"##"+l[1]+"##"+str(int(l[2])-1)+"##"+l[5]+"##"+l[-1] # jid format is chrom##start##end##strand##read_strand + if ( + (abs(int(l[2]) - int(l[11])) <= args.maxdist) + or (abs(int(l[1]) - int(l[10])) <= args.maxdist) + or (abs(int(l[2]) - int(l[10])) <= args.maxdist) + or (abs(int(l[1]) - int(l[11])) <= args.maxdist) + ): + jid = ( + l[0] + + "##" + + l[1] + + "##" + + str(int(l[2]) - 1) + + "##" + + l[5] + + "##" + + l[-1] + ) # jid format is chrom##start##end##strand##read_strand # outl=l[3:] - rid=l[12] - outl=[rid] + rid = l[12] + outl = [rid] outl.append(jid) - outstr="\t".join(outl) - outfile.write("%s\n"%(outstr)) + outstr = "\t".join(outl) + outfile.write("%s\n" % (outstr)) args.bedint.close() outfile.close() + if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_bedpe2bed.py b/workflow/scripts/_bedpe2bed.py 
index aa62b04..7d607b6 100755 --- a/workflow/scripts/_bedpe2bed.py +++ b/workflow/scripts/_bedpe2bed.py @@ -4,21 +4,23 @@ import gzip import pprint + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( + parser = argparse.ArgumentParser() + parser.add_argument( + "-i", "--bedpe", dest="bedpe", required=True, type=str, help="Input BEDPE file" + ) + parser.add_argument( + "-o", "--bed", dest="bed", required=True, type=str, help="Output BED file" ) - parser.add_argument("-i","--bedpe",dest="bedpe",required=True,type=str, - help="Input BEDPE file") - parser.add_argument("-o","--bed",dest="bed",required=True,type=str, - help="Output BED file") args = parser.parse_args() - infile = open(args.bedpe,'r') - outfile = open(args.bed,'w') + infile = open(args.bedpe, "r") + outfile = open(args.bed, "w") for x in infile.readlines(): - x=x.strip().split("\t") - chrom=x[0] + x = x.strip().split("\t") + chrom = x[0] if int(x[1]) < int(x[4]): left = x[1] else: @@ -29,12 +31,14 @@ def main(): right = x[5] rid = x[6] score = x[7] - strand = x[8] # read1 strand - outfile.write("%s\t%s\t%s\t%s\t%s\t%s\n"%(chrom,left,right,rid,score,strand)) - + strand = x[8] # read1 strand + outfile.write( + "%s\t%s\t%s\t%s\t%s\t%s\n" % (chrom, left, right, rid, score, strand) + ) + infile.close() outfile.close() if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_circExplorer_BSJ_get_strand.py b/workflow/scripts/_circExplorer_BSJ_get_strand.py index a7d074f..6a21ee3 100755 --- a/workflow/scripts/_circExplorer_BSJ_get_strand.py +++ b/workflow/scripts/_circExplorer_BSJ_get_strand.py @@ -1,13 +1,17 @@ import sys -stats=dict() -mreads=int(sys.argv[3]) # minreads -#read junction.filter1 + +stats = dict() +mreads = int(sys.argv[3]) # minreads +# read junction.filter1 with open(sys.argv[1]) as junction: for l in junction.readlines(): - l=l.strip().split("\t") - if l[0]!=l[3]:continue - if l[2]!=l[5]:continue - if l[1]==l[4]:continue 
+        l = l.strip().split("\t")
+        if l[0] != l[3]:
+            continue
+        if l[2] != l[5]:
+            continue
+        if l[1] == l[4]:
+            continue
         if int(l[1]) > int(l[4]):
             end = l[1]
             start = l[4]
@@ -16,28 +20,29 @@
             start = l[1]
         jid = l[0] + "##" + start + "##" + end
         if not jid in stats:
-            stats[jid]=dict()
-            stats[jid]["+"]=0
-            stats[jid]["-"]=0
-        stats[jid][l[2]]+=1
-#read back_spliced_junction.filter2.bed
+            stats[jid] = dict()
+            stats[jid]["+"] = 0
+            stats[jid]["-"] = 0
+        stats[jid][l[2]] += 1
+# read back_spliced_junction.filter2.bed
 with open(sys.argv[2]) as bsjbed:
     for l in bsjbed.readlines():
-        l=l.strip().split("\t")
-        if l[1]==l[2]:continue
-        jname,count=l[3].split("/")
-        if int(count) stats[bsjid]["-"]:
-            strand="+"
+            strand = "+"
         else:
-            strand="-"
-        l[5]=strand
-        print("\t".join(l))
\ No newline at end of file
+            strand = "-"
+        l[5] = strand
+        print("\t".join(l))
diff --git a/workflow/scripts/_collapse_find_circ.py b/workflow/scripts/_collapse_find_circ.py
index 209aea4..fa406a8 100755
--- a/workflow/scripts/_collapse_find_circ.py
+++ b/workflow/scripts/_collapse_find_circ.py
@@ -1,20 +1,21 @@
 import sys
-collection=dict()
+
+collection = dict()
 for f in sys.stdin:
-    f=f.strip().split("\t")
-    circid="##".join([f[0],f[1],f[2],f[5]])
-    if not circid in collection:
-        collection[circid]=dict()
-        collection[circid]['fullline']=f
-        collection[circid]['count']=int(f[4])
-    else:
-        collection[circid]['count']+=int(f[4])
-#header=["chrom","start","end","name","n_reads","strand","n_uniq","uniq_bridges","best_qual_left","best_qual_right","tissues","tiss_counts","edits","anchor_overlap","breakpoints","signal","strandmatch","category"]
-#print("\t".join(header))
-count=0
-for k,v in collection.items():
-    count+=1
-    x=v['fullline']
-    x[3]=str(count)
-    x[4]=str(v['count'])
-    print("\t".join(x))
\ No newline at end of file
+    f = f.strip().split("\t")
+    circid = "##".join([f[0], f[1], f[2], f[5]])
+    if not circid in collection:
+        collection[circid] = dict()
+        collection[circid]["fullline"] = f
+        collection[circid]["count"] = int(f[4])
+    else:
+        collection[circid]["count"] += int(f[4])
+# header=["chrom","start","end","name","n_reads","strand","n_uniq","uniq_bridges","best_qual_left","best_qual_right","tissues","tiss_counts","edits","anchor_overlap","breakpoints","signal","strandmatch","category"]
+# print("\t".join(header))
+count = 0
+for k, v in collection.items():
+    count += 1
+    x = v["fullline"]
+    x[3] = str(count)
+    x[4] = str(v["count"])
+    print("\t".join(x))
diff --git a/workflow/scripts/_compare_lists.py b/workflow/scripts/_compare_lists.py
index 6a8f8ff..855e82e 100755
--- a/workflow/scripts/_compare_lists.py
+++ b/workflow/scripts/_compare_lists.py
@@ -2,34 +2,55 @@
 import matplotlib
 import numpy
 import scipy
-#from matplotlib_venn import venn2
-#import matplotlib.pyplot as plt
-if len(sys.argv)<3:
-    print("python %s a_list b_list"%(sys.argv[0]))
-    exit()
-a_set=set(list(filter(lambda x:x!="",list(map(lambda x:x.strip().split("\t")[0],open(sys.argv[1]).readlines())))))
-b_set=set(list(filter(lambda x:x!="",list(map(lambda x:x.strip().split("\t")[0],open(sys.argv[2]).readlines())))))
-a_intersect_b=a_set.intersection(b_set)
-a_union_b=a_set.union(b_set)
-a_only=a_set-b_set
-b_only=b_set-a_set
-print("Size of a_list=%d"%(len(a_set)))
-print("Size of b_list=%d"%(len(b_set)))
-print("a interset b=%d"%(len(a_intersect_b)))
-print("a union b=%d"%(len(a_union_b)))
-print("only a=%d"%(len(a_only)))
-print("only b=%d"%(len(b_only)))
-if len(sys.argv)==4:
-    def write_list_to_file(a_set,filename):
-        o=open(filename,'w')
-        for g in a_set:
-            o.write("%s\n"%(g))
-        o.close()
-    write_list_to_file(a_intersect_b,"a_intersect_b.lst")
-    write_list_to_file(a_union_b,"a_union_b.lst")
-    write_list_to_file(a_only,"a_only.lst")
-    write_list_to_file(b_only,"b_only.lst")
-#venn2(subsets = (len(a_only), len(b_only), len(a_intersect_b)))
-#plt.savefig("ab_venn.png")
-exit()
\ No newline at end of file
+# from matplotlib_venn import venn2
+# import matplotlib.pyplot as plt
+
+if len(sys.argv) < 3:
+    print("python %s a_list b_list" % (sys.argv[0]))
+    exit()
+a_set = set(
+    list(
+        filter(
+            lambda x: x != "",
+            list(
+                map(lambda x: x.strip().split("\t")[0], open(sys.argv[1]).readlines())
+            ),
+        )
+    )
+)
+b_set = set(
+    list(
+        filter(
+            lambda x: x != "",
+            list(
+                map(lambda x: x.strip().split("\t")[0], open(sys.argv[2]).readlines())
+            ),
+        )
+    )
+)
+a_intersect_b = a_set.intersection(b_set)
+a_union_b = a_set.union(b_set)
+a_only = a_set - b_set
+b_only = b_set - a_set
+print("Size of a_list=%d" % (len(a_set)))
+print("Size of b_list=%d" % (len(b_set)))
+print("a intersect b=%d" % (len(a_intersect_b)))
+print("a union b=%d" % (len(a_union_b)))
+print("only a=%d" % (len(a_only)))
+print("only b=%d" % (len(b_only)))
+if len(sys.argv) == 4:
+
+    def write_list_to_file(a_set, filename):
+        o = open(filename, "w")
+        for g in a_set:
+            o.write("%s\n" % (g))
+        o.close()
+
+    write_list_to_file(a_intersect_b, "a_intersect_b.lst")
+    write_list_to_file(a_union_b, "a_union_b.lst")
+    write_list_to_file(a_only, "a_only.lst")
+    write_list_to_file(b_only, "b_only.lst")
+# venn2(subsets = (len(a_only), len(b_only), len(a_intersect_b)))
+# plt.savefig("ab_venn.png")
+exit()
diff --git a/workflow/scripts/_create_circExplorer_BSJ_bam_pe.py b/workflow/scripts/_create_circExplorer_BSJ_bam_pe.py
index c014324..618f06e 100755
--- a/workflow/scripts/_create_circExplorer_BSJ_bam_pe.py
+++ b/workflow/scripts/_create_circExplorer_BSJ_bam_pe.py
@@ -5,9 +5,11 @@
 import os
 import time
 
+
 def get_ctime():
     return time.ctime(time.time())
 
+
 """
 This script first validates each read to be "valid" BSJ read and then splits
 a BSJ bam file by strand into:
@@ -16,10 +18,10 @@ def get_ctime():
 3. BSJ bed file with score(number of reads supporting the BSJ) and strand information
 Logic (for PE reads):
 Each BSJ is represented by a 3 alignments in the output BAM file.
-Alignment 1 is complete alignment of one of the reads in pair and
-Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference
+Alignment 1 is complete alignment of one of the reads in pair and
+Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference
 chromosome.
-These alignments are grouped together by the "HI" tags in SAM file. For example, all 3
+These alignments are grouped together by the "HI" tags in SAM file. For example, all 3
 alignments for the same BSJ will have the same "HI" value... something like "HI:i:1".
 BSJ alignment sam bitflag combinations can have 8 different possibilities, 4 from
 sense strand and 4 from anti-sense strand:
@@ -35,12 +37,12 @@ def get_ctime():
 # |<------------------BSJ----------------->| 3. 83,163,2209 4. 339,419,2465
-# R1
-# <------
+# R1
+# <------
 # 5'--|------------------------------------------|---3'
 # 3'--|------------------------------------------|---5'
 #     |------>                           ------>|
-# |  R2.2                                  R2.1 |
+# |  R2.2                                  R2.1 |
 # |                                             |
 # |<-----------------BSJ-------------------->| 5. 99,147,2193
@@ -55,12 +57,12 @@ def get_ctime():
 # |<------------------BSJ----------------->| 7. 99,147,2145 8. 355, 403, 2401
-# R2
-# <------
+# R2
+# <------
 # 5'--|------------------------------------------|---3'
 # 3'--|------------------------------------------|---5'
 #     |------>                           ------>|
-# |  R1.2                                  R1.1 |
+# |  R1.2                                  R1.1 |
 # |                                             |
 # |<-----------------BSJ-------------------->|
 """
@@ -68,38 +70,38 @@
 class BSJ:
     def __init__(self):
-        self.chrom=""
-        self.start=""
-        self.end=""
-        self.score=0
-        self.name="."
-        self.strand="U"
-        self.bitids=set()
-        self.rids=set()
-
+        self.chrom = ""
+        self.start = ""
+        self.end = ""
+        self.score = 0
+        self.name = "."
+        self.strand = "U"
+        self.bitids = set()
+        self.rids = set()
+
     def plusone(self):
-        self.score+=1
-
-    def set_strand(self,strand):
-        self.strand=strand
-
-    def set_chrom(self,chrom):
-        self.chrom=chrom
-
-    def set_start(self,start):
-        self.start=start
-
-    def set_end(self,end):
-        self.end=end
-
-    def append_bitid(self,bitid):
+        self.score += 1
+
+    def set_strand(self, strand):
+        self.strand = strand
+
+    def set_chrom(self, chrom):
+        self.chrom = chrom
+
+    def set_start(self, start):
+        self.start = start
+
+    def set_end(self, end):
+        self.end = end
+
+    def append_bitid(self, bitid):
         self.bitids.add(bitid)
 
-    def append_rid(self,rid):
+    def append_rid(self, rid):
         self.rids.add(rid)
-
-    def write_out_BSJ(self,outbed):
-        t=[]
+
+    def write_out_BSJ(self, outbed):
+        t = []
         t.append(self.chrom)
         t.append(str(self.start))
         t.append(str(self.end))
@@ -108,149 +110,164 @@ def write_out_BSJ(self,outbed):
         t.append(self.strand)
         t.append(",".join(self.bitids))
         t.append(",".join(self.rids))
-        outbed.write("\t".join(t)+"\n")
+        outbed.write("\t".join(t) + "\n")
 
-    def update_score_and_found_count(self,junctions_found):
+    def update_score_and_found_count(self, junctions_found):
         self.score = len(self.rids)
-        jid = self.chrom + "##" + str(self.start) + "##" + str(int(self.end)-1) + "##" + self.strand
-        junctions_found[jid]+=self.score
+        jid = (
+            self.chrom
+            + "##"
+            + str(self.start)
+            + "##"
+            + str(int(self.end) - 1)
+            + "##"
+            + self.strand
+        )
+        junctions_found[jid] += self.score
+
-
 class Readinfo:
-    def __init__(self,readid,rname):
-        self.readid=readid
-        self.refname=rname
-        self.bitflags=list()
-        self.bitid=""
-        self.strand="."
-        self.start=-1
-        self.end=-1
-        self.refcoordinates=dict()
-        self.isread1=dict()
-        self.isreverse=dict()
-        self.issecondary=dict()
-        self.issupplementary=dict()
-
+    def __init__(self, readid, rname):
+        self.readid = readid
+        self.refname = rname
+        self.bitflags = list()
+        self.bitid = ""
+        self.strand = "."
+        self.start = -1
+        self.end = -1
+        self.refcoordinates = dict()
+        self.isread1 = dict()
+        self.isreverse = dict()
+        self.issecondary = dict()
+        self.issupplementary = dict()
+
     def __str__(self):
-        s = "readid: %s"%(self.readid)
-        s = "%s\tbitflags: %s"%(s,self.bitflags)
-        s = "%s\tbitid: %s"%(s,self.bitid)
+        s = "readid: %s" % (self.readid)
+        s = "%s\tbitflags: %s" % (s, self.bitflags)
+        s = "%s\tbitid: %s" % (s, self.bitid)
         for bf in self.bitflags:
-            s = "%s\t%s\trefcoordinates: %s"%(s,bf,", ".join(list(map(lambda x:str(x),self.refcoordinates[bf]))))
+            s = "%s\t%s\trefcoordinates: %s" % (
+                s,
+                bf,
+                ", ".join(list(map(lambda x: str(x), self.refcoordinates[bf]))),
+            )
         return s
 
-    def set_refcoordinates(self,bitflag,refpos):
-        self.refcoordinates[bitflag]=refpos
-
-    def set_read1_reverse_secondary_supplementary(self,bitflag,read):
+    def set_refcoordinates(self, bitflag, refpos):
+        self.refcoordinates[bitflag] = refpos
+
+    def set_read1_reverse_secondary_supplementary(self, bitflag, read):
         if read.is_read1:
-            self.isread1[bitflag]="Y"
+            self.isread1[bitflag] = "Y"
         else:
-            self.isread1[bitflag]="N"
+            self.isread1[bitflag] = "N"
         if read.is_reverse:
-            self.isreverse[bitflag]="Y"
+            self.isreverse[bitflag] = "Y"
         else:
-            self.isreverse[bitflag]="N"
+            self.isreverse[bitflag] = "N"
         if read.is_secondary:
-            self.issecondary[bitflag]="Y"
+            self.issecondary[bitflag] = "Y"
         else:
-            self.issecondary[bitflag]="N"
+            self.issecondary[bitflag] = "N"
         if read.is_supplementary:
-            self.issupplementary[bitflag]="Y"
+            self.issupplementary[bitflag] = "Y"
         else:
-            self.issupplementary[bitflag]="N"
-
-    def append_alignment(self,read):
+            self.issupplementary[bitflag] = "N"
+
+    def append_alignment(self, read):
         self.alignments.append(read)
-
-    def append_bitflag(self,bf):
+
+    def append_bitflag(self, bf):
         self.bitflags.append(bf)
-
+
     # def extend_ref_positions(self,refcoords):
     #     self.refcoordinates.extend(refcoords)
-
+
     def generate_bitid(self):
-        bitlist=sorted(self.bitflags)
-        self.bitid="##".join(list(map(lambda x:str(x),bitlist)))
-#        self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2])
-
+        bitlist = sorted(self.bitflags)
+        self.bitid = "##".join(list(map(lambda x: str(x), bitlist)))
+
+    # self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2])
+
     def get_strand(self):
-        if self.bitid=="83##163##2129":
-            self.strand="+"
-        elif self.bitid=="339##419##2385":
-            self.strand="+"
-        elif self.bitid=="83##163##2209":
-            self.strand="+"
-        elif self.bitid=="339##419##2465":
-            self.strand="+"
-        elif self.bitid=="99##147##2193":
-            self.strand="-"
-        elif self.bitid=="355##403##2449":
-            self.strand="-"
-        elif self.bitid=="99##147##2145":
-            self.strand="-"
-        elif self.bitid=="355##403##2401":
-            self.strand="-"
-        elif self.bitid=="16##2064":
-            self.strand="+"
-        elif self.bitid=="272##2320":
-            self.strand="+"
-        elif self.bitid=="0##2048":
-            self.strand="-"
-        elif self.bitid=="256##2304":
-            self.strand="-"
-        elif self.bitid=="153##2201":
-            self.strand="-"
+        if self.bitid == "83##163##2129":
+            self.strand = "+"
+        elif self.bitid == "339##419##2385":
+            self.strand = "+"
+        elif self.bitid == "83##163##2209":
+            self.strand = "+"
+        elif self.bitid == "339##419##2465":
+            self.strand = "+"
+        elif self.bitid == "99##147##2193":
+            self.strand = "-"
+        elif self.bitid == "355##403##2449":
+            self.strand = "-"
+        elif self.bitid == "99##147##2145":
+            self.strand = "-"
+        elif self.bitid == "355##403##2401":
+            self.strand = "-"
+        elif self.bitid == "16##2064":
+            self.strand = "+"
+        elif self.bitid == "272##2320":
+            self.strand = "+"
+        elif self.bitid == "0##2048":
+            self.strand = "-"
+        elif self.bitid == "256##2304":
+            self.strand = "-"
+        elif self.bitid == "153##2201":
+            self.strand = "-"
         else:
-            self.strand="."
+            self.strand = "."
+
     def flip_strand(self):
-        if self.strand=="+":self.strand="-"
-        if self.strand=="-":self.strand="+"
+        if self.strand == "+":
+            self.strand = "-"
+        elif self.strand == "-":  # elif: a second independent if would flip back
+            self.strand = "+"
 
-    def validate_BSJ_read(self,junctions):
+    def validate_BSJ_read(self, junctions):
         """
         Checks if read is truly a BSJ originitor.
         * Defines left, right and middle alignments
         * Left and right alignments should not overlap
         * Middle alignment should be between left and right alignments
         """
-        if len(self.bitid.split("##"))==3:
-            left=-1
-            right=-1
-            middle=-1
-            if self.bitid=="83##163##2129":
-                left=2129
-                right=83
-                middle=163
-            if self.bitid=="339##419##2385":
-                left=2385
-                right=339
-                middle=419
-            if self.bitid=="83##163##2209":
-                left=163
-                right=2209
-                middle=83
-            if self.bitid=="339##419##2465":
-                left=419
-                right=2465
-                middle=339
-            if self.bitid=="99##147##2145":
-                left=99
-                right=2145
-                middle=147
-            if self.bitid=="355##403##2401":
-                left=355
-                right=2401
-                middle=403
-            if self.bitid=="99##147##2193":
-                left=2193
-                right=147
-                middle=99
-            if self.bitid=="355##403##2449":
-                left=2449
-                right=403
-                middle=355
+        if len(self.bitid.split("##")) == 3:
+            left = -1
+            right = -1
+            middle = -1
+            if self.bitid == "83##163##2129":
+                left = 2129
+                right = 83
+                middle = 163
+            if self.bitid == "339##419##2385":
+                left = 2385
+                right = 339
+                middle = 419
+            if self.bitid == "83##163##2209":
+                left = 163
+                right = 2209
+                middle = 83
+            if self.bitid == "339##419##2465":
+                left = 419
+                right = 2465
+                middle = 339
+            if self.bitid == "99##147##2145":
+                left = 99
+                right = 2145
+                middle = 147
+            if self.bitid == "355##403##2401":
+                left = 355
+                right = 2401
+                middle = 403
+            if self.bitid == "99##147##2193":
+                left = 2193
+                right = 147
+                middle = 99
+            if self.bitid == "355##403##2449":
+                left = 2449
+                right = 403
+                middle = 355
             # print(left,right,middle)
             if left == -1 or right == -1 or middle == -1:
                 return False
@@ -261,89 +278,95 @@ def validate_BSJ_read(self,junctions):
         # print("validate_BSJ_read",self.readid,self.refcoordinates[middle][0],self.refcoordinates[middle][-1])
         leftmost = str(self.refcoordinates[left][0])
         rightmost = str(self.refcoordinates[right][-1])
-        possiblejid = chrom+"##"+leftmost+"##"+rightmost+"##"+self.strand
+        possiblejid = (
+            chrom + "##" + leftmost + "##" + rightmost + "##" + self.strand
+        )
         # print("validate_BSJ_read",self.readid,possiblejid)
         if possiblejid in junctions:
             self.start = leftmost
-            self.end = str(int(rightmost) + 1) # this will be added to the BED file
+            self.end = str(int(rightmost) + 1)  # this will be added to the BED file
             return True
         else:
             return False
-
-
-
+
     def get_bsjid(self):
-        t=[]
+        t = []
         t.append(self.refname)
         t.append(self.start)
         t.append(self.end)
         t.append(self.strand)
         return "##".join(t)
-
-    def write_out_reads(self,outbam):
+
+    def write_out_reads(self, outbam):
         for r in self.alignments:
             outbam.write(r)
-
-
+
+
 def get_uniq_readid(r):
-    rname=r.query_name
-    hi=r.get_tag("HI")
-    rid=rname+"##"+str(hi)
+    rname = r.query_name
+    hi = r.get_tag("HI")
+    rid = rname + "##" + str(hi)
     return rid
 
+
 def get_bitflag(r):
-    bitflag=str(r).split("\t")[1]
+    bitflag = str(r).split("\t")[1]
     return int(bitflag)
 
+
 def _bsjid2chrom(bsjid):
-    x=bsjid.split("##")
+    x = bsjid.split("##")
    return x[0]
 
+
 def _bsjid2jid(bsjid):
-    x=bsjid.split("##")
-    chrom=x[0]
-    start=x[1]
-    end=str(int(x[2])-1)
-    jid="##".join([chrom,start,end])
-    return jid,chrom
-
-def read_regions(regionsfile,host,additives,viruses):
-    host=host.split(",")
-    additives=additives.split(",")
-    viruses=viruses.split(",")
-    infile=open(regionsfile,'r')
-    regions=dict()
+    x = bsjid.split("##")
+    chrom = x[0]
+    start = x[1]
+    end = str(int(x[2]) - 1)
+    jid = "##".join([chrom, start, end])
+    return jid, chrom
+
+
+def read_regions(regionsfile, host, additives, viruses):
+    host = host.split(",")
+    additives = additives.split(",")
+    viruses = viruses.split(",")
+    infile = open(regionsfile, "r")
+    regions = dict()
     for l in infile.readlines():
         l = l.strip().split("\t")
-        region_name=l[0]
-        regions[region_name]=dict()
-        regions[region_name]['sequences']=dict()
+        region_name = l[0]
+        regions[region_name] = dict()
+        regions[region_name]["sequences"] = dict()
         if region_name in host:
-            regions[region_name]['host_additive_virus']="host"
+            regions[region_name]["host_additive_virus"] = "host"
         elif region_name in additives:
-            regions[region_name]['host_additive_virus']="additive"
+            regions[region_name]["host_additive_virus"] = "additive"
         elif region_name in viruses:
-            regions[region_name]['host_additive_virus']="virus"
+            regions[region_name]["host_additive_virus"] = "virus"
         else:
             exit("%s has unknown region. Its not a host or a additive or a virus!!")
-        sequence_names=l[1].split()
+        sequence_names = l[1].split()
         for s in sequence_names:
-            regions[region_name]['sequences'][s]=1
-    return regions
+            regions[region_name]["sequences"][s] = 1
+    return regions
+
-def _get_host_additive_virus(regions,seqname):
-    for k,v in regions.items():
-        if seqname in v['sequences']:
-            return v['host_additive_virus']
+def _get_host_additive_virus(regions, seqname):
+    for k, v in regions.items():
+        if seqname in v["sequences"]:
+            return v["host_additive_virus"]
     else:
-        exit("Sequence: %s does not have a region."%(seqname))
+        exit("Sequence: %s does not have a region." % (seqname))
 
-def _get_regionname_from_seqname(regions,seqname):
-    for k,v in regions.items():
-        if seqname in v['sequences']:
+
+def _get_regionname_from_seqname(regions, seqname):
+    for k, v in regions.items():
+        if seqname in v["sequences"]:
             return k
     else:
-        exit("Sequence: %s does not have a region."%(seqname))
+        exit("Sequence: %s does not have a region." % (seqname))
 
 
 def main():
@@ -355,193 +378,346 @@ def main():
         where the chrom, start and end represent the BSJ the read is depicting.
         """
     )
-    parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str,
-        help="Input Chimeric-only STAR2p BAM file")
-    parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True,
-        help='circExplore per-sample counts table') # get coordinates of the circRNA
-    parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1',
-        help='Sample Name: SM for RG')
-    parser.add_argument("-l",'--library', dest='library', type=str, required=False, default = 'lib1',
-        help='Sample Name: LB for RG')
-    parser.add_argument("-f",'--platform', dest='platform', type=str, required=False, default = 'illumina',
-        help='Sample Name: PL for RG')
-    parser.add_argument("-u",'--unit', dest='unit', type=str, required=False, default = 'unit1',
-        help='Sample Name: PU for RG')
-    parser.add_argument("-o","--outbam",dest="outbam",required=True,type=argparse.FileType('w'),
-        help="Output bam file ... both strands")
-    parser.add_argument("-p","--plusbam",dest="plusbam",required=True,type=argparse.FileType('w'),
-        help="Output plus strand bam file")
-    parser.add_argument("-m","--minusbam",dest="minusbam",required=True,type=argparse.FileType('w'),
-        help="Output plus strand bam file")
-    parser.add_argument("--outputhostbams",dest="outputhostbams",required=False,action='store_true', default=False,
-        help="Output individual host BAM files")
-    parser.add_argument("--outputvirusbams",dest="outputvirusbams",required=False,action='store_true', default=False,
-        help="Output individual virus BAM files")
-    parser.add_argument("--outdir",dest="outdir",required=False,type=str,
-        help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).")
-    parser.add_argument("-b","--bed",dest="bed",required=True,type=str,
-        help="Output BSJ bed.gz file (with strand info)")
-    parser.add_argument("-j","--junctionsfound",dest="junctionsfound",required=True,type=argparse.FileType('w', encoding='UTF-8'),
-        help="Output TSV file with counts of junctions expected vs found")
-    parser.add_argument('--regions', dest='regions', type=str, required=True,
-        help='regions file eg. ref.fa.regions')
-    parser.add_argument('--host', dest='host', type=str, required=True,
-        help='host name eg.hg38... single value')
-    parser.add_argument('--additives', dest='additives', type=str, required=True,
-        help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out')
-    parser.add_argument('--viruses', dest='viruses', type=str, required=True,
-        help='virus name(s) eg.NC_009333.1... comma-separated list')
-    args = parser.parse_args()
+    parser.add_argument(
+        "-i",
+        "--inbam",
+        dest="inbam",
+        required=True,
+        type=str,
+        help="Input Chimeric-only STAR2p BAM file",
+    )
+    parser.add_argument(
+        "-t",
+        "--sample_counts_table",
+        dest="countstable",
+        type=str,
+        required=True,
+        help="circExplore per-sample counts table",
+    )  # get coordinates of the circRNA
+    parser.add_argument(
+        "-s",
+        "--sample_name",
+        dest="samplename",
+        type=str,
+        required=False,
+        default="sample1",
+        help="Sample Name: SM for RG",
+    )
+    parser.add_argument(
+        "-l",
+        "--library",
+        dest="library",
+        type=str,
+        required=False,
+        default="lib1",
+        help="Sample Name: LB for RG",
+    )
+    parser.add_argument(
+        "-f",
+        "--platform",
+        dest="platform",
+        type=str,
+        required=False,
+        default="illumina",
+        help="Sample Name: PL for RG",
+    )
+    parser.add_argument(
+        "-u",
+        "--unit",
+        dest="unit",
+        type=str,
+        required=False,
+        default="unit1",
+        help="Sample Name: PU for RG",
+    )
+    parser.add_argument(
+        "-o",
+        "--outbam",
+        dest="outbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output bam file ... both strands",
+    )
+    parser.add_argument(
+        "-p",
+        "--plusbam",
+        dest="plusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output plus strand bam file",
+    )
+    parser.add_argument(
+        "-m",
+        "--minusbam",
+        dest="minusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output minus strand bam file",
+    )
+    parser.add_argument(
+        "--outputhostbams",
+        dest="outputhostbams",
+        required=False,
+        action="store_true",
+        default=False,
+        help="Output individual host BAM files",
+    )
+    parser.add_argument(
+        "--outputvirusbams",
+        dest="outputvirusbams",
+        required=False,
+        action="store_true",
+        default=False,
+        help="Output individual virus BAM files",
+    )
+    parser.add_argument(
+        "--outdir",
+        dest="outdir",
+        required=False,
+        type=str,
+        help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).",
+    )
+    parser.add_argument(
+        "-b",
+        "--bed",
+        dest="bed",
+        required=True,
+        type=str,
+        help="Output BSJ bed.gz file (with strand info)",
+    )
+    parser.add_argument(
+        "-j",
+        "--junctionsfound",
+        dest="junctionsfound",
+        required=True,
+        type=argparse.FileType("w", encoding="UTF-8"),
+        help="Output TSV file with counts of junctions expected vs found",
+    )
+    parser.add_argument(
+        "--regions",
+        dest="regions",
+        type=str,
+        required=True,
+        help="regions file eg. ref.fa.regions",
+    )
+    parser.add_argument(
+        "--host",
+        dest="host",
+        type=str,
+        required=True,
+        help="host name eg.hg38... single value",
+    )
+    parser.add_argument(
+        "--additives",
+        dest="additives",
+        type=str,
+        required=True,
+        help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out",
+    )
+    parser.add_argument(
+        "--viruses",
+        dest="viruses",
+        type=str,
+        required=True,
+        help="virus name(s) eg.NC_009333.1... comma-separated list",
+    )
+    args = parser.parse_args()
     samfile = pysam.AlignmentFile(args.inbam, "rb")
     samheader = samfile.header.to_dict()
-    samheader['RG']=list()
-    junctionsfile = open(args.countstable,'r')
-    junctions=dict()
-    junctions_found=dict()
-    print("%s | Reading...junctions!..."%(get_ctime()))
+    samheader["RG"] = list()
+    junctionsfile = open(args.countstable, "r")
+    junctions = dict()
+    junctions_found = dict()
+    print("%s | Reading...junctions!..." % (get_ctime()))
     for l in junctionsfile.readlines():
-        if "read_count" in l: continue
+        if "read_count" in l:
+            continue
         l = l.strip().split("\t")
         chrom = l[0]
         start = l[1]
-        end = str(int(l[2])-1)
+        end = str(int(l[2]) - 1)
         strand = l[3]
-        jid = chrom+"##"+start+"##"+end+"##"+strand # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching!
-        samheader['RG'].append({'ID':jid, 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename})
+        jid = (
+            chrom + "##" + start + "##" + end + "##" + strand
+        )  # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching!
+ samheader["RG"].append( + { + "ID": jid, + "LB": args.library, + "PL": args.platform, + "PU": args.unit, + "SM": args.samplename, + } + ) junctions[jid] = int(l[4]) junctions_found[jid] = 0 junctionsfile.close() sequences = list() - for v in samheader['SQ']: - sequences.append(v['SN']) - seqname2regionname=dict() - hosts=set() - viruses=set() - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + seqname2regionname = dict() + hosts = set() + viruses = set() + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) - print("%s | Done reading %d junctions."%(get_ctime(),len(junctions))) + print("%s | Done reading %d junctions." % (get_ctime(), len(junctions))) - bigdict=dict() + bigdict = dict() # print("Opening...") # print(args.inbam) - print("%s | Reading...alignments!..."%(get_ctime())) - count=0 - count2=0 + print("%s | Reading...alignments!..." % (get_ctime())) + count = 0 + count2 = 0 for read in samfile.fetch(): - count+=1 - if debug: print(read,read.reference_id,read.next_reference_id) - if read.reference_id != read.next_reference_id: continue # only works for PE ... 
for SE read.next_reference_id is -1 - count2+=1 - rid=get_uniq_readid(read) # add the HI number to the readid - if debug:print(rid) + count += 1 + if debug: + print(read, read.reference_id, read.next_reference_id) + if read.reference_id != read.next_reference_id: + continue # only works for PE ... for SE read.next_reference_id is -1 + count2 += 1 + rid = get_uniq_readid(read) # add the HI number to the readid + if debug: + print(rid) if not rid in bigdict: - bigdict[rid]=Readinfo(rid,read.reference_name) + bigdict[rid] = Readinfo(rid, read.reference_name) # bigdict[rid].append_alignment(read) # since rid has HI number included ... this separates alignment by HI - bitflag=get_bitflag(read) - if debug:print(bitflag) - bigdict[rid].append_bitflag(bitflag) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... collect all bigflags in a list here - refpos=list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True))) - bigdict[rid].set_refcoordinates(bitflag,refpos) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment + bitflag = get_bitflag(read) + if debug: + print(bitflag) + bigdict[rid].append_bitflag( + bitflag + ) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... collect all bigflags in a list here + refpos = list( + filter(lambda x: x != None, read.get_reference_positions(full_length=True)) + ) + bigdict[rid].set_refcoordinates( + bitflag, refpos + ) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment # bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag,read) - if debug:print(bigdict[rid]) - print("%s | Done reading %d chimeric alignments. [%d same chrom chimeras]"%(get_ctime(),count,count2)) + if debug: + print(bigdict[rid]) + print( + "%s | Done reading %d chimeric alignments. 
[%d same chrom chimeras]" + % (get_ctime(), count, count2) + ) samfile.reset() - print("%s | Writing BAMs"%(get_ctime())) - print("%s | Re-Reading...alignments!..."%(get_ctime())) - plusfile = pysam.AlignmentFile(args.plusbam, "wb", header = samheader) - minusfile = pysam.AlignmentFile(args.minusbam, "wb", header = samheader) - outfile = pysam.AlignmentFile(args.outbam, "wb", header = samheader) + print("%s | Writing BAMs" % (get_ctime())) + print("%s | Re-Reading...alignments!..." % (get_ctime())) + plusfile = pysam.AlignmentFile(args.plusbam, "wb", header=samheader) + minusfile = pysam.AlignmentFile(args.minusbam, "wb", header=samheader) + outfile = pysam.AlignmentFile(args.outbam, "wb", header=samheader) outputbams = dict() if args.outputhostbams: for h in hosts: - outbamname = os.path.join(args.outdir,args.samplename+"."+h+".BSJ.bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + h + ".BSJ.bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) if args.outputvirusbams: for v in viruses: - outbamname = os.path.join(args.outdir,args.samplename+"."+v+".BSJ.bam") - outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header = samheader) - bsjdict=dict() - bitid_counts=dict() + outbamname = os.path.join( + args.outdir, args.samplename + "." + v + ".BSJ.bam" + ) + outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header=samheader) + bsjdict = dict() + bitid_counts = dict() lenoutputbams = len(outputbams) for read in samfile.fetch(): - if read.reference_id != read.next_reference_id: continue - rid=get_uniq_readid(read) + if read.reference_id != read.next_reference_id: + continue + rid = get_uniq_readid(read) if rid in bigdict: - bigdict[rid].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... 
bitflags are pre-sorted - if debug:print(bigdict[rid]) - bigdict[rid].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered - bigdict[rid].flip_strand() # strands are flipped than those reported in the counts table .. hence flipping! - if not bigdict[rid].validate_BSJ_read(junctions=junctions): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. Also add start and end to the BSJ object + bigdict[ + rid + ].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted + if debug: + print(bigdict[rid]) + bigdict[ + rid + ].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered + bigdict[ + rid + ].flip_strand() # strands are flipped than those reported in the counts table .. hence flipping! + if not bigdict[rid].validate_BSJ_read( + junctions=junctions + ): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. 
Also add start and end to the BSJ object
                 continue
             # bigdict[rid].get_start_end()
             # print(bigdict[rid])
-            bsjid=bigdict[rid].get_bsjid()
-            chrom=_bsjid2chrom(bsjid)
+            bsjid = bigdict[rid].get_bsjid()
+            chrom = _bsjid2chrom(bsjid)
             # jid,chrom=_bsjid2jid(bsjid)
             read.set_tag("RG", bsjid, value_type="Z")
-            if bigdict[rid].strand=="+":
+            if bigdict[rid].strand == "+":
                 plusfile.write(read)
-            if bigdict[rid].strand=="-":
+            if bigdict[rid].strand == "-":
                 minusfile.write(read)
             outfile.write(read)
             if lenoutputbams != 0:
-                regionname=_get_regionname_from_seqname(regions,chrom)
+                regionname = _get_regionname_from_seqname(regions, chrom)
                 if regionname in hosts and args.outputhostbams:
                     outputbams[regionname].write(read)
                 if regionname in viruses and args.outputvirusbams:
                     outputbams[regionname].write(read)
             if not bsjid in bsjdict:
-                bsjdict[bsjid]=BSJ()
+                bsjdict[bsjid] = BSJ()
                 bsjdict[bsjid].set_chrom(bigdict[rid].refname)
                 bsjdict[bsjid].set_start(bigdict[rid].start)
                 bsjdict[bsjid].set_end(bigdict[rid].end)
                 bsjdict[bsjid].set_strand(bigdict[rid].strand)
             bsjdict[bsjid].append_bitid(bigdict[rid].bitid)
             if not bigdict[rid].bitid in bitid_counts:
-                bitid_counts[bigdict[rid].bitid]=0
-            bitid_counts[bigdict[rid].bitid]+=1
+                bitid_counts[bigdict[rid].bitid] = 0
+            bitid_counts[bigdict[rid].bitid] += 1
             bsjdict[bsjid].append_rid(rid)
     plusfile.close()
     minusfile.close()
     samfile.close()
     outfile.close()
     if lenoutputbams != 0:
-        for k,v in outputbams.items():
+        for k, v in outputbams.items():
             v.close()
-    print("%s | Done!"%(get_ctime()))
+    print("%s | Done!"
% (get_ctime()))
     for b in bitid_counts.keys():
-        print(b,bitid_counts[b])
-    print("%s | Writing BED"%(get_ctime()))
+        print(b, bitid_counts[b])
+    print("%s | Writing BED" % (get_ctime()))
-    with gzip.open(args.bed,'wt') as bsjfile:
+    with gzip.open(args.bed, "wt") as bsjfile:
         for bsjid in bsjdict.keys():
             bsjdict[bsjid].update_score_and_found_count(junctions_found)
             bsjdict[bsjid].write_out_BSJ(bsjfile)
     bsjfile.close()
-
-    args.junctionsfound.write("#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n")
+    args.junctionsfound.write(
+        "#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n"
+    )
     for jid in junctions.keys():
-        x=jid.split("##")
-        chrom=x[0]
-        start=int(x[1])
-        end=int(x[2])+1
-        strand=x[3]
-        args.junctionsfound.write("%s\t%d\t%d\t%s\t%d\t%d\n"%(chrom,start,end,strand,junctions[jid],junctions_found[jid]))
+        x = jid.split("##")
+        chrom = x[0]
+        start = int(x[1])
+        end = int(x[2]) + 1
+        strand = x[3]
+        args.junctionsfound.write(
+            "%s\t%d\t%d\t%s\t%d\t%d\n"
+            % (chrom, start, end, strand, junctions[jid], junctions_found[jid])
+        )
     args.junctionsfound.close()
-    print("%s | ALL Done!"%(get_ctime()))
-
+    print("%s | ALL Done!" % (get_ctime()))
+

 if __name__ == "__main__":
     main()
-
-
diff --git a/workflow/scripts/_create_circExplorer_BSJ_bam_se.py b/workflow/scripts/_create_circExplorer_BSJ_bam_se.py
index 8cf5454..fc0a7c1 100755
--- a/workflow/scripts/_create_circExplorer_BSJ_bam_se.py
+++ b/workflow/scripts/_create_circExplorer_BSJ_bam_se.py
@@ -5,9 +5,11 @@
 import os
 import time
+
 def get_ctime():
     return time.ctime(time.time())
+
 """
 This script first validates each read to be "valid" BSJ read and then splits
 a BSJ bam file by strand into:
@@ -16,9 +18,9 @@ def get_ctime():
 3. BSJ bed file with score(number of reads supporting the BSJ) and strand information
 Logic (for SE reads): Each BSJ is represented by a 2 alignments in the output BAM file.
-Alignments 1 and 2 are split alignment of read1 at two distinct loci on the same reference +Alignments 1 and 2 are split alignment of read1 at two distinct loci on the same reference chromosome. -These alignments are grouped together by the "HI" tags in SAM file. For example, all 2 +These alignments are grouped together by the "HI" tags in SAM file. For example, all 2 alignments for the same BSJ will have the same "HI" value... something like "HI:i:1". BSJ alignment sam bitflag combinations can have 4 different possibilities, 2 from sense strand and 2 from anti-sense strand: @@ -31,38 +33,38 @@ def get_ctime(): class BSJ: def __init__(self): - self.chrom="" - self.start="" - self.end="" - self.score=0 - self.name="." - self.strand="U" - self.bitids=set() - self.rids=set() - + self.chrom = "" + self.start = "" + self.end = "" + self.score = 0 + self.name = "." + self.strand = "U" + self.bitids = set() + self.rids = set() + def plusone(self): - self.score+=1 - - def set_strand(self,strand): - self.strand=strand - - def set_chrom(self,chrom): - self.chrom=chrom - - def set_start(self,start): - self.start=start - - def set_end(self,end): - self.end=end - - def append_bitid(self,bitid): + self.score += 1 + + def set_strand(self, strand): + self.strand = strand + + def set_chrom(self, chrom): + self.chrom = chrom + + def set_start(self, start): + self.start = start + + def set_end(self, end): + self.end = end + + def append_bitid(self, bitid): self.bitids.add(bitid) - def append_rid(self,rid): + def append_rid(self, rid): self.rids.add(rid) - - def write_out_BSJ(self,outbed): - t=[] + + def write_out_BSJ(self, outbed): + t = [] t.append(self.chrom) t.append(str(self.start)) t.append(str(self.end)) @@ -71,192 +73,212 @@ def write_out_BSJ(self,outbed): t.append(self.strand) t.append(",".join(self.bitids)) t.append(",".join(self.rids)) - outbed.write("\t".join(t)+"\n") + outbed.write("\t".join(t) + "\n") - def update_score_and_found_count(self,junctions_found): + def 
update_score_and_found_count(self, junctions_found): self.score = len(self.rids) - jid = self.chrom + "##" + str(self.start) + "##" + str(int(self.end)-1) + "##" + self.strand - junctions_found[jid]+=self.score - + jid = ( + self.chrom + + "##" + + str(self.start) + + "##" + + str(int(self.end) - 1) + + "##" + + self.strand + ) + junctions_found[jid] += self.score + + class Readinfo: - def __init__(self,readid,rname): - self.readid=readid - self.refname=rname + def __init__(self, readid, rname): + self.readid = readid + self.refname = rname # self.alignments=list() - self.bitflags=list() - self.bitid="" - self.strand="." - self.start=-1 - self.end=-1 - self.refcoordinates=dict() - self.isread1=dict() - self.isreverse=dict() - self.issecondary=dict() - self.cigarstrs=dict() - self.issupplementary=dict() - + self.bitflags = list() + self.bitid = "" + self.strand = "." + self.start = -1 + self.end = -1 + self.refcoordinates = dict() + self.isread1 = dict() + self.isreverse = dict() + self.issecondary = dict() + self.cigarstrs = dict() + self.issupplementary = dict() + def __str__(self): - s = "readid: %s"%(self.readid) - s = "%s\tbitflags: %s"%(s,self.bitflags) - s = "%s\tisreverse: %s"%(s,self.isreverse) - s = "%s\tbitid: %s"%(s,self.bitid) + s = "readid: %s" % (self.readid) + s = "%s\tbitflags: %s" % (s, self.bitflags) + s = "%s\tisreverse: %s" % (s, self.isreverse) + s = "%s\tbitid: %s" % (s, self.bitid) return s - def set_refcoordinates(self,bitflag,refpos): - self.refcoordinates[bitflag]=refpos - - def set_cigarstr(self,bitflag,cigarstr): - self.cigarstrs[bitflag]=cigarstr - - def set_read1_reverse_secondary_supplementary(self,bitflag,read): + def set_refcoordinates(self, bitflag, refpos): + self.refcoordinates[bitflag] = refpos + + def set_cigarstr(self, bitflag, cigarstr): + self.cigarstrs[bitflag] = cigarstr + + def set_read1_reverse_secondary_supplementary(self, bitflag, read): if read.is_read1: - self.isread1[bitflag]="Y" + self.isread1[bitflag] = "Y" else: 
- self.isread1[bitflag]="N" + self.isread1[bitflag] = "N" if read.is_reverse: - self.isreverse[bitflag]="Y" + self.isreverse[bitflag] = "Y" else: - self.isreverse[bitflag]="N" + self.isreverse[bitflag] = "N" if read.is_secondary: - self.issecondary[bitflag]="Y" + self.issecondary[bitflag] = "Y" else: - self.issecondary[bitflag]="N" + self.issecondary[bitflag] = "N" if read.is_supplementary: - self.issupplementary[bitflag]="Y" + self.issupplementary[bitflag] = "Y" else: - self.issupplementary[bitflag]="N" - + self.issupplementary[bitflag] = "N" + # def append_alignment(self,read): # self.alignments.append(read) - - def append_bitflag(self,bf): + + def append_bitflag(self, bf): self.bitflags.append(bf) - + # def extend_ref_positions(self,refcoords): # self.refcoordinates.extend(refcoords) - + def generate_bitid(self): - bitlist=sorted(self.bitflags) - self.bitid="##".join(list(map(lambda x:str(x),bitlist))) -# self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) - + bitlist = sorted(self.bitflags) + self.bitid = "##".join(list(map(lambda x: str(x), bitlist))) + + # self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) + def get_strand(self): - if self.bitid=="0##2048": - self.strand="-" - elif self.bitid=="256##2304": - self.strand="-" - elif self.bitid=="16##2064": - self.strand="+" - elif self.bitid=="272##2320": - self.strand="+" + if self.bitid == "0##2048": + self.strand = "-" + elif self.bitid == "256##2304": + self.strand = "-" + elif self.bitid == "16##2064": + self.strand = "+" + elif self.bitid == "272##2320": + self.strand = "+" else: - self.strand="U" + self.strand = "U" - def validate_BSJ_read(self,junctions): + def validate_BSJ_read(self, junctions): """ Checks if read is truly a BSJ originitor. 
""" - if len(self.bitid.split("##"))==2: + if len(self.bitid.split("##")) == 2: if not self.bitid in ["0##2048", "16##2064", "256##2304", "272##2320"]: return False - count=0 - refcoords=self.refcoordinates - for k,v in refcoords.items(): - count+=1 - refcoords[k]=sorted(v) - if count==1: - astart=refcoords[k][0] - aend=refcoords[k][-1] - if count==2: - bstart=refcoords[k][0] - bend=refcoords[k][-1] + count = 0 + refcoords = self.refcoordinates + for k, v in refcoords.items(): + count += 1 + refcoords[k] = sorted(v) + if count == 1: + astart = refcoords[k][0] + aend = refcoords[k][-1] + if count == 2: + bstart = refcoords[k][0] + bend = refcoords[k][-1] chrom = self.refname - possiblejid=chrom+"##"+str(astart)+"##"+str(bend)+"##"+self.strand - possiblejid2=chrom+"##"+str(bstart)+"##"+str(aend)+"##"+self.strand + possiblejid = ( + chrom + "##" + str(astart) + "##" + str(bend) + "##" + self.strand + ) + possiblejid2 = ( + chrom + "##" + str(bstart) + "##" + str(aend) + "##" + self.strand + ) # exit() if possiblejid in junctions: self.start = astart - self.end = str(int(bend) + 1) # this will be added to the BED file + self.end = str(int(bend) + 1) # this will be added to the BED file return True if possiblejid2 in junctions: self.start = bstart - self.end = str(int(aend) + 1) # this will be added to the BED file - return True + self.end = str(int(aend) + 1) # this will be added to the BED file + return True else: return False - + def get_bsjid(self): - t=[] + t = [] t.append(self.refname) t.append(str(self.start)) t.append(str(self.end)) t.append(self.strand) return "##".join(t) - + # def write_out_reads(self,outbam): # for r in self.alignments: # outbam.write(r) - - + + def get_uniq_readid(r): - rname=r.query_name - hi=r.get_tag("HI") - rid=rname+"##"+str(hi) + rname = r.query_name + hi = r.get_tag("HI") + rid = rname + "##" + str(hi) return rid + def get_bitflag(r): - bitflag=str(r).split("\t")[1] + bitflag = str(r).split("\t")[1] return int(bitflag) + def 
_bsjid2chrom(bsjid): - x=bsjid.split("##") + x = bsjid.split("##") return x[0] + def _bsjid2jid(bsjid): - x=bsjid.split("##") - chrom=x[0] - start=x[1] - end=str(int(x[2])-1) - jid="##".join([chrom,start,end]) - return jid,chrom - -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + x = bsjid.split("##") + chrom = x[0] + start = x[1] + end = str(int(x[2]) - 1) + jid = "##".join([chrom, start, end]) + return jid, chrom + + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. 
Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] + +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) + -def _get_regionname_from_seqname(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: +def _get_regionname_from_seqname(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: return k else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) def main(): @@ -268,160 +290,335 @@ def main(): where the chrom, start and end represent the BSJ the read is depicting. 
""" ) - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input Chimeric-only STAR2p BAM file") - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name: SM for RG') - parser.add_argument("-l",'--library', dest='library', type=str, required=False, default = 'lib1', - help='Sample Name: LB for RG') - parser.add_argument("-f",'--platform', dest='platform', type=str, required=False, default = 'illumina', - help='Sample Name: PL for RG') - parser.add_argument("-u",'--unit', dest='unit', type=str, required=False, default = 'unit1', - help='Sample Name: PU for RG') - parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True, - help='circExplore per-sample counts table') # get coordinates of the circRNA - parser.add_argument("-p","--plusbam",dest="plusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-m","--minusbam",dest="minusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=argparse.FileType('w'), - help="Output bam file ... 
both strands") - parser.add_argument("--outputhostbams",dest="outputhostbams",required=False,action='store_true', default=False, - help="Output individual host BAM files") - parser.add_argument("--outputvirusbams",dest="outputvirusbams",required=False,action='store_true', default=False, - help="Output individual virus BAM files") - parser.add_argument("--outdir",dest="outdir",required=False,type=str, - help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).") - parser.add_argument("-b","--bed",dest="bed",required=True,type=str, - help="Output BSJ bed.gz file (with strand info)") - parser.add_argument("-j","--junctionsfound",dest="junctionsfound",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output TSV file with counts of junctions expected vs found") - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... 
comma-separated list') - args = parser.parse_args() + parser.add_argument( + "-i", + "--inbam", + dest="inbam", + required=True, + type=str, + help="Input Chimeric-only STAR2p BAM file", + ) + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name: SM for RG", + ) + parser.add_argument( + "-l", + "--library", + dest="library", + type=str, + required=False, + default="lib1", + help="Sample Name: LB for RG", + ) + parser.add_argument( + "-f", + "--platform", + dest="platform", + type=str, + required=False, + default="illumina", + help="Sample Name: PL for RG", + ) + parser.add_argument( + "-u", + "--unit", + dest="unit", + type=str, + required=False, + default="unit1", + help="Sample Name: PU for RG", + ) + parser.add_argument( + "-t", + "--sample_counts_table", + dest="countstable", + type=str, + required=True, + help="circExplore per-sample counts table", + ) # get coordinates of the circRNA + parser.add_argument( + "-p", + "--plusbam", + dest="plusbam", + required=True, + type=argparse.FileType("w"), + help="Output plus strand bam file", + ) + parser.add_argument( + "-m", + "--minusbam", + dest="minusbam", + required=True, + type=argparse.FileType("w"), + help="Output plus strand bam file", + ) + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=argparse.FileType("w"), + help="Output bam file ... 
both strands", + ) + parser.add_argument( + "--outputhostbams", + dest="outputhostbams", + required=False, + action="store_true", + default=False, + help="Output individual host BAM files", + ) + parser.add_argument( + "--outputvirusbams", + dest="outputvirusbams", + required=False, + action="store_true", + default=False, + help="Output individual virus BAM files", + ) + parser.add_argument( + "--outdir", + dest="outdir", + required=False, + type=str, + help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).", + ) + parser.add_argument( + "-b", + "--bed", + dest="bed", + required=True, + type=str, + help="Output BSJ bed.gz file (with strand info)", + ) + parser.add_argument( + "-j", + "--junctionsfound", + dest="junctionsfound", + required=True, + type=argparse.FileType("w", encoding="UTF-8"), + help="Output TSV file with counts of junctions expected vs found", + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) + parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value", + ) + parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", + ) + parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... 
comma-separated list", + ) + args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") samheader = samfile.header.to_dict() - samheader['RG']=list() -# bsjfile = open(args.bed,"w") - junctionsfile = open(args.countstable,'r') - junctions=dict() - junctions_found=dict() - print("%s | Reading...junctions!..."%(get_ctime())) + samheader["RG"] = list() + # bsjfile = open(args.bed,"w") + junctionsfile = open(args.countstable, "r") + junctions = dict() + junctions_found = dict() + print("%s | Reading...junctions!..." % (get_ctime())) for l in junctionsfile.readlines(): - if "read_count" in l: continue + if "read_count" in l: + continue l = l.strip().split("\t") chrom = l[0] start = l[1] - end = str(int(l[2])-1) + end = str(int(l[2]) - 1) strand = l[3] - jid = chrom+"##"+start+"##"+end+"##"+strand # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching! - samheader['RG'].append({'ID':jid, 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename}) + jid = ( + chrom + "##" + start + "##" + end + "##" + strand + ) # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching! 
+ samheader["RG"].append( + { + "ID": jid, + "LB": args.library, + "PL": args.platform, + "PU": args.unit, + "SM": args.samplename, + } + ) junctions[jid] = int(l[4]) junctions_found[jid] = 0 junctionsfile.close() # print(junctions) sequences = list() - for v in samheader['SQ']: - sequences.append(v['SN']) - seqname2regionname=dict() - hosts=set() - viruses=set() - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + seqname2regionname = dict() + hosts = set() + viruses = set() + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) - print("%s | Done reading %d junctions."%(get_ctime(),len(junctions))) - + print("%s | Done reading %d junctions." % (get_ctime(), len(junctions))) - bigdict=dict() - print("%s | Reading...alignments!..."%(get_ctime())) - count=0 - count2=0 + bigdict = dict() + print("%s | Reading...alignments!..." 
% (get_ctime())) + count = 0 + count2 = 0 for read in samfile.fetch(): - count+=1 - satag=read.get_tag("SA") - satagchrids=list(map(lambda x:samfile.get_tid(x),list(filter(lambda x:x!='',list(map(lambda x:x.split(",")[0],satag.split(";"))))))) - if not read.reference_id in satagchrids: continue # specific for SE as read.next_reference_id is -1 for SE - count2+=1 - rid=get_uniq_readid(read) # add the HI number to the readid - if debug:print(rid) + count += 1 + satag = read.get_tag("SA") + satagchrids = list( + map( + lambda x: samfile.get_tid(x), + list( + filter( + lambda x: x != "", + list(map(lambda x: x.split(",")[0], satag.split(";"))), + ) + ), + ) + ) + if not read.reference_id in satagchrids: + continue # specific for SE as read.next_reference_id is -1 for SE + count2 += 1 + rid = get_uniq_readid(read) # add the HI number to the readid + if debug: + print(rid) if not rid in bigdict: - bigdict[rid]=Readinfo(rid,read.reference_name) + bigdict[rid] = Readinfo(rid, read.reference_name) # bigdict[rid].append_alignment(read) # since rid has HI number included ... this separates alignment by HI - bitflag=get_bitflag(read) - if debug:print(bitflag) - bigdict[rid].append_bitflag(bitflag) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... collect all bitflags in a list here - refpos=list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True))) + bitflag = get_bitflag(read) + if debug: + print(bitflag) + bigdict[rid].append_bitflag( + bitflag + ) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... 
collect all bitflags in a list here + refpos = list( + filter(lambda x: x != None, read.get_reference_positions(full_length=True)) + ) # if debug:print(refpos) - bigdict[rid].set_refcoordinates(bitflag,refpos) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment - bigdict[rid].set_cigarstr(bitflag,read.cigarstring) - bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag,read) - if debug:print(bigdict[rid]) - print("%s | Done reading %d chimeric alignments. [%d same chrom chimeras]"%(get_ctime(),count,count2)) + bigdict[rid].set_refcoordinates( + bitflag, refpos + ) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment + bigdict[rid].set_cigarstr(bitflag, read.cigarstring) + bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag, read) + if debug: + print(bigdict[rid]) + print( + "%s | Done reading %d chimeric alignments. [%d same chrom chimeras]" + % (get_ctime(), count, count2) + ) if debug: for rid in bigdict.keys(): - print(">>>%s\t%s\t%s\t%s"%(rid,bigdict[rid].isreverse,bigdict[rid].cigarstrs,bigdict[rid].refcoordinates)) + print( + ">>>%s\t%s\t%s\t%s" + % ( + rid, + bigdict[rid].isreverse, + bigdict[rid].cigarstrs, + bigdict[rid].refcoordinates, + ) + ) samfile.reset() - print("%s | Writing BAMs"%(get_ctime())) - plusfile = pysam.AlignmentFile(args.plusbam, "wb", header = samheader) - minusfile = pysam.AlignmentFile(args.minusbam, "wb", header = samheader) - outfile = pysam.AlignmentFile(args.outbam, "wb", header = samheader) + print("%s | Writing BAMs" % (get_ctime())) + plusfile = pysam.AlignmentFile(args.plusbam, "wb", header=samheader) + minusfile = pysam.AlignmentFile(args.minusbam, "wb", header=samheader) + outfile = pysam.AlignmentFile(args.outbam, "wb", header=samheader) outputbams = dict() if args.outputhostbams: for h in hosts: - outbamname = os.path.join(args.outdir,args.samplename+"."+h+".BSJ.bam") - outputbams[h] = 
pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + h + ".BSJ.bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) if args.outputvirusbams: for v in viruses: - outbamname = os.path.join(args.outdir,args.samplename+"."+v+".BSJ.bam") - outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header = samheader) - bsjdict=dict() - bitid_counts=dict() + outbamname = os.path.join( + args.outdir, args.samplename + "." + v + ".BSJ.bam" + ) + outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header=samheader) + bsjdict = dict() + bitid_counts = dict() lenoutputbams = len(outputbams) for read in samfile.fetch(): - satag=read.get_tag("SA") - satagchrids=list(map(lambda x:samfile.get_tid(x),list(filter(lambda x:x!='',list(map(lambda x:x.split(",")[0],satag.split(";"))))))) - if not read.reference_id in satagchrids: continue # specific for SE as read.next_reference_id is -1 for SE - rid=get_uniq_readid(read) - if rid in bigdict: - bigdict[rid].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted - if debug:print(bigdict[rid]) - bigdict[rid].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered - if not bigdict[rid].validate_BSJ_read(junctions=junctions): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. 
Also add start and end to the BSJ object + satag = read.get_tag("SA") + satagchrids = list( + map( + lambda x: samfile.get_tid(x), + list( + filter( + lambda x: x != "", + list(map(lambda x: x.split(",")[0], satag.split(";"))), + ) + ), + ) + ) + if not read.reference_id in satagchrids: + continue # specific for SE as read.next_reference_id is -1 for SE + rid = get_uniq_readid(read) + if rid in bigdict: + bigdict[ + rid + ].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted + if debug: + print(bigdict[rid]) + bigdict[ + rid + ].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered + if not bigdict[rid].validate_BSJ_read( + junctions=junctions + ): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. Also add start and end to the BSJ object continue # bigdict[rid].get_start_end() # print(bigdict[rid]) - bsjid=bigdict[rid].get_bsjid() - chrom=_bsjid2chrom(bsjid) + bsjid = bigdict[rid].get_bsjid() + chrom = _bsjid2chrom(bsjid) # jid,chrom=_bsjid2jid(bsjid) read.set_tag("RG", bsjid, value_type="Z") - if bigdict[rid].strand=="+": + if bigdict[rid].strand == "+": plusfile.write(read) - if bigdict[rid].strand=="-": + if bigdict[rid].strand == "-": minusfile.write(read) outfile.write(read) if lenoutputbams != 0: - regionname=_get_regionname_from_seqname(regions,chrom) + regionname = _get_regionname_from_seqname(regions, chrom) if regionname in hosts and args.outputhostbams: outputbams[regionname].write(read) if regionname in viruses and args.outputvirusbams: outputbams[regionname].write(read) if not bsjid in bsjdict: - bsjdict[bsjid]=BSJ() + bsjdict[bsjid] = BSJ() bsjdict[bsjid].set_chrom(bigdict[rid].refname) bsjdict[bsjid].set_start(bigdict[rid].start) bsjdict[bsjid].set_end(bigdict[rid].end) @@ -429,42 +626,42 @@ def main(): # 
bsjdict[bsjid].plusone() bsjdict[bsjid].append_bitid(bigdict[rid].bitid) if not bigdict[rid].bitid in bitid_counts: - bitid_counts[bigdict[rid].bitid]=0 - bitid_counts[bigdict[rid].bitid]+=1 + bitid_counts[bigdict[rid].bitid] = 0 + bitid_counts[bigdict[rid].bitid] += 1 bsjdict[bsjid].append_rid(rid) plusfile.close() minusfile.close() samfile.close() outfile.close() if lenoutputbams != 0: - for k,v in outputbams.items(): + for k, v in outputbams.items(): v.close() - print("%s | Done!"%(get_ctime())) + print("%s | Done!" % (get_ctime())) for b in bitid_counts.keys(): - print(b,bitid_counts[b]) - print("%s | Writing BED"%(get_ctime())) - with gzip.open(args.bed,'wt') as bsjfile: + print(b, bitid_counts[b]) + print("%s | Writing BED" % (get_ctime())) + with gzip.open(args.bed, "wt") as bsjfile: for bsjid in bsjdict.keys(): bsjdict[bsjid].update_score_and_found_count(junctions_found) bsjdict[bsjid].write_out_BSJ(bsjfile) bsjfile.close() - args.junctionsfound.write("#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n") + args.junctionsfound.write( + "#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n" + ) for jid in junctions.keys(): - x=jid.split("##") - chrom=x[0] - start=int(x[1]) - end=int(x[2])+1 - strand=x[3] - args.junctionsfound.write("%s\t%d\t%d\t%s\t%d\t%d\n"%(chrom,start,end,strand,junctions[jid],junctions_found[jid])) + x = jid.split("##") + chrom = x[0] + start = int(x[1]) + end = int(x[2]) + 1 + strand = x[3] + args.junctionsfound.write( + "%s\t%d\t%d\t%s\t%d\t%d\n" + % (chrom, start, end, strand, junctions[jid], junctions_found[jid]) + ) args.junctionsfound.close() - print("%s | ALL Done!"%(get_ctime())) - - - + print("%s | ALL Done!" 
% (get_ctime())) if __name__ == "__main__": main() - - diff --git a/workflow/scripts/_create_circExplorer_BSJ_hqonly_pe.py b/workflow/scripts/_create_circExplorer_BSJ_hqonly_pe.py index 768bc1c..fb4e2dd 100755 --- a/workflow/scripts/_create_circExplorer_BSJ_hqonly_pe.py +++ b/workflow/scripts/_create_circExplorer_BSJ_hqonly_pe.py @@ -6,9 +6,11 @@ import time import pandas as pd + def get_ctime(): return time.ctime(time.time()) + """ This script first validates each read to be "valid" BSJ read and then splits a BSJ bam file by strand into: @@ -17,10 +19,10 @@ def get_ctime(): 3. BSJ bed file with score(number of reads supporting the BSJ) and strand information Logic (for PE reads): Each BSJ is represented by a 3 alignments in the output BAM file. -Alignment 1 is complete alignment of one of the reads in pair and -Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference +Alignment 1 is complete alignment of one of the reads in pair and +Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference chromosome. -These alignments are grouped together by the "HI" tags in SAM file. For example, all 3 +These alignments are grouped together by the "HI" tags in SAM file. For example, all 3 alignments for the same BSJ will have the same "HI" value... something like "HI:i:1". BSJ alignment sam bitflag combinations can have 8 different possibilities, 4 from sense strand and 4 from anti-sense strand: @@ -36,12 +38,12 @@ def get_ctime(): # |<------------------BSJ----------------->| 3. 83,163,2209 4. 339,419,2465 -# R1 -# <------ +# R1 +# <------ # 5'--|------------------------------------------|---3' # 3'--|------------------------------------------|---5' # |------> ------>| -# | R2.2 R2.1 | +# | R2.2 R2.1 | # | | # |<-----------------BSJ-------------------->| 5. 99,147,2193 @@ -56,12 +58,12 @@ def get_ctime(): # |<------------------BSJ----------------->| 7. 99,147,2145 8. 
355, 403, 2401 -# R2 -# <------ +# R2 +# <------ # 5'--|------------------------------------------|---3' # 3'--|------------------------------------------|---5' # |------> ------>| -# | R1.2 R1.1 | +# | R1.2 R1.1 | # | | # |<-----------------BSJ-------------------->| """ @@ -69,38 +71,38 @@ def get_ctime(): class BSJ: def __init__(self): - self.chrom="" - self.start="" - self.end="" - self.score=0 - self.name="." - self.strand="U" - self.bitids=set() - self.rids=set() - + self.chrom = "" + self.start = "" + self.end = "" + self.score = 0 + self.name = "." + self.strand = "U" + self.bitids = set() + self.rids = set() + def plusone(self): - self.score+=1 - - def set_strand(self,strand): - self.strand=strand - - def set_chrom(self,chrom): - self.chrom=chrom - - def set_start(self,start): - self.start=start - - def set_end(self,end): - self.end=end - - def append_bitid(self,bitid): + self.score += 1 + + def set_strand(self, strand): + self.strand = strand + + def set_chrom(self, chrom): + self.chrom = chrom + + def set_start(self, start): + self.start = start + + def set_end(self, end): + self.end = end + + def append_bitid(self, bitid): self.bitids.add(bitid) - def append_rid(self,rid): + def append_rid(self, rid): self.rids.add(rid) - - def write_out_BSJ(self,outbed): - t=[] + + def write_out_BSJ(self, outbed): + t = [] t.append(self.chrom) t.append(str(self.start)) t.append(str(self.end)) @@ -109,149 +111,164 @@ def write_out_BSJ(self,outbed): t.append(self.strand) t.append(",".join(self.bitids)) t.append(",".join(self.rids)) - outbed.write("\t".join(t)+"\n") + outbed.write("\t".join(t) + "\n") - def update_score_and_found_count(self,junctions_found): + def update_score_and_found_count(self, junctions_found): self.score = len(self.rids) - jid = self.chrom + "##" + str(self.start) + "##" + str(int(self.end)-1) + "##" + self.strand - junctions_found[jid]+=self.score + jid = ( + self.chrom + + "##" + + str(self.start) + + "##" + + str(int(self.end) - 1) + + "##" + + 
self.strand + ) + junctions_found[jid] += self.score + - class Readinfo: - def __init__(self,readid,rname): - self.readid=readid - self.refname=rname - self.bitflags=list() - self.bitid="" - self.strand="." - self.start=-1 - self.end=-1 - self.refcoordinates=dict() - self.isread1=dict() - self.isreverse=dict() - self.issecondary=dict() - self.issupplementary=dict() - + def __init__(self, readid, rname): + self.readid = readid + self.refname = rname + self.bitflags = list() + self.bitid = "" + self.strand = "." + self.start = -1 + self.end = -1 + self.refcoordinates = dict() + self.isread1 = dict() + self.isreverse = dict() + self.issecondary = dict() + self.issupplementary = dict() + def __str__(self): - s = "readid: %s"%(self.readid) - s = "%s\tbitflags: %s"%(s,self.bitflags) - s = "%s\tbitid: %s"%(s,self.bitid) + s = "readid: %s" % (self.readid) + s = "%s\tbitflags: %s" % (s, self.bitflags) + s = "%s\tbitid: %s" % (s, self.bitid) for bf in self.bitflags: - s = "%s\t%s\trefcoordinates: %s"%(s,bf,", ".join(list(map(lambda x:str(x),self.refcoordinates[bf])))) + s = "%s\t%s\trefcoordinates: %s" % ( + s, + bf, + ", ".join(list(map(lambda x: str(x), self.refcoordinates[bf]))), + ) return s - def set_refcoordinates(self,bitflag,refpos): - self.refcoordinates[bitflag]=refpos - - def set_read1_reverse_secondary_supplementary(self,bitflag,read): + def set_refcoordinates(self, bitflag, refpos): + self.refcoordinates[bitflag] = refpos + + def set_read1_reverse_secondary_supplementary(self, bitflag, read): if read.is_read1: - self.isread1[bitflag]="Y" + self.isread1[bitflag] = "Y" else: - self.isread1[bitflag]="N" + self.isread1[bitflag] = "N" if read.is_reverse: - self.isreverse[bitflag]="Y" + self.isreverse[bitflag] = "Y" else: - self.isreverse[bitflag]="N" + self.isreverse[bitflag] = "N" if read.is_secondary: - self.issecondary[bitflag]="Y" + self.issecondary[bitflag] = "Y" else: - self.issecondary[bitflag]="N" + self.issecondary[bitflag] = "N" if read.is_supplementary: - 
self.issupplementary[bitflag]="Y" + self.issupplementary[bitflag] = "Y" else: - self.issupplementary[bitflag]="N" - - def append_alignment(self,read): + self.issupplementary[bitflag] = "N" + + def append_alignment(self, read): self.alignments.append(read) - - def append_bitflag(self,bf): + + def append_bitflag(self, bf): self.bitflags.append(bf) - + # def extend_ref_positions(self,refcoords): # self.refcoordinates.extend(refcoords) - + def generate_bitid(self): - bitlist=sorted(self.bitflags) - self.bitid="##".join(list(map(lambda x:str(x),bitlist))) -# self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) - + bitlist = sorted(self.bitflags) + self.bitid = "##".join(list(map(lambda x: str(x), bitlist))) + + # self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) + def get_strand(self): - if self.bitid=="83##163##2129": - self.strand="+" - elif self.bitid=="339##419##2385": - self.strand="+" - elif self.bitid=="83##163##2209": - self.strand="+" - elif self.bitid=="339##419##2465": - self.strand="+" - elif self.bitid=="99##147##2193": - self.strand="-" - elif self.bitid=="355##403##2449": - self.strand="-" - elif self.bitid=="99##147##2145": - self.strand="-" - elif self.bitid=="355##403##2401": - self.strand="-" - elif self.bitid=="16##2064": - self.strand="+" - elif self.bitid=="272##2320": - self.strand="+" - elif self.bitid=="0##2048": - self.strand="-" - elif self.bitid=="256##2304": - self.strand="-" - elif self.bitid=="153##2201": - self.strand="-" + if self.bitid == "83##163##2129": + self.strand = "+" + elif self.bitid == "339##419##2385": + self.strand = "+" + elif self.bitid == "83##163##2209": + self.strand = "+" + elif self.bitid == "339##419##2465": + self.strand = "+" + elif self.bitid == "99##147##2193": + self.strand = "-" + elif self.bitid == "355##403##2449": + self.strand = "-" + elif self.bitid == "99##147##2145": + self.strand = "-" + elif self.bitid == "355##403##2401": + self.strand = "-" + elif self.bitid == 
"16##2064": + self.strand = "+" + elif self.bitid == "272##2320": + self.strand = "+" + elif self.bitid == "0##2048": + self.strand = "-" + elif self.bitid == "256##2304": + self.strand = "-" + elif self.bitid == "153##2201": + self.strand = "-" else: - self.strand="." - + self.strand = "." + def flip_strand(self): - if self.strand=="+":self.strand="-" - if self.strand=="-":self.strand="+" + if self.strand == "+": + self.strand = "-" + if self.strand == "-": + self.strand = "+" - def validate_BSJ_read(self,junctions): + def validate_BSJ_read(self, junctions): """ Checks if read is truly a BSJ originitor. * Defines left, right and middle alignments * Left and right alignments should not overlap * Middle alignment should be between left and right alignments """ - if len(self.bitid.split("##"))==3: - left=-1 - right=-1 - middle=-1 - if self.bitid=="83##163##2129": - left=2129 - right=83 - middle=163 - if self.bitid=="339##419##2385": - left=2385 - right=339 - middle=419 - if self.bitid=="83##163##2209": - left=163 - right=2209 - middle=83 - if self.bitid=="339##419##2465": - left=419 - right=2465 - middle=339 - if self.bitid=="99##147##2145": - left=99 - right=2145 - middle=147 - if self.bitid=="355##403##2401": - left=355 - right=2401 - middle=403 - if self.bitid=="99##147##2193": - left=2193 - right=147 - middle=99 - if self.bitid=="355##403##2449": - left=2449 - right=403 - middle=355 + if len(self.bitid.split("##")) == 3: + left = -1 + right = -1 + middle = -1 + if self.bitid == "83##163##2129": + left = 2129 + right = 83 + middle = 163 + if self.bitid == "339##419##2385": + left = 2385 + right = 339 + middle = 419 + if self.bitid == "83##163##2209": + left = 163 + right = 2209 + middle = 83 + if self.bitid == "339##419##2465": + left = 419 + right = 2465 + middle = 339 + if self.bitid == "99##147##2145": + left = 99 + right = 2145 + middle = 147 + if self.bitid == "355##403##2401": + left = 355 + right = 2401 + middle = 403 + if self.bitid == "99##147##2193": + 
left = 2193 + right = 147 + middle = 99 + if self.bitid == "355##403##2449": + left = 2449 + right = 403 + middle = 355 # print(left,right,middle) if left == -1 or right == -1 or middle == -1: return False @@ -262,89 +279,95 @@ def validate_BSJ_read(self,junctions): # print("validate_BSJ_read",self.readid,self.refcoordinates[middle][0],self.refcoordinates[middle][-1]) leftmost = str(self.refcoordinates[left][0]) rightmost = str(self.refcoordinates[right][-1]) - possiblejid = chrom+"##"+leftmost+"##"+rightmost+"##"+self.strand + possiblejid = ( + chrom + "##" + leftmost + "##" + rightmost + "##" + self.strand + ) # print("validate_BSJ_read",self.readid,possiblejid) if possiblejid in junctions: self.start = leftmost - self.end = str(int(rightmost) + 1) # this will be added to the BED file + self.end = str(int(rightmost) + 1) # this will be added to the BED file return True else: return False - - - + def get_bsjid(self): - t=[] + t = [] t.append(self.refname) t.append(self.start) t.append(self.end) t.append(self.strand) return "##".join(t) - - def write_out_reads(self,outbam): + + def write_out_reads(self, outbam): for r in self.alignments: outbam.write(r) - - + + def get_uniq_readid(r): - rname=r.query_name - hi=r.get_tag("HI") - rid=rname+"##"+str(hi) + rname = r.query_name + hi = r.get_tag("HI") + rid = rname + "##" + str(hi) return rid + def get_bitflag(r): - bitflag=str(r).split("\t")[1] + bitflag = str(r).split("\t")[1] return int(bitflag) + def _bsjid2chrom(bsjid): - x=bsjid.split("##") + x = bsjid.split("##") return x[0] + def _bsjid2jid(bsjid): - x=bsjid.split("##") - chrom=x[0] - start=x[1] - end=str(int(x[2])-1) - jid="##".join([chrom,start,end]) - return jid,chrom - -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + x = bsjid.split("##") + chrom = x[0] + start = x[1] + end = str(int(x[2]) - 1) + jid = "##".join([chrom, 
start, end]) + return jid, chrom + + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) -def _get_regionname_from_seqname(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: + +def _get_regionname_from_seqname(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: return k else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." 
% (seqname)) def main(): @@ -356,64 +379,190 @@ def main(): where the chrom, start and end represent the BSJ the read is depicting. """ ) - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input Chimeric-only STAR2p BAM file") - parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True, - help='final all sample counts matrix') # get coordinates of the circRNA - parser.add_argument('--hqonly', dest='hqonly', action='store_true', - help='filter out non HQ calls') - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name: SM for RG') - parser.add_argument("-l",'--library', dest='library', type=str, required=False, default = 'lib1', - help='Sample Name: LB for RG') - parser.add_argument("-f",'--platform', dest='platform', type=str, required=False, default = 'illumina', - help='Sample Name: PL for RG') - parser.add_argument("-u",'--unit', dest='unit', type=str, required=False, default = 'unit1', - help='Sample Name: PU for RG') - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=argparse.FileType('w'), - help="Output bam file ... 
both strands") - parser.add_argument("-p","--plusbam",dest="plusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-m","--minusbam",dest="minusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("--outputhostbams",dest="outputhostbams",required=False,action='store_true', default=False, - help="Output individual host BAM files") - parser.add_argument("--outputvirusbams",dest="outputvirusbams",required=False,action='store_true', default=False, - help="Output individual virus BAM files") - parser.add_argument("--outdir",dest="outdir",required=False,type=str, - help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).") - parser.add_argument("-b","--bed",dest="bed",required=True,type=str, - help="Output BSJ bed.gz file (with strand info)") - parser.add_argument("-j","--junctionsfound",dest="junctionsfound",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output TSV file with counts of junctions expected vs found") - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... 
comma-separated list') - args = parser.parse_args() + parser.add_argument( + "-i", + "--inbam", + dest="inbam", + required=True, + type=str, + help="Input Chimeric-only STAR2p BAM file", + ) + parser.add_argument( + "-t", + "--sample_counts_table", + dest="countstable", + type=str, + required=True, + help="final all sample counts matrix", + ) # get coordinates of the circRNA + parser.add_argument( + "--hqonly", dest="hqonly", action="store_true", help="filter out non HQ calls" + ) + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name: SM for RG", + ) + parser.add_argument( + "-l", + "--library", + dest="library", + type=str, + required=False, + default="lib1", + help="Sample Name: LB for RG", + ) + parser.add_argument( + "-f", + "--platform", + dest="platform", + type=str, + required=False, + default="illumina", + help="Sample Name: PL for RG", + ) + parser.add_argument( + "-u", + "--unit", + dest="unit", + type=str, + required=False, + default="unit1", + help="Sample Name: PU for RG", + ) + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=argparse.FileType("w"), + help="Output bam file ... 
both strands", + ) + parser.add_argument( + "-p", + "--plusbam", + dest="plusbam", + required=True, + type=argparse.FileType("w"), + help="Output plus strand bam file", + ) + parser.add_argument( + "-m", + "--minusbam", + dest="minusbam", + required=True, + type=argparse.FileType("w"), + help="Output plus strand bam file", + ) + parser.add_argument( + "--outputhostbams", + dest="outputhostbams", + required=False, + action="store_true", + default=False, + help="Output individual host BAM files", + ) + parser.add_argument( + "--outputvirusbams", + dest="outputvirusbams", + required=False, + action="store_true", + default=False, + help="Output individual virus BAM files", + ) + parser.add_argument( + "--outdir", + dest="outdir", + required=False, + type=str, + help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).", + ) + parser.add_argument( + "-b", + "--bed", + dest="bed", + required=True, + type=str, + help="Output BSJ bed.gz file (with strand info)", + ) + parser.add_argument( + "-j", + "--junctionsfound", + dest="junctionsfound", + required=True, + type=argparse.FileType("w", encoding="UTF-8"), + help="Output TSV file with counts of junctions expected vs found", + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) + parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value", + ) + parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", + ) + parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... 
comma-separated list", + ) + args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") samheader = samfile.header.to_dict() - samheader['RG']=list() + samheader["RG"] = list() - print("%s | Reading...junctions!..."%(get_ctime())) - indf = pd.read_csv(args.countstable,sep="\t",header=0,compression='gzip') + print("%s | Reading...junctions!..." % (get_ctime())) + indf = pd.read_csv(args.countstable, sep="\t", header=0, compression="gzip") # filter by samplename - indf = indf.loc[indf['sample_name']==args.samplename] + indf = indf.loc[indf["sample_name"] == args.samplename] # filter for hq if args.hqonly: - indf = indf.loc[indf['HQ']=="Y"] - - junctions=dict() - junctions_found=dict() - - for index,row in indf.iterrows(): - jid = row['chrom']+"##"+str(row['start'])+"##"+str(row['end'])+"##"+row['strand'] - samheader['RG'].append({'ID':jid, 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename}) - junctions[jid] = max([row['circExplorer_read_count'],row['circExplorer_bwa_read_count']]) # large read count support from the "required" tools + indf = indf.loc[indf["HQ"] == "Y"] + + junctions = dict() + junctions_found = dict() + + for index, row in indf.iterrows(): + jid = ( + row["chrom"] + + "##" + + str(row["start"]) + + "##" + + str(row["end"]) + + "##" + + row["strand"] + ) + samheader["RG"].append( + { + "ID": jid, + "LB": args.library, + "PL": args.platform, + "PU": args.unit, + "SM": args.samplename, + } + ) + junctions[jid] = max( + [row["circExplorer_read_count"], row["circExplorer_bwa_read_count"]] + ) # large read count support from the "required" tools junctions_found[jid] = 0 # junctionsfile = open(args.countstable,'r') @@ -430,137 +579,171 @@ def main(): # junctions_found[jid] = 0 # junctionsfile.close() sequences = list() - for v in samheader['SQ']: - sequences.append(v['SN']) - seqname2regionname=dict() - hosts=set() - viruses=set() - regions = 
read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + seqname2regionname = dict() + hosts = set() + viruses = set() + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) - print("%s | Done reading %d junctions."%(get_ctime(),len(junctions))) + print("%s | Done reading %d junctions." % (get_ctime(), len(junctions))) - bigdict=dict() + bigdict = dict() # print("Opening...") # print(args.inbam) - print("%s | Reading...alignments!..."%(get_ctime())) - count=0 - count2=0 + print("%s | Reading...alignments!..." % (get_ctime())) + count = 0 + count2 = 0 for read in samfile.fetch(): - count+=1 - if debug: print(read,read.reference_id,read.next_reference_id) - if read.reference_id != read.next_reference_id: continue # only works for PE ... for SE read.next_reference_id is -1 - count2+=1 - rid=get_uniq_readid(read) # add the HI number to the readid - if debug:print(rid) + count += 1 + if debug: + print(read, read.reference_id, read.next_reference_id) + if read.reference_id != read.next_reference_id: + continue # only works for PE ... 
for SE read.next_reference_id is -1 + count2 += 1 + rid = get_uniq_readid(read) # add the HI number to the readid + if debug: + print(rid) if not rid in bigdict: - bigdict[rid]=Readinfo(rid,read.reference_name) + bigdict[rid] = Readinfo(rid, read.reference_name) # bigdict[rid].append_alignment(read) # since rid has HI number included ... this separates alignment by HI - bitflag=get_bitflag(read) - if debug:print(bitflag) - bigdict[rid].append_bitflag(bitflag) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... collect all bigflags in a list here - refpos=list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True))) - bigdict[rid].set_refcoordinates(bitflag,refpos) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment + bitflag = get_bitflag(read) + if debug: + print(bitflag) + bigdict[rid].append_bitflag( + bitflag + ) # each rid can have up to 3 lines in the BAM, each with its own bitflag ... collect all bitflags in a list here + refpos = list( + filter(lambda x: x is not None, read.get_reference_positions(full_length=True)) + ) + bigdict[rid].set_refcoordinates( + bitflag, refpos + ) # maintain a list of reference coordinates that are "aligned" for each bitflag in each rid alignment # bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag,read) - if debug:print(bigdict[rid]) - print("%s | Done reading %d chimeric alignments. [%d same chrom chimeras]"%(get_ctime(),count,count2)) + if debug: + print(bigdict[rid]) + print( + "%s | Done reading %d chimeric alignments. 
[%d same chrom chimeras]" + % (get_ctime(), count, count2) + ) samfile.reset() - print("%s | Writing BAMs"%(get_ctime())) - print("%s | Re-Reading...alignments!..."%(get_ctime())) - plusfile = pysam.AlignmentFile(args.plusbam, "wb", header = samheader) - minusfile = pysam.AlignmentFile(args.minusbam, "wb", header = samheader) - outfile = pysam.AlignmentFile(args.outbam, "wb", header = samheader) + print("%s | Writing BAMs" % (get_ctime())) + print("%s | Re-Reading...alignments!..." % (get_ctime())) + plusfile = pysam.AlignmentFile(args.plusbam, "wb", header=samheader) + minusfile = pysam.AlignmentFile(args.minusbam, "wb", header=samheader) + outfile = pysam.AlignmentFile(args.outbam, "wb", header=samheader) outputbams = dict() if args.outputhostbams: for h in hosts: - outbamname = os.path.join(args.outdir,args.samplename+"."+h+".BSJ.HQonly.bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + h + ".BSJ.HQonly.bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) if args.outputvirusbams: for v in viruses: - outbamname = os.path.join(args.outdir,args.samplename+"."+v+".BSJ.HQonly.bam") - outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header = samheader) - bsjdict=dict() - bitid_counts=dict() + outbamname = os.path.join( + args.outdir, args.samplename + "." + v + ".BSJ.HQonly.bam" + ) + outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header=samheader) + bsjdict = dict() + bitid_counts = dict() lenoutputbams = len(outputbams) for read in samfile.fetch(): - if read.reference_id != read.next_reference_id: continue - rid=get_uniq_readid(read) + if read.reference_id != read.next_reference_id: + continue + rid = get_uniq_readid(read) if rid in bigdict: - bigdict[rid].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... 
bitflags are pre-sorted - if debug:print(bigdict[rid]) - bigdict[rid].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered - bigdict[rid].flip_strand() # strands are flipped than those reported in the counts table .. hence flipping! - if not bigdict[rid].validate_BSJ_read(junctions=junctions): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. Also add start and end to the BSJ object + bigdict[ + rid + ].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted + if debug: + print(bigdict[rid]) + bigdict[ + rid + ].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered + bigdict[ + rid + ].flip_strand() # strands are flipped than those reported in the counts table .. hence flipping! + if not bigdict[rid].validate_BSJ_read( + junctions=junctions + ): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. 
Also add start and end to the BSJ object continue # bigdict[rid].get_start_end() # print(bigdict[rid]) - bsjid=bigdict[rid].get_bsjid() - chrom=_bsjid2chrom(bsjid) + bsjid = bigdict[rid].get_bsjid() + chrom = _bsjid2chrom(bsjid) # jid,chrom=_bsjid2jid(bsjid) read.set_tag("RG", bsjid, value_type="Z") - if bigdict[rid].strand=="+": + if bigdict[rid].strand == "+": plusfile.write(read) - if bigdict[rid].strand=="-": + if bigdict[rid].strand == "-": minusfile.write(read) outfile.write(read) if lenoutputbams != 0: - regionname=_get_regionname_from_seqname(regions,chrom) + regionname = _get_regionname_from_seqname(regions, chrom) if regionname in hosts and args.outputhostbams: outputbams[regionname].write(read) if regionname in viruses and args.outputvirusbams: outputbams[regionname].write(read) if not bsjid in bsjdict: - bsjdict[bsjid]=BSJ() + bsjdict[bsjid] = BSJ() bsjdict[bsjid].set_chrom(bigdict[rid].refname) bsjdict[bsjid].set_start(bigdict[rid].start) bsjdict[bsjid].set_end(bigdict[rid].end) bsjdict[bsjid].set_strand(bigdict[rid].strand) bsjdict[bsjid].append_bitid(bigdict[rid].bitid) if not bigdict[rid].bitid in bitid_counts: - bitid_counts[bigdict[rid].bitid]=0 - bitid_counts[bigdict[rid].bitid]+=1 + bitid_counts[bigdict[rid].bitid] = 0 + bitid_counts[bigdict[rid].bitid] += 1 bsjdict[bsjid].append_rid(rid) plusfile.close() minusfile.close() samfile.close() outfile.close() if lenoutputbams != 0: - for k,v in outputbams.items(): + for k, v in outputbams.items(): v.close() - print("%s | Done!"%(get_ctime())) + print("%s | Done!" 
% (get_ctime())) for b in bitid_counts.keys(): - print(b,bitid_counts[b]) - print("%s | Writing BED"%(get_ctime())) + print(b, bitid_counts[b]) + print("%s | Writing BED" % (get_ctime())) - with gzip.open(args.bed,'wt') as bsjfile: + with gzip.open(args.bed, "wt") as bsjfile: for bsjid in bsjdict.keys(): bsjdict[bsjid].update_score_and_found_count(junctions_found) bsjdict[bsjid].write_out_BSJ(bsjfile) bsjfile.close() - - args.junctionsfound.write("#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n") + args.junctionsfound.write( + "#chrom\tstart\tend\tstrand\texpected_BSJ_reads\tfound_BSJ_reads\n" + ) for jid in junctions.keys(): - x=jid.split("##") - chrom=x[0] - start=int(x[1]) - end=int(x[2])+1 - strand=x[3] - args.junctionsfound.write("%s\t%d\t%d\t%s\t%d\t%d\n"%(chrom,start,end,strand,junctions[jid],junctions_found[jid])) + x = jid.split("##") + chrom = x[0] + start = int(x[1]) + end = int(x[2]) + 1 + strand = x[3] + args.junctionsfound.write( + "%s\t%d\t%d\t%s\t%d\t%d\n" + % (chrom, start, end, strand, junctions[jid], junctions_found[jid]) + ) args.junctionsfound.close() - print("%s | ALL Done!"%(get_ctime())) - + print("%s | ALL Done!" 
% (get_ctime())) + if __name__ == "__main__": main() - - - diff --git a/workflow/scripts/_extract_circExplorer_linear_reads.py b/workflow/scripts/_extract_circExplorer_linear_reads.py index 0a61854..a9ec842 100755 --- a/workflow/scripts/_extract_circExplorer_linear_reads.py +++ b/workflow/scripts/_extract_circExplorer_linear_reads.py @@ -5,49 +5,54 @@ import pprint import time + def get_ctime(): return time.ctime(time.time()) + pp = pprint.PrettyPrinter(indent=4) -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. 
Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) -def _get_regionname_from_seqname(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: + +def _get_regionname_from_seqname(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: return k else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) + def _convertjid(jid): jid = jid.split("##") @@ -57,9 +62,12 @@ def _convertjid(jid): strand = jid[3] read_strand = jid[4] strand_info = "." - if strand==read_strand: strand_info="SS" - if (strand=="+" and read_strand=="-") or (strand=="-" and read_strand=="+"): strand_info="OS" - return "##".join([chrom,start,end,strand,strand_info]) + if strand == read_strand: + strand_info = "SS" + if (strand == "+" and read_strand == "-") or (strand == "-" and read_strand == "+"): + strand_info = "OS" + return "##".join([chrom, start, end, strand, strand_info]) + def _get_shortjid(jid): jid = jid.split("##") @@ -69,7 +77,8 @@ def _get_shortjid(jid): strand = jid[3] read_strand = jid[4] strand_info = "." - return "##".join([chrom,start,end,strand]) + return "##".join([chrom, start, end, strand]) + def _get_jinfo(jid): jid = jid.split("##") @@ -79,270 +88,482 @@ def _get_jinfo(jid): strand = jid[3] read_strand = jid[4] strand_info = "." 
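The SS/OS classification used by `_convertjid` and `_get_jinfo` reduces to two comparisons: a read aligned to the same strand as the BSJ is same-strand ("SS"), a read aligned to the opposite strand is opposite-strand ("OS"), and any other combination stays ".". A standalone sketch of that rule (the function name is illustrative, not from the script):

```python
def classify_strand_info(strand: str, read_strand: str) -> str:
    """Classify a linear read relative to a BSJ strand.

    SS = read strand matches the BSJ strand; OS = the two strands are
    opposite (+/- or -/+); anything else (e.g. an unstranded ".") is ".".
    """
    if strand == read_strand:
        return "SS"
    if (strand == "+" and read_strand == "-") or (strand == "-" and read_strand == "+"):
        return "OS"
    return "."
```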
- if strand==read_strand: strand_info="SS" - if (strand=="+" and read_strand=="-") or (strand=="-" and read_strand=="+"): strand_info="OS" - short_jid = "##".join([chrom,start,end,strand]) - converted_jid = "##".join([chrom,start,end,strand,strand_info]) - return chrom,start,end,strand_info,short_jid,converted_jid,read_strand + if strand == read_strand: + strand_info = "SS" + if (strand == "+" and read_strand == "-") or (strand == "-" and read_strand == "+"): + strand_info = "OS" + short_jid = "##".join([chrom, start, end, strand]) + converted_jid = "##".join([chrom, start, end, strand, strand_info]) + return chrom, start, end, strand_info, short_jid, converted_jid, read_strand + class JID: - def __init__(self,chrom,start,end,strand): - self.chrom=chrom - self.start=start - self.end=end - self.strand=strand - self.ss_linear_count=0 - self.os_linear_count=0 - self.ss_linear_spliced_count=0 - self.os_linear_spliced_count=0 + def __init__(self, chrom, start, end, strand): + self.chrom = chrom + self.start = start + self.end = end + self.strand = strand + self.ss_linear_count = 0 + self.os_linear_count = 0 + self.ss_linear_spliced_count = 0 + self.os_linear_spliced_count = 0 - def increment_linear(self,strand_info): - if strand_info=="SS": self.ss_linear_count+=1 - if strand_info=="OS": self.os_linear_count+=1 + def increment_linear(self, strand_info): + if strand_info == "SS": + self.ss_linear_count += 1 + if strand_info == "OS": + self.os_linear_count += 1 - def increment_linear_spliced(self,strand_info): - if strand_info=="SS": self.ss_linear_spliced_count+=1 - if strand_info=="OS": self.os_linear_spliced_count+=1 + def increment_linear_spliced(self, strand_info): + if strand_info == "SS": + self.ss_linear_spliced_count += 1 + if strand_info == "OS": + self.os_linear_spliced_count += 1 def main(): # debug = True debug = False - parser = argparse.ArgumentParser( - ) + parser = argparse.ArgumentParser() # INPUTs - 
parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input BAM file") - parser.add_argument('-r',"--rid2jid",dest="rid2jid",required=True,type=str, - help="readID to junctionID lookup") - parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True, - help='circExplore per-sample counts table') # get coordinates of the circRNA - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name: SM for RG') - parser.add_argument('-p',"--pe",dest="pe",required=False,action='store_true', default=False, - help="set this if BAM is paired end") - parser.add_argument("-l",'--library', dest='library', type=str, required=False, default = 'lib1', - help='Sample Name: LB for RG') - parser.add_argument("-f",'--platform', dest='platform', type=str, required=False, default = 'illumina', - help='Sample Name: PL for RG') - parser.add_argument("-u",'--unit', dest='unit', type=str, required=False, default = 'unit1', - help='Sample Name: PU for RG') - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... 
comma-separated list') + parser.add_argument( + "-i", "--inbam", dest="inbam", required=True, type=str, help="Input BAM file" + ) + parser.add_argument( + "-r", + "--rid2jid", + dest="rid2jid", + required=True, + type=str, + help="readID to junctionID lookup", + ) + parser.add_argument( + "-t", + "--sample_counts_table", + dest="countstable", + type=str, + required=True, + help="circExplore per-sample counts table", + ) # get coordinates of the circRNA + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name: SM for RG", + ) + parser.add_argument( + "-p", + "--pe", + dest="pe", + required=False, + action="store_true", + default=False, + help="set this if BAM is paired end", + ) + parser.add_argument( + "-l", + "--library", + dest="library", + type=str, + required=False, + default="lib1", + help="Sample Name: LB for RG", + ) + parser.add_argument( + "-f", + "--platform", + dest="platform", + type=str, + required=False, + default="illumina", + help="Sample Name: PL for RG", + ) + parser.add_argument( + "-u", + "--unit", + dest="unit", + type=str, + required=False, + default="unit1", + help="Sample Name: PU for RG", + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) + parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value", + ) + parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", + ) + parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... 
comma-separated list", + ) # OUTPUTs - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=str, - help="Output \"primary alignment near BSJ\" only BAM file") - parser.add_argument("--outplusbam",dest="outplusbam",required=True,type=str, - help="Output \"primary alignment near BSJ\" only plus strand BAM file") - parser.add_argument("--outminusbam",dest="outminusbam",required=True,type=str, - help="Output \"primary alignment near BSJ\" only minus strand BAM file") - parser.add_argument("--splicedbam",dest="splicedbam",required=True,type=str, - help="Output \"primary spliced alignment\" only BAM file") - parser.add_argument("--splicedbsjbam",dest="splicedbsjbam",required=True,type=str, - help="Output \"primary spliced alignment near BSJ\" only BAM file") - parser.add_argument("--splicedbsjplusbam",dest="splicedbsjplusbam",required=True,type=str, - help="Output \"primary spliced alignment near BSJ\" only plus strand BAM file") - parser.add_argument("--splicedbsjminusbam",dest="splicedbsjminusbam",required=True,type=str, - help="Output \"primary spliced alignment near BSJ\" only minus strand BAM file") - parser.add_argument("--outputhostbams",dest="outputhostbams",required=False,action='store_true', default=False, - help="Output individual host BAM files") - parser.add_argument("--outputvirusbams",dest="outputvirusbams",required=False,action='store_true', default=False, - help="Output individual virus BAM files") - parser.add_argument("--outdir",dest="outdir",required=False,type=str, - help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).") - parser.add_argument("-c","--countsfound",dest="countsfound",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output TSV file with counts of junctions found") - + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=str, + help='Output "primary alignment near BSJ" only BAM file', + ) + parser.add_argument( 
+ "--outplusbam", + dest="outplusbam", + required=True, + type=str, + help='Output "primary alignment near BSJ" only plus strand BAM file', + ) + parser.add_argument( + "--outminusbam", + dest="outminusbam", + required=True, + type=str, + help='Output "primary alignment near BSJ" only minus strand BAM file', + ) + parser.add_argument( + "--splicedbam", + dest="splicedbam", + required=True, + type=str, + help='Output "primary spliced alignment" only BAM file', + ) + parser.add_argument( + "--splicedbsjbam", + dest="splicedbsjbam", + required=True, + type=str, + help='Output "primary spliced alignment near BSJ" only BAM file', + ) + parser.add_argument( + "--splicedbsjplusbam", + dest="splicedbsjplusbam", + required=True, + type=str, + help='Output "primary spliced alignment near BSJ" only plus strand BAM file', + ) + parser.add_argument( + "--splicedbsjminusbam", + dest="splicedbsjminusbam", + required=True, + type=str, + help='Output "primary spliced alignment near BSJ" only minus strand BAM file', + ) + parser.add_argument( + "--outputhostbams", + dest="outputhostbams", + required=False, + action="store_true", + default=False, + help="Output individual host BAM files", + ) + parser.add_argument( + "--outputvirusbams", + dest="outputvirusbams", + required=False, + action="store_true", + default=False, + help="Output individual virus BAM files", + ) + parser.add_argument( + "--outdir", + dest="outdir", + required=False, + type=str, + help="Output folder for the individual BAM files (required only if --outputhostbams or --outputvirusbams is used).", + ) + parser.add_argument( + "-c", + "--countsfound", + dest="countsfound", + required=True, + type=argparse.FileType("w", encoding="UTF-8"), + help="Output TSV file with counts of junctions found", + ) args = parser.parse_args() - print("%s | Reading...rid2jid!..."%(get_ctime())) + print("%s | Reading...rid2jid!..." 
% (get_ctime())) rid2jid = dict() - with gzip.open(args.rid2jid,'rt') as tfile: + with gzip.open(args.rid2jid, "rt") as tfile: for l in tfile: - l=l.strip().split("\t") - rid2jid[l[0]]=l[1] + l = l.strip().split("\t") + rid2jid[l[0]] = l[1] tfile.close() - print("%s | Done reading...%d rid2jid's!"%(get_ctime(),len(rid2jid))) + print("%s | Done reading...%d rid2jid's!" % (get_ctime(), len(rid2jid))) samfile = pysam.AlignmentFile(args.inbam, "rb") samheader = samfile.header.to_dict() - samheader['RG']=list() - junctionsfile = open(args.countstable,'r') - print("%s | Reading...junctions!..."%(get_ctime())) - count=0 - junction_counts=dict() + samheader["RG"] = list() + junctionsfile = open(args.countstable, "r") + print("%s | Reading...junctions!..." % (get_ctime())) + count = 0 + junction_counts = dict() # splicedbsjjid=dict() for l in junctionsfile.readlines(): - count+=1 - if "read_count" in l: continue + count += 1 + if "read_count" in l: + continue l = l.strip().split("\t") chrom = l[0] start = l[1] - end = str(int(l[2])-1) + end = str(int(l[2]) - 1) strand = l[3] - short_jid = chrom+"##"+start+"##"+end+"##"+strand # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching! - jid1 = short_jid+"##SS" # SS=sample strand ... called BSJ and read are on the same strand - jid2 = short_jid+"##OS" # OS=opposite strand ... called BSJ and read are on opposite strands - samheader['RG'].append({'ID':jid1 , 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename}) - samheader['RG'].append({'ID':jid2 , 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename}) + short_jid = ( + chrom + "##" + start + "##" + end + "##" + strand + ) # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching! + jid1 = ( + short_jid + "##SS" + ) # SS=sample strand ... 
called BSJ and read are on the same strand + jid2 = ( + short_jid + "##OS" + ) # OS=opposite strand ... called BSJ and read are on opposite strands + samheader["RG"].append( + { + "ID": jid1, + "LB": args.library, + "PL": args.platform, + "PU": args.unit, + "SM": args.samplename, + } + ) + samheader["RG"].append( + { + "ID": jid2, + "LB": args.library, + "PL": args.platform, + "PU": args.unit, + "SM": args.samplename, + } + ) # print(short_jid) - junction_counts[short_jid] = JID(chrom,start,end,strand) + junction_counts[short_jid] = JID(chrom, start, end, strand) # splicedbsjjid[jid] = dict() junctionsfile.close() # exit() sequences = list() - for v in samheader['SQ']: - sequences.append(v['SN']) - seqname2regionname=dict() - hosts=set() - viruses=set() - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + seqname2regionname = dict() + hosts = set() + viruses = set() + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) - print("%s | Done reading %d junctions."%(get_ctime(),count)) - + print("%s | Done reading %d junctions." 
% (get_ctime(), count)) + outbam = pysam.AlignmentFile(args.outbam, "wb", header=samheader) outplusbam = pysam.AlignmentFile(args.outplusbam, "wb", header=samheader) outminusbam = pysam.AlignmentFile(args.outminusbam, "wb", header=samheader) splicedbam = pysam.AlignmentFile(args.splicedbam, "wb", header=samheader) splicedbsjbam = pysam.AlignmentFile(args.splicedbsjbam, "wb", header=samheader) - splicedbsjplusbam = pysam.AlignmentFile(args.splicedbsjplusbam, "wb", header=samheader) - splicedbsjminusbam = pysam.AlignmentFile(args.splicedbsjminusbam, "wb", header=samheader) + splicedbsjplusbam = pysam.AlignmentFile( + args.splicedbsjplusbam, "wb", header=samheader + ) + splicedbsjminusbam = pysam.AlignmentFile( + args.splicedbsjminusbam, "wb", header=samheader + ) outputbams = dict() if args.outputhostbams: for h in hosts: - outbamname = os.path.join(args.outdir,args.samplename+"."+h+".BSJ.bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + h + ".BSJ.bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) if args.outputvirusbams: for v in viruses: - outbamname = os.path.join(args.outdir,args.samplename+"."+v+".BSJ.bam") - outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + v + ".BSJ.bam" + ) + outputbams[v] = pysam.AlignmentFile(outbamname, "wb", header=samheader) lenoutputbams = len(outputbams) # pp.pprint(rid2jid) - print("%s | Opened output BAMs for writing..."%(get_ctime())) - spliced=dict() # 1=spliced - splicedbsj=dict() - count1=0 # total reads - count2=0 # total reads near BSJ - count3=0 # total spliced reads - count4=0 # total spliced reads near BSJ + print("%s | Opened output BAMs for writing..." 
% (get_ctime())) + spliced = dict() # 1=spliced + splicedbsj = dict() + count1 = 0 # total reads + count2 = 0 # total reads near BSJ + count3 = 0 # total spliced reads + count4 = 0 # total spliced reads near BSJ print("Reading alignments...") - mate_already_counted1=dict() - mate_already_counted2=dict() + mate_already_counted1 = dict() + mate_already_counted2 = dict() # mate_already_counted3=dict() # not needed as similar to the "spliced" dict # mate_already_counted4=dict() # not needed as similar to "spliced" dict have value 2 - last_printed=-1 + last_printed = -1 for read in samfile.fetch(): - if args.pe and ( read.reference_id != read.next_reference_id ): continue # only works for PE ... for SE read.next_reference_id is -1 - if args.pe and ( not read.is_proper_pair ): continue - if read.is_secondary or read.is_supplementary or read.is_unmapped : continue - rid=read.query_name -# count read if it has not been counted yet + if args.pe and (read.reference_id != read.next_reference_id): + continue # only works for PE ... 
for SE read.next_reference_id is -1 + if args.pe and (not read.is_proper_pair): + continue + if read.is_secondary or read.is_supplementary or read.is_unmapped: + continue + rid = read.query_name + # count read if it has not been counted yet if not rid in mate_already_counted1: - mate_already_counted1[rid]=1 - count1+=1 -# find cigar tuple, cigar string and generate a cigar string order -# if 0 is followed by 3 in the "cigarstringorder" value (can happen more than once in multi-spliced reads) -# then the read is spliced - cigar=read.cigarstring - cigart=read.cigartuples - cigart=cigart[list(map(lambda z:z[0],cigart)).index(0):] - cigarstringorder="" + mate_already_counted1[rid] = 1 + count1 += 1 + # find cigar tuple, cigar string and generate a cigar string order + # if 0 is followed by 3 in the "cigarstringorder" value (can happen more than once in multi-spliced reads) + # then the read is spliced + cigar = read.cigarstring + cigart = read.cigartuples + cigart = cigart[list(map(lambda z: z[0], cigart)).index(0) :] + cigarstringorder = "" for j in range(len(cigart)): - cigarstringorder+=str(cigart[j][0]) -# cigarstringorder can be like 034 or 03034 or 03 or 0303 -# check if the rid is already found to be spliced ... if not then check if it is + cigarstringorder += str(cigart[j][0]) + # cigarstringorder can be like 034 or 03034 or 03 or 0303 + # check if the rid is already found to be spliced ... if not then check if it is if not rid in spliced: - if "03" in cigarstringorder: # aka read is spliced - count3+=1 - spliced[rid]=1 -# check if the rid exists in the rid2jid lookup table + if "03" in cigarstringorder: # aka read is spliced + count3 += 1 + spliced[rid] = 1 + # check if the rid exists in the rid2jid lookup table if rid in rid2jid: # does this rid have a corresponding BSJ?? 
-# if rid is in rid2jid lookuptable and it is not previously counted then count it as "linear" read for that BSJ + # if rid is in rid2jid lookuptable and it is not previously counted then count it as "linear" read for that BSJ if not rid in mate_already_counted2: - mate_already_counted2[rid]=1 - count2+=1 + mate_already_counted2[rid] = 1 + count2 += 1 jid = rid2jid[rid] - chrom, jstart, jend, strand_info, short_jid, converted_jid, read_strand = _get_jinfo(jid) + ( + chrom, + jstart, + jend, + strand_info, + short_jid, + converted_jid, + read_strand, + ) = _get_jinfo(jid) junction_counts[short_jid].increment_linear(strand_info) - read.set_tag("RG", converted_jid , value_type="Z") + read.set_tag("RG", converted_jid, value_type="Z") outbam.write(read) - if read_strand=="+": outplusbam.write(read) - if read_strand=="-": outminusbam.write(read) + if read_strand == "+": + outplusbam.write(read) + if read_strand == "-": + outminusbam.write(read) if lenoutputbams != 0: - regionname=_get_regionname_from_seqname(regions,chrom) + regionname = _get_regionname_from_seqname(regions, chrom) if regionname in hosts and args.outputhostbams: outputbams[regionname].write(read) if regionname in viruses and args.outputvirusbams: outputbams[regionname].write(read) -# check if this rid's .. this alignment is spliced! -# rid could be in spliced but this may be an unspliced mate + # check if this rid's .. this alignment is spliced! + # rid could be in spliced but this may be an unspliced mate if rid in spliced and "03" in cigarstringorder: if not rid in splicedbsj: -# CIGAR has match ... followed by skip ... aka spliced read -# find number of splices -# nsplices is the number of times "03" is found in cigarstringorder -# if nsplices is gt than 1 then we have to get the coordinates of all the matches and -# try to compare each one with the BSJ coordinates + # CIGAR has match ... followed by skip ... 
aka spliced read + # find number of splices + # nsplices is the number of times "03" is found in cigarstringorder + # if nsplices is gt than 1 then we have to get the coordinates of all the matches and + # try to compare each one with the BSJ coordinates nsplices = cigarstringorder.count("03") if nsplices == 1: - start=int(read.reference_start)+int(cigart[0][1])+1 - end=int(start)+int(cigart[1][1])-1 + start = int(read.reference_start) + int(cigart[0][1]) + 1 + end = int(start) + int(cigart[1][1]) - 1 # print(start,end,jstart,jend) - if abs(int(start)-int(jstart))<3 or abs(int(end)-int(jend))<3: # include 2,1,0,-1,-2 - junction_counts[short_jid].increment_linear_spliced(strand_info) - splicedbsj[rid]=1 # aka read is spliced and is spliced at BSJ - count4+=1 + if ( + abs(int(start) - int(jstart)) < 3 + or abs(int(end) - int(jend)) < 3 + ): # include 2,1,0,-1,-2 + junction_counts[short_jid].increment_linear_spliced( + strand_info + ) + splicedbsj[ + rid + ] = 1 # aka read is spliced and is spliced at BSJ + count4 += 1 # splicedbsjjid[jid][rid]=1 - else: # read has multiple splicing events - for j in range(len(cigart)-1): - if cigart[j][0]==0 and cigart[j+1][0]==3: + else: # read has multiple splicing events + for j in range(len(cigart) - 1): + if cigart[j][0] == 0 and cigart[j + 1][0] == 3: add_coords = 0 - for k in range(j+1): - add_coords+=int(cigart[k][1]) - start=int(read.reference_start)+add_coords+1 - end=int(start)+int(cigart[j+1][1])-1 - if abs(int(start)-int(jstart))<3 or abs(int(end)-int(jend))<3: # include 2,1,0,-1,-2 - junction_counts[short_jid].increment_linear_spliced(strand_info) - splicedbsj[rid]=1 # aka read is spliced and is spliced at BSJ - count4+=1 + for k in range(j + 1): + add_coords += int(cigart[k][1]) + start = int(read.reference_start) + add_coords + 1 + end = int(start) + int(cigart[j + 1][1]) - 1 + if ( + abs(int(start) - int(jstart)) < 3 + or abs(int(end) - int(jend)) < 3 + ): # include 2,1,0,-1,-2 + 
junction_counts[short_jid].increment_linear_spliced( + strand_info + ) + splicedbsj[ + rid + ] = 1 # aka read is spliced and is spliced at BSJ + count4 += 1 # splicedbsjjid[jid][rid]=1 break - if (count1%100000==0) and (last_printed!=count1): - last_printed=count1 - print("%s | ...Processed %d reads/readpairs (%d were spliced! %d linear around BSJ! %d spliced at BSJ)"%(get_ctime(),count1,len(spliced),count2,len(splicedbsj))) - print("%s | Done processing alignments: %d reads/readpairs (%d were spliced! %d linear around BSJ! %d spliced at BSJ)"%(get_ctime(),count1,len(spliced),count2,len(splicedbsj))) + if (count1 % 100000 == 0) and (last_printed != count1): + last_printed = count1 + print( + "%s | ...Processed %d reads/readpairs (%d were spliced! %d linear around BSJ! %d spliced at BSJ)" + % (get_ctime(), count1, len(spliced), count2, len(splicedbsj)) + ) + print( + "%s | Done processing alignments: %d reads/readpairs (%d were spliced! %d linear around BSJ! %d spliced at BSJ)" + % (get_ctime(), count1, len(spliced), count2, len(splicedbsj)) + ) if lenoutputbams != 0: - for k,v in outputbams.items(): + for k, v in outputbams.items(): v.close() samfile.reset() - print("%s | Writing spliced BAMs ..."%(get_ctime())) + print("%s | Writing spliced BAMs ..." 
% (get_ctime())) for read in samfile.fetch(): rid = read.query_name - if rid in spliced : splicedbam.write(read) - if rid in splicedbsj : + if rid in spliced: + splicedbam.write(read) + if rid in splicedbsj: jid = rid2jid[rid] # converted_jid = _convertjid(jid) - chrom, jstart, jend, strand_info, short_jid, converted_jid, read_strand = _get_jinfo(jid) - read.set_tag("RG", converted_jid , value_type="Z") + ( + chrom, + jstart, + jend, + strand_info, + short_jid, + converted_jid, + read_strand, + ) = _get_jinfo(jid) + read.set_tag("RG", converted_jid, value_type="Z") splicedbsjbam.write(read) - if read_strand=="+": splicedbsjplusbam.write(read) - if read_strand=="-": splicedbsjminusbam.write(read) + if read_strand == "+": + splicedbsjplusbam.write(read) + if read_strand == "-": + splicedbsjminusbam.write(read) samfile.close() outbam.close() @@ -352,21 +573,35 @@ def main(): splicedbsjbam.close() splicedbsjplusbam.close() splicedbsjminusbam.close() - print("%s | Closing all BAMs"%(get_ctime())) - args.countsfound.write("#chrom\tstart\tend\tstrand\tlinear_BSJ_reads_same_strand\tlinear_spliced_BSJ_reads_same_strand\tlinear_BSJ_reads_opposite_strand\tlinear_spliced_BSJ_reads_opposite_strand\n") + print("%s | Closing all BAMs" % (get_ctime())) + args.countsfound.write( + "#chrom\tstart\tend\tstrand\tlinear_BSJ_reads_same_strand\tlinear_spliced_BSJ_reads_same_strand\tlinear_BSJ_reads_opposite_strand\tlinear_spliced_BSJ_reads_opposite_strand\n" + ) for short_jid in junction_counts.keys(): - chrom=junction_counts[short_jid].chrom - start=junction_counts[short_jid].start - end=int(junction_counts[short_jid].end)+1 - strand=junction_counts[short_jid].strand - ss_linear_count=junction_counts[short_jid].ss_linear_count - ss_linear_spliced_count=junction_counts[short_jid].ss_linear_spliced_count - os_linear_count=junction_counts[short_jid].os_linear_count - os_linear_spliced_count=junction_counts[short_jid].os_linear_spliced_count - 
args.countsfound.write("%s\t%s\t%s\t%s\t%d\t%d\t%d\t%d\n"%(chrom,str(start),str(end),strand,ss_linear_count,ss_linear_spliced_count,os_linear_count,os_linear_spliced_count)) + chrom = junction_counts[short_jid].chrom + start = junction_counts[short_jid].start + end = int(junction_counts[short_jid].end) + 1 + strand = junction_counts[short_jid].strand + ss_linear_count = junction_counts[short_jid].ss_linear_count + ss_linear_spliced_count = junction_counts[short_jid].ss_linear_spliced_count + os_linear_count = junction_counts[short_jid].os_linear_count + os_linear_spliced_count = junction_counts[short_jid].os_linear_spliced_count + args.countsfound.write( + "%s\t%s\t%s\t%s\t%d\t%d\t%d\t%d\n" + % ( + chrom, + str(start), + str(end), + strand, + ss_linear_count, + ss_linear_spliced_count, + os_linear_count, + os_linear_spliced_count, + ) + ) args.countsfound.close() - print("%s | DONE!!"%(get_ctime())) + print("%s | DONE!!" % (get_ctime())) if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py b/workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py index 7bbace0..f314ce7 100755 --- a/workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py +++ b/workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py @@ -3,97 +3,135 @@ import argparse import gzip + def main(): # debug = True debug = False parser = argparse.ArgumentParser( - description=""" Filter read list to only include those that are part of the rid2jid lookup! - """ + description=""" Filter read list to only include those that are part of the rid2jid lookup! 
+ """ ) # INPUTs - parser.add_argument("--linearin",dest="linearin",required=True,type=str, - help="gzip-ed input linear readid list") - parser.add_argument("--splicedin",dest="splicedin",required=True,type=str, - help="gzip-ed input splicedin readid list") - parser.add_argument('-r',"--rid2jid",dest="rid2jid",required=True,type=str, - help="gzip-ed rid2jid lookup") + parser.add_argument( + "--linearin", + dest="linearin", + required=True, + type=str, + help="gzip-ed input linear readid list", + ) + parser.add_argument( + "--splicedin", + dest="splicedin", + required=True, + type=str, + help="gzip-ed input splicedin readid list", + ) + parser.add_argument( + "-r", + "--rid2jid", + dest="rid2jid", + required=True, + type=str, + help="gzip-ed rid2jid lookup", + ) # OUTPUTs - parser.add_argument("--linearout",dest="linearout",required=True,type=str, - help="gzip-ed output linear readid list") - parser.add_argument("--splicedout",dest="splicedout",required=True,type=str, - help="gzip-ed output linear readid list") - parser.add_argument("--jidcounts",dest="jidcounts",required=True,type=str, - help="gzip-ed output linear readid list") + parser.add_argument( + "--linearout", + dest="linearout", + required=True, + type=str, + help="gzip-ed output linear readid list", + ) + parser.add_argument( + "--splicedout", + dest="splicedout", + required=True, + type=str, + help="gzip-ed output linear readid list", + ) + parser.add_argument( + "--jidcounts", + dest="jidcounts", + required=True, + type=str, + help="gzip-ed output linear readid list", + ) args = parser.parse_args() -# SRR5762377.10004802##- NC_001806.2##88486##88645##+ -# SRR5762377.10008194##+ chrM##1031##1445##+ -# SRR5762377.10010198##+ chr45S##8599##9010##+ + # SRR5762377.10004802##- NC_001806.2##88486##88645##+ + # SRR5762377.10008194##+ chrM##1031##1445##+ + # SRR5762377.10010198##+ chr45S##8599##9010##+ linridlist = dict() sinridlist = dict() - with gzip.open(args.linearin,'rt') as inrl: + with 
gzip.open(args.linearin, "rt") as inrl: for r in inrl: - r=r.strip() - linridlist[r]=1 + r = r.strip() + linridlist[r] = 1 inrl.close() - with gzip.open(args.splicedin,'rt') as inrl: + with gzip.open(args.splicedin, "rt") as inrl: for r in inrl: - r=r.strip() - sinridlist[r]=1 + r = r.strip() + sinridlist[r] = 1 inrl.close() - scount=dict() - lcount=dict() - with gzip.open(args.rid2jid,'rt') as rid2jid: + scount = dict() + lcount = dict() + with gzip.open(args.rid2jid, "rt") as rid2jid: for l in rid2jid: - l=l.strip().split("\t") - jid=l[1] - if jid==".": - print(">>>>>>>>jid is dot:",l) + l = l.strip().split("\t") + jid = l[1] + if jid == ".": + print(">>>>>>>>jid is dot:", l) # jchr,jstart,jend,jstrand=jid.split("##") # jid2="##".join([jchr,jstart,jend]) - jid2=jid + jid2 = jid if not jid2 in scount: - scount[jid2]=dict() - lcount[jid2]=dict() - scount[jid2]["+"]=0 - scount[jid2]["-"]=0 - scount[jid2]["."]=0 - lcount[jid2]["+"]=0 - lcount[jid2]["-"]=0 - lcount[jid2]["."]=0 + scount[jid2] = dict() + lcount[jid2] = dict() + scount[jid2]["+"] = 0 + scount[jid2]["-"] = 0 + scount[jid2]["."] = 0 + lcount[jid2]["+"] = 0 + lcount[jid2]["-"] = 0 + lcount[jid2]["."] = 0 if "##" in l[0]: - rid,rstrand=l[0].split("##") + rid, rstrand = l[0].split("##") else: - rid=l[0] - rstrand="." + rid = l[0] + rstrand = "." 
if rid in linridlist: - linridlist[rid]+=1 - lcount[jid][rstrand]+=1 + linridlist[rid] += 1 + lcount[jid][rstrand] += 1 if rid in sinridlist: - sinridlist[rid]+=1 - scount[jid][rstrand]+=1 + sinridlist[rid] += 1 + scount[jid][rstrand] += 1 rid2jid.close() - with gzip.open(args.linearout,'wt') as outrl: - for k,v in linridlist.items(): - if v!=1: - outrl.write("%s\n"%k) + with gzip.open(args.linearout, "wt") as outrl: + for k, v in linridlist.items(): + if v != 1: + outrl.write("%s\n" % k) outrl.close() - with gzip.open(args.splicedout,'wt') as outrl: - for k,v in sinridlist.items(): - if v!=1: - outrl.write("%s\n"%k) + with gzip.open(args.splicedout, "wt") as outrl: + for k, v in sinridlist.items(): + if v != 1: + outrl.write("%s\n" % k) outrl.close() - countout=open(args.jidcounts,'w') + countout = open(args.jidcounts, "w") # countout.write("#chrom\tstart\tend\tlinear_+\tspliced_+\tlinear_-\tspliced_-\tlinear_.\tspliced_.\n") - countout.write("#chrom\tstart\tend\tstrand\tlinear_+\tspliced_+\tlinear_-\tspliced_-\tlinear_.\tspliced_.\n") + countout.write( + "#chrom\tstart\tend\tstrand\tlinear_+\tspliced_+\tlinear_-\tspliced_-\tlinear_.\tspliced_.\n" + ) for k in lcount.keys(): - v1=lcount[k] - v2=scount[k] - kstr=k.split("##") - k="\t".join(kstr) - countout.write("%s\t%d\t%d\t%d\t%d\t%d\t%d\n"%(k,v1["+"],v2["+"],v1["-"],v2["-"],v1["."],v2["."])) + v1 = lcount[k] + v2 = scount[k] + kstr = k.split("##") + k = "\t".join(kstr) + countout.write( + "%s\t%d\t%d\t%d\t%d\t%d\t%d\n" + % (k, v1["+"], v2["+"], v1["-"], v2["-"], v1["."], v2["."]) + ) countout.close() + if __name__ == "__main__": main() diff --git a/workflow/scripts/_make_master_counts_table.py b/workflow/scripts/_make_master_counts_table.py index 1a81dbf..75328a9 100755 --- a/workflow/scripts/_make_master_counts_table.py +++ b/workflow/scripts/_make_master_counts_table.py @@ -1,28 +1,45 @@ import pandas as pd import argparse -def _df_setcol_as_int(df,collist): + +def _df_setcol_as_int(df, collist): for c in 
collist: - df[[c]]=df[[c]].astype(int) + df[[c]] = df[[c]].astype(int) return df -def _df_setcol_as_str(df,collist): + +def _df_setcol_as_str(df, collist): for c in collist: - df[[c]]=df[[c]].astype(str) + df[[c]] = df[[c]].astype(str) return df -def _df_setcol_as_float(df,collist): + +def _df_setcol_as_float(df, collist): for c in collist: - df[[c]]=df[[c]].astype(float) + df[[c]] = df[[c]].astype(float) return df -def main() : - parser = argparse.ArgumentParser(description='Make Master Counts Table with circExplorer_BWA fixes') - parser.add_argument('--counttablelist', dest='counttablelist', type=str, required=True, - help='comma separted list of per sample counts tables to merge') - parser.add_argument('--minreads', dest='minreads', type=int, required=False, default=3, - help='min read filter') - parser.add_argument('-o',dest='outfile',required=True,help='master counts table') + +def main(): + parser = argparse.ArgumentParser( + description="Make Master Counts Table with circExplorer_BWA fixes" + ) + parser.add_argument( + "--counttablelist", + dest="counttablelist", + type=str, + required=True, + help="comma separted list of per sample counts tables to merge", + ) + parser.add_argument( + "--minreads", + dest="minreads", + type=int, + required=False, + default=3, + help="min read filter", + ) + parser.add_argument("-o", dest="outfile", required=True, help="master counts table") args = parser.parse_args() infiles = args.counttablelist @@ -30,37 +47,37 @@ def main() : count = 0 for f in infiles: count += 1 - if count==1: - outdf = pd.read_csv(f,sep="\t",header=0,compression='gzip') - outdf.set_index(['chrom', 'start', 'end', 'sample_name']) + if count == 1: + outdf = pd.read_csv(f, sep="\t", header=0, compression="gzip") + outdf.set_index(["chrom", "start", "end", "sample_name"]) else: - tmpdf = pd.read_csv(f,sep="\t",header=0,compression='gzip') - tmpdf.set_index(['chrom', 'start', 'end', 'sample_name']) - outdf = pd.concat([outdf , 
tmpdf],axis=0,join="outer",sort=False) - outdf.reset_index(drop=True,inplace=True) - outdf.fillna(-1,inplace=True) + tmpdf = pd.read_csv(f, sep="\t", header=0, compression="gzip") + tmpdf.set_index(["chrom", "start", "end", "sample_name"]) + outdf = pd.concat([outdf, tmpdf], axis=0, join="outer", sort=False) + outdf.reset_index(drop=True, inplace=True) + outdf.fillna(-1, inplace=True) # print(outdf.columns) - intcols=['start','end','ntools'] + intcols = ["start", "end", "ntools"] for c in outdf.columns: if "count" in c: intcols.append(c) # print(intcols) - strcols=list(set(outdf.columns)-set(intcols)) + strcols = list(set(outdf.columns) - set(intcols)) # print(strcols) - outdf = _df_setcol_as_int(outdf,intcols) - outdf = _df_setcol_as_str(outdf,strcols) - outdf = outdf.sort_values(by=['chrom','start','end', 'sample_name']) - + outdf = _df_setcol_as_int(outdf, intcols) + outdf = _df_setcol_as_str(outdf, strcols) + outdf = outdf.sort_values(by=["chrom", "start", "end", "sample_name"]) - intcols=['start','end','ntools'] + intcols = ["start", "end", "ntools"] for c in outdf.columns: if "count" in c: intcols.append(c) - strcols=list(set(outdf.columns)-set(intcols)) - outdf = _df_setcol_as_int(outdf,intcols) - outdf = _df_setcol_as_str(outdf,strcols) - outdf = outdf.sort_values(by=['chrom','start','end','sample_name']) - outdf.to_csv(args.outfile,sep="\t",header=True,index=False,compression='gzip') + strcols = list(set(outdf.columns) - set(intcols)) + outdf = _df_setcol_as_int(outdf, intcols) + outdf = _df_setcol_as_str(outdf, strcols) + outdf = outdf.sort_values(by=["chrom", "start", "end", "sample_name"]) + outdf.to_csv(args.outfile, sep="\t", header=True, index=False, compression="gzip") + if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_merge_circExplorer_found_counts.py b/workflow/scripts/_merge_circExplorer_found_counts.py index 245c260..fbac83e 100755 --- a/workflow/scripts/_merge_circExplorer_found_counts.py 
+++ b/workflow/scripts/_merge_circExplorer_found_counts.py @@ -2,40 +2,63 @@ import sys import pandas -def _df_setcol_as_int(df,collist): + +def _df_setcol_as_int(df, collist): for c in collist: - df[[c]]=df[[c]].astype(int) + df[[c]] = df[[c]].astype(int) return df -def _df_setcol_as_str(df,collist): + +def _df_setcol_as_str(df, collist): for c in collist: - df[[c]]=df[[c]].astype(str) + df[[c]] = df[[c]].astype(str) return df + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( + parser = argparse.ArgumentParser() + parser.add_argument( + "-b", + "--bsjcounts", + dest="bsjcounts", + required=True, + type=str, + help="BSJ counts file", + ) + parser.add_argument( + "-l", + "--linearcounts", + dest="linearcounts", + required=True, + type=str, + help="Linear counts file", + ) + parser.add_argument( + "-o", + "--mergedcounts", + dest="mergedcounts", + required=True, + type=str, + help="merged counts file", ) - parser.add_argument("-b","--bsjcounts",dest="bsjcounts",required=True,type=str, - help="BSJ counts file") - parser.add_argument("-l","--linearcounts",dest="linearcounts",required=True,type=str, - help="Linear counts file") - parser.add_argument("-o","--mergedcounts",dest="mergedcounts",required=True,type=str, - help="merged counts file") args = parser.parse_args() - bcounts = pandas.read_csv(args.bsjcounts,header=0,sep="\t") - lcounts = pandas.read_csv(args.linearcounts,header=0,sep="\t") + bcounts = pandas.read_csv(args.bsjcounts, header=0, sep="\t") + lcounts = pandas.read_csv(args.linearcounts, header=0, sep="\t") print(bcounts.head()) print(lcounts.head()) - mcounts = bcounts.merge(lcounts,how='outer',on=["#chrom","start","end","strand"]) - strcols = [ '#chrom', 'strand' ] - intcols = list ( set(mcounts.columns) - set(strcols) ) - mcounts.fillna(value=0,inplace=True) - mcounts = _df_setcol_as_str(mcounts,strcols) - mcounts = _df_setcol_as_int(mcounts,intcols) - mcounts.to_csv(args.mergedcounts,index=False,doublequote=False,sep="\t") 
+ mcounts = bcounts.merge( + lcounts, how="outer", on=["#chrom", "start", "end", "strand"] + ) + strcols = ["#chrom", "strand"] + intcols = list(set(mcounts.columns) - set(strcols)) + mcounts.fillna(value=0, inplace=True) + mcounts = _df_setcol_as_str(mcounts, strcols) + mcounts = _df_setcol_as_int(mcounts, intcols) + mcounts.to_csv(args.mergedcounts, index=False, doublequote=False, sep="\t") + if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/_merge_per_sample_counts_table.py b/workflow/scripts/_merge_per_sample_counts_table.py index 21b5d0f..2c75842 100755 --- a/workflow/scripts/_merge_per_sample_counts_table.py +++ b/workflow/scripts/_merge_per_sample_counts_table.py @@ -4,90 +4,163 @@ import sys import gzip -def _df_setcol_as_int(df,collist): + +def _df_setcol_as_int(df, collist): for c in collist: - df[[c]]=df[[c]].astype(int) + df[[c]] = df[[c]].astype(int) return df -def _df_setcol_as_str(df,collist): + +def _df_setcol_as_str(df, collist): for c in collist: - df[[c]]=df[[c]].astype(str) + df[[c]] = df[[c]].astype(str) return df -def _df_setcol_as_float(df,collist): + +def _df_setcol_as_float(df, collist): for c in collist: - df[[c]]=df[[c]].astype(float) + df[[c]] = df[[c]].astype(float) return df + def _rev_comp(seq): seq = seq.upper() seq = seq.replace("A", "t").replace("C", "g").replace("T", "a").replace("G", "c") seq = seq.upper()[::-1] return seq + class BSJ: - def __init__(self,chrom,start,end,strand="+"): - self.chrom=chrom - self.start=int(start) - self.end=int(end) - self.strand=strand - self.splice_site_flank_5="" #donor - self.splice_site_flank_3="" #acceptor - - def add_flanks(self,sequences): # adds flanking assuming + strand + def __init__(self, chrom, start, end, strand="+"): + self.chrom = chrom + self.start = int(start) + self.end = int(end) + self.strand = strand + self.splice_site_flank_5 = "" # donor + self.splice_site_flank_3 = "" # acceptor + + def add_flanks(self, sequences): # adds 
flanking assuming + strand coord = int(self.end) - seq = sequences[self.chrom][coord:coord+2] + seq = sequences[self.chrom][coord : coord + 2] self.splice_site_flank_5 = seq.upper() coord = int(self.start) - seq = sequences[self.chrom][coord-2:coord] + seq = sequences[self.chrom][coord - 2 : coord] self.splice_site_flank_3 = seq.upper() - - def get_flanks(self): # returns + and - strand flanks - plus_strand = self.splice_site_flank_5+"##"+self.splice_site_flank_3 - minus_strand = _rev_comp(self.splice_site_flank_5)+"##"+_rev_comp(self.splice_site_flank_3) - return plus_strand,minus_strand - -def main() : - parser = argparse.ArgumentParser(description='Merge per sample Counts from different circRNA detection tools') - parser.add_argument('--circExplorer', dest='circE', type=str, required=True, - help='circExplorer2 per-sample counts table') - parser.add_argument('--circExplorerbwa', dest='circEbwa', type=str, required=True, - help='circExplorer2_bwa per-sample counts table') - parser.add_argument('--ciri', dest='ciri', type=str, required=True, - help='ciri2 per-sample output') - parser.add_argument('--findcirc', dest='findcirc', type=str, required=False, - help='findcirc per-sample counts table') - parser.add_argument('--dcc', dest='dcc', type=str, required=False, - help='dcc per-sample counts table') - parser.add_argument('--mapsplice', dest='mapsplice', type=str, required=False, - help='mapsplice per-sample counts table') - parser.add_argument('--nclscan', dest='nclscan', type=str, required=False, - help='nclscan per-sample counts table') - parser.add_argument('--circrnafinder', dest='circrnafinder', type=str, required=False, - help='circrnafinder per-sample counts table') - parser.add_argument('--samplename', dest='samplename', type=str, required=True, - help='Sample Name') - parser.add_argument('--min_read_count_reqd', dest='minreads', type=int, required=False, default=2, - help='Read count threshold..circRNA with lower than this number of read support are 
excluded! (default=2)') - parser.add_argument("--reffa",dest="reffa",required=True,type=argparse.FileType('r'),default=sys.stdin, - help="reference fasta file") - parser.add_argument('--hqcc', dest='hqcc', type=str, required=False, default="circExplorer,circExplorer_bwa", - help='Comma separated list of high confidence core callers (default="circExplorer,circExplorer_bwa")') - parser.add_argument('--hqccpn', dest='hqccpn', type=int, required=False, default=1, - help='Define n:high confidence core callers plus n callers are required to call this circRNA HQ (default 1)') - parser.add_argument('-o',dest='outfile',required=True,help='merged table') + + def get_flanks(self): # returns + and - strand flanks + plus_strand = self.splice_site_flank_5 + "##" + self.splice_site_flank_3 + minus_strand = ( + _rev_comp(self.splice_site_flank_5) + + "##" + + _rev_comp(self.splice_site_flank_3) + ) + return plus_strand, minus_strand + + +def main(): + parser = argparse.ArgumentParser( + description="Merge per sample Counts from different circRNA detection tools" + ) + parser.add_argument( + "--circExplorer", + dest="circE", + type=str, + required=True, + help="circExplorer2 per-sample counts table", + ) + parser.add_argument( + "--circExplorerbwa", + dest="circEbwa", + type=str, + required=True, + help="circExplorer2_bwa per-sample counts table", + ) + parser.add_argument( + "--ciri", dest="ciri", type=str, required=True, help="ciri2 per-sample output" + ) + parser.add_argument( + "--findcirc", + dest="findcirc", + type=str, + required=False, + help="findcirc per-sample counts table", + ) + parser.add_argument( + "--dcc", + dest="dcc", + type=str, + required=False, + help="dcc per-sample counts table", + ) + parser.add_argument( + "--mapsplice", + dest="mapsplice", + type=str, + required=False, + help="mapsplice per-sample counts table", + ) + parser.add_argument( + "--nclscan", + dest="nclscan", + type=str, + required=False, + help="nclscan per-sample counts table", + ) + 
parser.add_argument( + "--circrnafinder", + dest="circrnafinder", + type=str, + required=False, + help="circrnafinder per-sample counts table", + ) + parser.add_argument( + "--samplename", dest="samplename", type=str, required=True, help="Sample Name" + ) + parser.add_argument( + "--min_read_count_reqd", + dest="minreads", + type=int, + required=False, + default=2, + help="Read count threshold: circRNAs with fewer than this number of supporting reads are excluded (default=2)", + ) + parser.add_argument( + "--reffa", + dest="reffa", + required=True, + type=argparse.FileType("r"), + default=sys.stdin, + help="reference fasta file", + ) + parser.add_argument( + "--hqcc", + dest="hqcc", + type=str, + required=False, + default="circExplorer,circExplorer_bwa", + help='Comma-separated list of high-confidence core callers (default="circExplorer,circExplorer_bwa")', + ) + parser.add_argument( + "--hqccpn", + dest="hqccpn", + type=int, + required=False, + default=1, + help="Define n: high-confidence core callers plus n additional callers are required to call this circRNA HQ (default=1)", + ) + parser.add_argument("-o", dest="outfile", required=True, help="merged table") args = parser.parse_args() - sn=args.samplename - hqcc=args.hqcc - hqcc=hqcc.strip().lower().split(",") - hqcclen=len(hqcc) - required_hqcols=[] - not_required_hqcols=[] - dfs=[] + sn = args.samplename + hqcc = args.hqcc + hqcc = hqcc.strip().lower().split(",") + hqcclen = len(hqcc) + required_hqcols = [] + not_required_hqcols = [] + dfs = [] # load circExplorer - circE=pandas.read_csv(args.circE,sep="\t",header=0) + circE = pandas.read_csv(args.circE, sep="\t", header=0) print(circE.columns) # columns are: # | # | Column | @@ -105,67 +178,106 @@ def main() : # | 11 | spliced_- | # | 12 | linear_. | # | 13 | spliced_. 
| - circE['circRNA_id']=circE['#chrom'].astype(str)+"##"+circE['start'].astype(str)+"##"+circE['end'].astype(str) - circE.rename({'strand' : 'circExplorer_strand', - 'known_novel' : 'circExplorer_annotation', - 'expected_BSJ_reads' : 'circExplorer_read_count', - 'found_BSJ_reads' : 'circExplorer_found_BSJcounts', - 'linear_+' : 'circExplorer_found_linear_BSJ_+_counts', - 'spliced_+' : 'circExplorer_found_linear_spliced_BSJ_+_counts', - 'linear_-' : 'circExplorer_found_linear_BSJ_-_counts', - 'spliced_-' : 'circExplorer_found_linear_spliced_BSJ_-_counts', - 'linear_.' : 'circExplorer_found_linear_BSJ_._counts', - 'spliced_.' : 'circExplorer_found_linear_spliced_BSJ_._counts'}, axis=1, inplace=True) - circE.drop(['#chrom','start', 'end'], axis = 1,inplace=True) - circE.set_index(['circRNA_id'],inplace=True,drop=True) - - circE.fillna(value=-1,inplace=True) + circE["circRNA_id"] = ( + circE["#chrom"].astype(str) + + "##" + + circE["start"].astype(str) + + "##" + + circE["end"].astype(str) + ) + circE.rename( + { + "strand": "circExplorer_strand", + "known_novel": "circExplorer_annotation", + "expected_BSJ_reads": "circExplorer_read_count", + "found_BSJ_reads": "circExplorer_found_BSJcounts", + "linear_+": "circExplorer_found_linear_BSJ_+_counts", + "spliced_+": "circExplorer_found_linear_spliced_BSJ_+_counts", + "linear_-": "circExplorer_found_linear_BSJ_-_counts", + "spliced_-": "circExplorer_found_linear_spliced_BSJ_-_counts", + "linear_.": "circExplorer_found_linear_BSJ_._counts", + "spliced_.": "circExplorer_found_linear_spliced_BSJ_._counts", + }, + axis=1, + inplace=True, + ) + circE.drop(["#chrom", "start", "end"], axis=1, inplace=True) + circE.set_index(["circRNA_id"], inplace=True, drop=True) + + circE.fillna(value=-1, inplace=True) print(circE.columns) - intcols = [ 'circExplorer_read_count', - 'circExplorer_found_BSJcounts', - 'circExplorer_found_linear_BSJ_+_counts', - 'circExplorer_found_linear_spliced_BSJ_+_counts', - 
'circExplorer_found_linear_BSJ_-_counts', - 'circExplorer_found_linear_spliced_BSJ_-_counts', - 'circExplorer_found_linear_BSJ_._counts', - 'circExplorer_found_linear_spliced_BSJ_._counts' ] - strcols = list ( set(circE.columns) - set(intcols) ) - circE = _df_setcol_as_int(circE,intcols) - circE = _df_setcol_as_str(circE,strcols) + intcols = [ + "circExplorer_read_count", + "circExplorer_found_BSJcounts", + "circExplorer_found_linear_BSJ_+_counts", + "circExplorer_found_linear_spliced_BSJ_+_counts", + "circExplorer_found_linear_BSJ_-_counts", + "circExplorer_found_linear_spliced_BSJ_-_counts", + "circExplorer_found_linear_BSJ_._counts", + "circExplorer_found_linear_spliced_BSJ_._counts", + ] + strcols = list(set(circE.columns) - set(intcols)) + circE = _df_setcol_as_int(circE, intcols) + circE = _df_setcol_as_str(circE, strcols) dfs.append(circE) - if "circExplorer".lower() in hqcc: + if "circExplorer".lower() in hqcc: required_hqcols.append("circExplorer_read_count") else: not_required_hqcols.append("circExplorer_read_count") - #chrom start end strand read_count known_novel + # chrom start end strand read_count known_novel # circExplorer2 with BWA - circEbwa=pandas.read_csv(args.circEbwa,sep="\t",header=0) - circEbwa['circRNA_id']=circEbwa['#chrom'].astype(str)+"##"+circEbwa['start'].astype(str)+"##"+circEbwa['end'].astype(str) - circEbwa.rename({'strand' : 'circExplorer_bwa_strand', - 'known_novel' : 'circExplorer_bwa_annotation', - 'read_count' : 'circExplorer_bwa_read_count'}, axis=1, inplace=True) - circEbwa.drop(['#chrom','start', 'end'], axis = 1,inplace=True) - circEbwa.set_index(['circRNA_id'],inplace=True,drop=True) - - circEbwa.fillna(value=-1,inplace=True) - - intcols = [ 'circExplorer_bwa_read_count' ] - strcols = list ( set(circEbwa.columns) - set(intcols) ) - circEbwa = _df_setcol_as_int(circEbwa,intcols) - circEbwa = _df_setcol_as_str(circEbwa,strcols) + circEbwa = pandas.read_csv(args.circEbwa, sep="\t", header=0) + circEbwa["circRNA_id"] = ( + 
circEbwa["#chrom"].astype(str) + + "##" + + circEbwa["start"].astype(str) + + "##" + + circEbwa["end"].astype(str) + ) + circEbwa.rename( + { + "strand": "circExplorer_bwa_strand", + "known_novel": "circExplorer_bwa_annotation", + "read_count": "circExplorer_bwa_read_count", + }, + axis=1, + inplace=True, + ) + circEbwa.drop(["#chrom", "start", "end"], axis=1, inplace=True) + circEbwa.set_index(["circRNA_id"], inplace=True, drop=True) + + circEbwa.fillna(value=-1, inplace=True) + + intcols = ["circExplorer_bwa_read_count"] + strcols = list(set(circEbwa.columns) - set(intcols)) + circEbwa = _df_setcol_as_int(circEbwa, intcols) + circEbwa = _df_setcol_as_str(circEbwa, strcols) dfs.append(circEbwa) - if "circExplorer_bwa".lower() in hqcc: + if "circExplorer_bwa".lower() in hqcc: required_hqcols.append("circExplorer_bwa_read_count") else: not_required_hqcols.append("circExplorer_bwa_read_count") # load ciri - ciri=pandas.read_csv(args.ciri,sep="\t",header=0,usecols=['chr', 'circRNA_start', 'circRNA_end', '#junction_reads', '#non_junction_reads', 'circRNA_type', 'strand']) + ciri = pandas.read_csv( + args.ciri, + sep="\t", + header=0, + usecols=[ + "chr", + "circRNA_start", + "circRNA_end", + "#junction_reads", + "#non_junction_reads", + "circRNA_type", + "strand", + ], + ) # columns are: # circRNA_ID chr circRNA_start circRNA_end #junction_reads SM_MS_SMS #non_junction_reads junction_reads_ratio circRNA_type gene_id strand junction_reads_ID # | # | colName | Description | @@ -181,75 +293,97 @@ def main() : # | 9 | circRNA_type | type of a circRNA according to positions of its two ends on chromosome (exon, intron or intergenic_region; only available when annotation file is provided) | # | 10 | gene_id | ID of the gene(s) where an exonic or intronic circRNA locates | # | 11 | strand | strand info of a predicted circRNAs (new in CIRI2) | - # | 12 | junction_reads_ID | all of the circular junction read IDs (split by ",") - 
ciri["circRNA_start"]=ciri["circRNA_start"].astype(int)-1 - ciri['circRNA_id']=ciri['chr'].astype(str)+"##"+ciri['circRNA_start'].astype(str)+"##"+ciri['circRNA_end'].astype(str) - ciri.rename({ 'strand' : 'ciri_strand', - '#junction_reads' : 'ciri_read_count', - '#non_junction_reads' : 'ciri_linear_read_count', - 'circRNA_type' : 'ciri_annotation'}, axis=1, inplace=True) - ciri.drop(['chr','circRNA_start', 'circRNA_end'], axis = 1,inplace=True) - ciri.set_index(['circRNA_id'],inplace=True,drop=True) - - ciri.fillna(value=-1,inplace=True) - - intcols = [ 'ciri_read_count', - 'ciri_linear_read_count' ] - strcols = list ( set(ciri.columns) - set(intcols) ) - ciri = _df_setcol_as_int(ciri,intcols) - if len(strcols) > 0: ciri = _df_setcol_as_str(ciri,strcols) - + # | 12 | junction_reads_ID | all of the circular junction read IDs (split by ",") + ciri["circRNA_start"] = ciri["circRNA_start"].astype(int) - 1 + ciri["circRNA_id"] = ( + ciri["chr"].astype(str) + + "##" + + ciri["circRNA_start"].astype(str) + + "##" + + ciri["circRNA_end"].astype(str) + ) + ciri.rename( + { + "strand": "ciri_strand", + "#junction_reads": "ciri_read_count", + "#non_junction_reads": "ciri_linear_read_count", + "circRNA_type": "ciri_annotation", + }, + axis=1, + inplace=True, + ) + ciri.drop(["chr", "circRNA_start", "circRNA_end"], axis=1, inplace=True) + ciri.set_index(["circRNA_id"], inplace=True, drop=True) + + ciri.fillna(value=-1, inplace=True) + + intcols = ["ciri_read_count", "ciri_linear_read_count"] + strcols = list(set(ciri.columns) - set(intcols)) + ciri = _df_setcol_as_int(ciri, intcols) + if len(strcols) > 0: + ciri = _df_setcol_as_str(ciri, strcols) + dfs.append(ciri) - if "ciri".lower() in hqcc: + if "ciri".lower() in hqcc: required_hqcols.append("ciri_read_count") else: not_required_hqcols.append("ciri_read_count") if args.findcirc: - findcirc=pandas.read_csv(args.findcirc,sep="\t",header=0) + findcirc = pandas.read_csv(args.findcirc, sep="\t", header=0) print(findcirc.columns) 
-# add find_circ -# | # | short_name | description -# | -- | --------------- | ---------------------------------------------------------------------------------------------------------------- | -# | 1 | chrom | chromosome/contig name | -# | 2 | start | left splice site (zero-based) | -# | 3 | end | right splice site (zero-based). (Always: end > start. 5' 3' depends on strand) | -# | 4 | name | (provisional) running number/name assigned to junction | -# | 5 | n_reads | number of reads supporting the junction (BED 'score') | -# | 6 | strand | genomic strand (+ or -) | -# | 7 | n_uniq | number of distinct read sequences supporting the junction | -# | 8 | uniq_bridges | number of reads with both anchors aligning uniquely | -# | 9 | best_qual_left | alignment score margin of the best anchor alignment supporting the left splice junction (max=2 \* anchor_length) | -# | 10 | best_qual_right | same for the right splice site | -# | 11 | tissues | comma-separated, alphabetically sorted list of tissues/samples with this junction | -# | 12 | tiss_counts | comma-separated list of corresponding read-counts | -# | 13 | edits | number of mismatches in the anchor extension process | -# | 14 | anchor_overlap | number of nucleotides the breakpoint resides within one anchor | -# | 15 | breakpoints | number of alternative ways to break the read with flanking GT/AG | -# | 16 | signal | flanking dinucleotide splice signal (normally GT/AG) | -# | 17 | strandmatch | 'MATCH', 'MISMATCH' or 'NA' for non-stranded analysis | -# | 18 | category | list of keywords describing the junction. 
Useful for quick grep filtering | - findcirc['circRNA_id']=findcirc['chrom'].astype(str)+"##"+findcirc['start'].astype(str)+"##"+findcirc['end'].astype(str) - findcirc = findcirc.loc[:, ['circRNA_id', 'n_reads', 'strand']] - findcirc.rename({ 'strand' : 'findcirc_strand', - 'n_reads' : 'findcirc_read_count'}, axis=1, inplace=True) - findcirc.set_index(['circRNA_id'],inplace=True,drop=True) - - findcirc.fillna(value=-1,inplace=True) - - intcols = [ 'findcirc_read_count' ] - strcols = list ( set(findcirc.columns) - set(intcols) ) - findcirc = _df_setcol_as_int(findcirc,intcols) - if len(strcols) > 0: findcirc = _df_setcol_as_str(findcirc,strcols) + # add find_circ + # | # | short_name | description + # | -- | --------------- | ---------------------------------------------------------------------------------------------------------------- | + # | 1 | chrom | chromosome/contig name | + # | 2 | start | left splice site (zero-based) | + # | 3 | end | right splice site (zero-based). (Always: end > start. 
5' 3' depends on strand) | + # | 4 | name | (provisional) running number/name assigned to junction | + # | 5 | n_reads | number of reads supporting the junction (BED 'score') | + # | 6 | strand | genomic strand (+ or -) | + # | 7 | n_uniq | number of distinct read sequences supporting the junction | + # | 8 | uniq_bridges | number of reads with both anchors aligning uniquely | + # | 9 | best_qual_left | alignment score margin of the best anchor alignment supporting the left splice junction (max=2 \* anchor_length) | + # | 10 | best_qual_right | same for the right splice site | + # | 11 | tissues | comma-separated, alphabetically sorted list of tissues/samples with this junction | + # | 12 | tiss_counts | comma-separated list of corresponding read-counts | + # | 13 | edits | number of mismatches in the anchor extension process | + # | 14 | anchor_overlap | number of nucleotides the breakpoint resides within one anchor | + # | 15 | breakpoints | number of alternative ways to break the read with flanking GT/AG | + # | 16 | signal | flanking dinucleotide splice signal (normally GT/AG) | + # | 17 | strandmatch | 'MATCH', 'MISMATCH' or 'NA' for non-stranded analysis | + # | 18 | category | list of keywords describing the junction. 
Useful for quick grep filtering | + findcirc["circRNA_id"] = ( + findcirc["chrom"].astype(str) + + "##" + + findcirc["start"].astype(str) + + "##" + + findcirc["end"].astype(str) + ) + findcirc = findcirc.loc[:, ["circRNA_id", "n_reads", "strand"]] + findcirc.rename( + {"strand": "findcirc_strand", "n_reads": "findcirc_read_count"}, + axis=1, + inplace=True, + ) + findcirc.set_index(["circRNA_id"], inplace=True, drop=True) + + findcirc.fillna(value=-1, inplace=True) + + intcols = ["findcirc_read_count"] + strcols = list(set(findcirc.columns) - set(intcols)) + findcirc = _df_setcol_as_int(findcirc, intcols) + if len(strcols) > 0: + findcirc = _df_setcol_as_str(findcirc, strcols) dfs.append(findcirc) - if "findcirc".lower() in hqcc: + if "findcirc".lower() in hqcc: required_hqcols.append("findcirc_read_count") else: not_required_hqcols.append("findcirc_read_count") # load dcc if args.dcc: - dcc=pandas.read_csv(args.dcc,sep="\t",header=0) + dcc = pandas.read_csv(args.dcc, sep="\t", header=0) # output dcc.counts_table.tsv has the following columns: # | # | ColName | # |---|----------------| @@ -260,33 +394,41 @@ def main() : # | 5 | read_count | # | 6 | linear_read_count| # | 7 | dcc_annotation | --> this is gene##JunctionType##Start-End Region from CircCoordinates file - dcc["start"]=dcc["start"].astype(int)-1 - dcc['circRNA_id']=dcc['chr'].astype(str)+"##"+dcc['start'].astype(str)+"##"+dcc['end'].astype(str) - dcc.rename({'strand': 'dcc_strand'}, axis=1, inplace=True) - dcc.rename({'read_count': 'dcc_read_count'}, axis=1, inplace=True) - dcc.rename({'linear_read_count': 'dcc_linear_read_count'}, axis=1, inplace=True) - dcc[['dcc_gene', 'dcc_junction_type', 'dcc_annotation2']] = dcc['dcc_annotation'].apply(lambda x: pandas.Series(str(x).split("##"))) - dcc.drop(['chr','start', 'end','dcc_annotation'], axis = 1,inplace=True) - dcc.rename({'dcc_annotation2': 'dcc_annotation'}, axis=1, inplace=True) - dcc.set_index(['circRNA_id'],inplace=True,drop=True) - - 
dcc.fillna(value=-1,inplace=True) - - intcols = [ 'dcc_read_count', - 'dcc_linear_read_count' ] - strcols = list ( set(dcc.columns) - set(intcols) ) - dcc = _df_setcol_as_int(dcc,intcols) - if len(strcols) > 0: dcc = _df_setcol_as_str(dcc,strcols) + dcc["start"] = dcc["start"].astype(int) - 1 + dcc["circRNA_id"] = ( + dcc["chr"].astype(str) + + "##" + + dcc["start"].astype(str) + + "##" + + dcc["end"].astype(str) + ) + dcc.rename({"strand": "dcc_strand"}, axis=1, inplace=True) + dcc.rename({"read_count": "dcc_read_count"}, axis=1, inplace=True) + dcc.rename({"linear_read_count": "dcc_linear_read_count"}, axis=1, inplace=True) + dcc[["dcc_gene", "dcc_junction_type", "dcc_annotation2"]] = dcc[ + "dcc_annotation" + ].apply(lambda x: pandas.Series(str(x).split("##"))) + dcc.drop(["chr", "start", "end", "dcc_annotation"], axis=1, inplace=True) + dcc.rename({"dcc_annotation2": "dcc_annotation"}, axis=1, inplace=True) + dcc.set_index(["circRNA_id"], inplace=True, drop=True) + + dcc.fillna(value=-1, inplace=True) + + intcols = ["dcc_read_count", "dcc_linear_read_count"] + strcols = list(set(dcc.columns) - set(intcols)) + dcc = _df_setcol_as_int(dcc, intcols) + if len(strcols) > 0: + dcc = _df_setcol_as_str(dcc, strcols) dfs.append(dcc) - if "DCC".lower() in hqcc: + if "DCC".lower() in hqcc: required_hqcols.append("dcc_read_count") else: not_required_hqcols.append("dcc_read_count") # load mapsplice if args.mapsplice: - mapsplice=pandas.read_csv(args.mapsplice,sep="\t",header=0) + mapsplice = pandas.read_csv(args.mapsplice, sep="\t", header=0) # output .mapslice.counts_table.tsv has the following columns: # | # | ColName | Eg. 
| # |---|----------------------|------------------| @@ -295,35 +437,48 @@ def main() : # | 3 | end | 1223968 | # | 4 | strand | - | # | 5 | read_count | 26 | - # | 6 | mapsplice_annotation | normal##2.811419 | <--"fusion_type"##"entropy" + # | 6 | mapsplice_annotation | normal##2.811419 | <--"fusion_type"##"entropy" # "fusion_type" is either "normal" or "overlapping" ... higher "entropy" values are better! - mapsplice["start"]=mapsplice["start"].astype(int)-1 - mapsplice['circRNA_id']=mapsplice['chrom'].astype(str)+"##"+mapsplice['start'].astype(str)+"##"+mapsplice['end'].astype(str) - mapsplice.rename({'strand': 'mapsplice_strand'}, axis=1, inplace=True) - mapsplice.rename({'read_count': 'mapsplice_read_count'}, axis=1, inplace=True) - mapsplice[['mapsplice_annotation2', 'mapsplice_entropy']] = mapsplice['mapsplice_annotation'].apply(lambda x: pandas.Series(str(x).split("##"))) - mapsplice.drop(['chrom','start', 'end','mapsplice_annotation'], axis = 1,inplace=True) - mapsplice.rename({'mapsplice_annotation2': 'mapsplice_annotation'}, axis=1, inplace=True) - mapsplice.set_index(['circRNA_id'],inplace=True,drop=True) - - mapsplice.fillna(value=-1,inplace=True) - - intcols = [ 'mapsplice_read_count' ] - mapsplice = _df_setcol_as_int(mapsplice,intcols) - floatcols = [ 'mapsplice_entropy' ] - mapsplice = _df_setcol_as_float(mapsplice,floatcols) - strcols = list ( ( set(mapsplice.columns) - set(intcols) ) - set(floatcols) ) - if len(strcols) > 0: mapsplice = _df_setcol_as_str(mapsplice,strcols) + mapsplice["start"] = mapsplice["start"].astype(int) - 1 + mapsplice["circRNA_id"] = ( + mapsplice["chrom"].astype(str) + + "##" + + mapsplice["start"].astype(str) + + "##" + + mapsplice["end"].astype(str) + ) + mapsplice.rename({"strand": "mapsplice_strand"}, axis=1, inplace=True) + mapsplice.rename({"read_count": "mapsplice_read_count"}, axis=1, inplace=True) + mapsplice[["mapsplice_annotation2", "mapsplice_entropy"]] = mapsplice[ + "mapsplice_annotation" + ].apply(lambda x: 
pandas.Series(str(x).split("##"))) + mapsplice.drop( + ["chrom", "start", "end", "mapsplice_annotation"], axis=1, inplace=True + ) + mapsplice.rename( + {"mapsplice_annotation2": "mapsplice_annotation"}, axis=1, inplace=True + ) + mapsplice.set_index(["circRNA_id"], inplace=True, drop=True) + + mapsplice.fillna(value=-1, inplace=True) + + intcols = ["mapsplice_read_count"] + mapsplice = _df_setcol_as_int(mapsplice, intcols) + floatcols = ["mapsplice_entropy"] + mapsplice = _df_setcol_as_float(mapsplice, floatcols) + strcols = list((set(mapsplice.columns) - set(intcols)) - set(floatcols)) + if len(strcols) > 0: + mapsplice = _df_setcol_as_str(mapsplice, strcols) dfs.append(mapsplice) - if "MapSplice".lower() in hqcc: + if "MapSplice".lower() in hqcc: required_hqcols.append("mapsplice_read_count") else: not_required_hqcols.append("mapsplice_read_count") # load nclscan if args.nclscan: - nclscan=pandas.read_csv(args.nclscan,sep="\t",header=0) + nclscan = pandas.read_csv(args.nclscan, sep="\t", header=0) # output nslscan table has the following columns: # | # | ColName | Eg. 
| # |---|----------------------|------------------| @@ -333,35 +488,47 @@ def main() : # | 4 | strand | - | # | 5 | read_count | 26 | # | 6 | nclscan_annotation | 1 | <--1 for intragenic 0 for intergenic - includenclscan=True - if nclscan.shape[0]==0: includenclscan=False + includenclscan = True + if nclscan.shape[0] == 0: + includenclscan = False if includenclscan: - nclscan["start"]=nclscan["start"].astype(int)-1 - nclscan['circRNA_id']=nclscan['chrom'].astype(str)+"##"+nclscan['start'].astype(str)+"##"+nclscan['end'].astype(str) - nclscan.rename({'strand': 'nclscan_strand'}, axis=1, inplace=True) - nclscan.rename({'read_count': 'nclscan_read_count'}, axis=1, inplace=True) - nclscan.drop(['chrom','start', 'end'], axis = 1,inplace=True) - nclscan = _df_setcol_as_str(nclscan,['nclscan_annotation']) - nclscan.loc[nclscan['nclscan_annotation']=="1", 'nclscan_annotation'] = "Intragenic" - nclscan.loc[nclscan['nclscan_annotation']=="0", 'nclscan_annotation'] = "Intergenic" + nclscan["start"] = nclscan["start"].astype(int) - 1 + nclscan["circRNA_id"] = ( + nclscan["chrom"].astype(str) + + "##" + + nclscan["start"].astype(str) + + "##" + + nclscan["end"].astype(str) + ) + nclscan.rename({"strand": "nclscan_strand"}, axis=1, inplace=True) + nclscan.rename({"read_count": "nclscan_read_count"}, axis=1, inplace=True) + nclscan.drop(["chrom", "start", "end"], axis=1, inplace=True) + nclscan = _df_setcol_as_str(nclscan, ["nclscan_annotation"]) + nclscan.loc[ + nclscan["nclscan_annotation"] == "1", "nclscan_annotation" + ] = "Intragenic" + nclscan.loc[ + nclscan["nclscan_annotation"] == "0", "nclscan_annotation" + ] = "Intergenic" # nclscan.loc[nclscan['nclscan_annotation']!="0" and nclscan['nclscan_annotation']!="1" , 'nclscan_annotation'] = "Unknown" - nclscan.set_index(['circRNA_id'],inplace=True,drop=True) + nclscan.set_index(["circRNA_id"], inplace=True, drop=True) - nclscan.fillna(value=-1,inplace=True) + nclscan.fillna(value=-1, inplace=True) - intcols = [ 
'nclscan_read_count' ] - strcols = list ( set(nclscan.columns) - set(intcols) ) - nclscan = _df_setcol_as_int(nclscan,intcols) - if len(strcols) > 0: nclscan = _df_setcol_as_str(nclscan,strcols) + intcols = ["nclscan_read_count"] + strcols = list(set(nclscan.columns) - set(intcols)) + nclscan = _df_setcol_as_int(nclscan, intcols) + if len(strcols) > 0: + nclscan = _df_setcol_as_str(nclscan, strcols) dfs.append(nclscan) - if "NCLscan".lower() in hqcc: + if "NCLscan".lower() in hqcc: required_hqcols.append("nclscan_read_count") else: not_required_hqcols.append("nclscan_read_count") if args.circrnafinder: - circrnafinder=pandas.read_csv(args.circrnafinder,sep="\t",header=0) + circrnafinder = pandas.read_csv(args.circrnafinder, sep="\t", header=0) # output circrnafinder table has the following columns: # | # | ColName | Eg. | # |---|----------------------|------------------| @@ -370,21 +537,30 @@ def main() : # | 3 | end | 1223968 | # | 4 | strand | - | # | 5 | read_count | 26 | - circrnafinder['circRNA_id']=circrnafinder['chr'].astype(str)+"##"+circrnafinder['start'].astype(str)+"##"+circrnafinder['end'].astype(str) - circrnafinder.rename({'strand': 'circrnafinder_strand'}, axis=1, inplace=True) - circrnafinder.rename({'read_count': 'circrnafinder_read_count'}, axis=1, inplace=True) - circrnafinder.drop(['chr','start', 'end'], axis = 1,inplace=True) - circrnafinder.set_index(['circRNA_id'],inplace=True,drop=True) - - circrnafinder.fillna(value=-1,inplace=True) - - intcols = [ 'circrnafinder_read_count' ] - strcols = list ( set(circrnafinder.columns) - set(intcols) ) - circrnafinder = _df_setcol_as_int(circrnafinder,intcols) - if len(strcols) > 0: circrnafinder = _df_setcol_as_str(circrnafinder,strcols) + circrnafinder["circRNA_id"] = ( + circrnafinder["chr"].astype(str) + + "##" + + circrnafinder["start"].astype(str) + + "##" + + circrnafinder["end"].astype(str) + ) + circrnafinder.rename({"strand": "circrnafinder_strand"}, axis=1, inplace=True) + 
circrnafinder.rename( + {"read_count": "circrnafinder_read_count"}, axis=1, inplace=True + ) + circrnafinder.drop(["chr", "start", "end"], axis=1, inplace=True) + circrnafinder.set_index(["circRNA_id"], inplace=True, drop=True) + + circrnafinder.fillna(value=-1, inplace=True) + + intcols = ["circrnafinder_read_count"] + strcols = list(set(circrnafinder.columns) - set(intcols)) + circrnafinder = _df_setcol_as_int(circrnafinder, intcols) + if len(strcols) > 0: + circrnafinder = _df_setcol_as_str(circrnafinder, strcols) dfs.append(circrnafinder) - if "circRNAFinder".lower() in hqcc: + if "circRNAFinder".lower() in hqcc: required_hqcols.append("circrnafinder_read_count") else: not_required_hqcols.append("circrnafinder_read_count") @@ -392,178 +568,214 @@ def main() : # for df in dfs: # print(df.columns) - # merged_counts=pandas.concat(dfs,axis=1,join="outer",sort=False) # merged_counts['circRNA_id']=merged_counts.index -# above pandas.concat not working as expected -# giving error -# File "/vf/users/Ziegelbauer_lab/Pipelines/circRNA/230406_activeDev_20284a3/workflow/scripts/_merge_per_sample_counts_table.py", line 396, in -# main() -# File "/vf/users/Ziegelbauer_lab/Pipelines/circRNA/230406_activeDev_20284a3/workflow/scripts/_merge_per_sample_counts_table.py", line 289, in main -# merged_counts=pandas.concat(dfs,axis=1,join="outer",sort=False) -# File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper -# return func(*args, **kwargs) -# File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 307, in concat -# return op.get_result() -# File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 528, in get_result -# indexers[ax] = obj_labels.get_indexer(new_labels) -# File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer -# raise 
InvalidIndexError(self._requires_unique_msg) -# pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects -# HENCE, replacing concat with this: - - for i,df in enumerate(dfs): - if i==0: - merged_counts=df - merged_counts['circRNA_id']=merged_counts.index - merged_counts.reset_index(inplace=True,drop=True) + # above pandas.concat not working as expected + # giving error + # File "/vf/users/Ziegelbauer_lab/Pipelines/circRNA/230406_activeDev_20284a3/workflow/scripts/_merge_per_sample_counts_table.py", line 396, in + # main() + # File "/vf/users/Ziegelbauer_lab/Pipelines/circRNA/230406_activeDev_20284a3/workflow/scripts/_merge_per_sample_counts_table.py", line 289, in main + # merged_counts=pandas.concat(dfs,axis=1,join="outer",sort=False) + # File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper + # return func(*args, **kwargs) + # File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 307, in concat + # return op.get_result() + # File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 528, in get_result + # indexers[ax] = obj_labels.get_indexer(new_labels) + # File "/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer + # raise InvalidIndexError(self._requires_unique_msg) + # pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects + # HENCE, replacing concat with this: + + for i, df in enumerate(dfs): + if i == 0: + merged_counts = df + merged_counts["circRNA_id"] = merged_counts.index + merged_counts.reset_index(inplace=True, drop=True) else: - df['circRNA_id']=df.index - df.reset_index(inplace=True,drop=True) - merged_counts=pandas.merge(merged_counts,df,how='outer',on=['circRNA_id']) - + df["circRNA_id"] = df.index + df.reset_index(inplace=True, drop=True) + merged_counts = 
pandas.merge( + merged_counts, df, how="outer", on=["circRNA_id"] + ) + print(merged_counts.columns) - + # merged_counts.set_index(['circRNA_id'],inplace=True,drop=True) - merged_counts.fillna(-1,inplace=True) - merged_counts[ 'ntools'] = 0 - merged_counts[ 'HQ' ] = "N" - merged_counts[ 'hqcounts' ] = 0 - merged_counts[ 'nonhqcounts' ] = 0 + merged_counts.fillna(-1, inplace=True) + merged_counts["ntools"] = 0 + merged_counts["HQ"] = "N" + merged_counts["hqcounts"] = 0 + merged_counts["nonhqcounts"] = 0 - annotation_cols=['circExplorer_annotation','ciri_annotation'] + annotation_cols = ["circExplorer_annotation", "ciri_annotation"] floatcols = [] - strand_cols = ['circExplorer_strand','circExplorer_bwa_strand','ciri_strand'] - - intcols = [ 'circExplorer_read_count', - 'circExplorer_found_BSJcounts', - 'circExplorer_found_linear_BSJ_+_counts', - 'circExplorer_found_linear_spliced_BSJ_+_counts', - 'circExplorer_found_linear_BSJ_-_counts', - 'circExplorer_found_linear_spliced_BSJ_-_counts', - 'circExplorer_found_linear_BSJ_._counts', - 'circExplorer_found_linear_spliced_BSJ_._counts' ] - - intcols.extend([ 'ciri_read_count', - 'ciri_linear_read_count' ]) - - intcols.extend(['circExplorer_bwa_read_count']) - annotation_cols.extend(['circExplorer_bwa_annotation']) + strand_cols = ["circExplorer_strand", "circExplorer_bwa_strand", "ciri_strand"] + + intcols = [ + "circExplorer_read_count", + "circExplorer_found_BSJcounts", + "circExplorer_found_linear_BSJ_+_counts", + "circExplorer_found_linear_spliced_BSJ_+_counts", + "circExplorer_found_linear_BSJ_-_counts", + "circExplorer_found_linear_spliced_BSJ_-_counts", + "circExplorer_found_linear_BSJ_._counts", + "circExplorer_found_linear_spliced_BSJ_._counts", + ] + + intcols.extend(["ciri_read_count", "ciri_linear_read_count"]) + + intcols.extend(["circExplorer_bwa_read_count"]) + annotation_cols.extend(["circExplorer_bwa_annotation"]) if args.findcirc: - intcols.extend(['findcirc_read_count']) - 
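The commented-out traceback above records why `pandas.concat(dfs, axis=1)` was replaced: column-wise concat has to reindex every frame against the union of all indexes, and that reindexing raises `InvalidIndexError` as soon as any tool reports the same `circRNA_id` label twice. A minimal sketch of the failure and of the merge-based workaround the script now uses (toy frames and values, not real pipeline output):

```python
import pandas as pd

# Two toy per-tool count tables indexed by circRNA_id; the second one
# contains a duplicated index label, which is what breaks concat(axis=1).
a = pd.DataFrame({"ciri_read_count": [5, 7]},
                 index=["chr1##10##99", "chr2##5##50"])
b = pd.DataFrame({"dcc_read_count": [3, 4, 6]},
                 index=["chr1##10##99", "chr1##10##99", "chr3##1##20"])

try:
    pd.concat([a, b], axis=1, join="outer", sort=False)
    concat_failed = False
except Exception:  # pandas.errors.InvalidIndexError in the logged run
    concat_failed = True

# Workaround used in the script: promote the index to a column and
# chain outer merges on circRNA_id instead of concatenating.
merged = None
for df in (a, b):
    df = df.copy()
    df["circRNA_id"] = df.index
    df.reset_index(drop=True, inplace=True)
    merged = df if merged is None else pd.merge(
        merged, df, how="outer", on=["circRNA_id"]
    )

print(concat_failed)
print(merged["circRNA_id"].nunique())
```

Note the trade-off visible even in this toy: a duplicated `circRNA_id` key multiplies rows through the merge (four rows here for three unique IDs), which is why the duplicate-index condition matters in the first place.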
strand_cols.append('findcirc_strand') - + intcols.extend(["findcirc_read_count"]) + strand_cols.append("findcirc_strand") + if args.dcc: - intcols.extend([ 'dcc_read_count', - 'dcc_linear_read_count' ]) - annotation_cols.extend(['dcc_gene','dcc_junction_type','dcc_annotation']) - strand_cols.append('dcc_strand') - + intcols.extend(["dcc_read_count", "dcc_linear_read_count"]) + annotation_cols.extend(["dcc_gene", "dcc_junction_type", "dcc_annotation"]) + strand_cols.append("dcc_strand") + if args.mapsplice: - intcols.extend([ 'mapsplice_read_count' ]) - floatcols.extend([ 'mapsplice_entropy' ]) - annotation_cols.extend(['mapsplice_annotation']) - strand_cols.append('mapsplice_strand') + intcols.extend(["mapsplice_read_count"]) + floatcols.extend(["mapsplice_entropy"]) + annotation_cols.extend(["mapsplice_annotation"]) + strand_cols.append("mapsplice_strand") if args.nclscan and includenclscan: - intcols.extend([ 'nclscan_read_count' ]) - annotation_cols.extend(['nclscan_annotation']) - strand_cols.append('nclscan_strand') - - if args.circrnafinder: - intcols.extend(['circrnafinder_read_count']) - strand_cols.append('circrnafinder_strand') + intcols.extend(["nclscan_read_count"]) + annotation_cols.extend(["nclscan_annotation"]) + strand_cols.append("nclscan_strand") - intcols.extend(['ntools']) - intcols.extend(['hqcounts','nonhqcounts']) - strcols = list ( ( set(merged_counts.columns) - set(intcols) ) - set(floatcols) ) - strcols.append('HQ') - merged_counts = _df_setcol_as_int(merged_counts,intcols) - if len(floatcols)>0: merged_counts = _df_setcol_as_float(merged_counts,floatcols) - merged_counts = _df_setcol_as_str(merged_counts,strcols) + if args.circrnafinder: + intcols.extend(["circrnafinder_read_count"]) + strand_cols.append("circrnafinder_strand") + + intcols.extend(["ntools"]) + intcols.extend(["hqcounts", "nonhqcounts"]) + strcols = list((set(merged_counts.columns) - set(intcols)) - set(floatcols)) + strcols.append("HQ") + merged_counts = 
_df_setcol_as_int(merged_counts, intcols) + if len(floatcols) > 0: + merged_counts = _df_setcol_as_float(merged_counts, floatcols) + merged_counts = _df_setcol_as_str(merged_counts, strcols) # fix annotations == -1 for c in annotation_cols: - merged_counts.loc[merged_counts[c]=="-1" , c] = "Unknown" - + merged_counts.loc[merged_counts[c] == "-1", c] = "Unknown" + for c in required_hqcols: - merged_counts.loc[merged_counts[c] >= args.minreads, 'hqcounts'] += 1 + merged_counts.loc[merged_counts[c] >= args.minreads, "hqcounts"] += 1 for c in not_required_hqcols: - merged_counts.loc[merged_counts[c] >= args.minreads, 'nonhqcounts'] += 1 - - merged_counts.loc[merged_counts['hqcounts'] == hqcclen, 'HQ'] = "Y" - merged_counts.loc[merged_counts['nonhqcounts'] < args.hqccpn, 'HQ'] = "N" - - merged_counts.loc[merged_counts['circExplorer_read_count'] >= args.minreads, 'ntools'] += 1 - merged_counts.loc[merged_counts['ciri_read_count'] >= args.minreads, 'ntools'] += 1 - merged_counts.loc[merged_counts['circExplorer_bwa_read_count'] >= args.minreads, 'ntools'] += 1 - if args.findcirc: merged_counts.loc[merged_counts['findcirc_read_count'] >= args.minreads, 'ntools'] += 1 - if args.dcc: merged_counts.loc[merged_counts['dcc_read_count'] >= args.minreads, 'ntools'] += 1 - if args.mapsplice: merged_counts.loc[merged_counts['mapsplice_read_count'] >= args.minreads, 'ntools'] += 1 - if args.nclscan and includenclscan: merged_counts.loc[merged_counts['nclscan_read_count'] >= args.minreads, 'ntools'] += 1 - if args.circrnafinder: merged_counts.loc[merged_counts['circrnafinder_read_count'] >= args.minreads, 'ntools'] += 1 - merged_counts[['chrom', 'start', 'end']] = merged_counts['circRNA_id'].str.split('##', expand=True) - - merged_counts=_df_setcol_as_int(merged_counts,['start','end','ntools']) - merged_counts=_df_setcol_as_str(merged_counts,['chrom']) + merged_counts.loc[merged_counts[c] >= args.minreads, "nonhqcounts"] += 1 + + merged_counts.loc[merged_counts["hqcounts"] == hqcclen, 
"HQ"] = "Y" + merged_counts.loc[merged_counts["nonhqcounts"] < args.hqccpn, "HQ"] = "N" + + merged_counts.loc[ + merged_counts["circExplorer_read_count"] >= args.minreads, "ntools" + ] += 1 + merged_counts.loc[merged_counts["ciri_read_count"] >= args.minreads, "ntools"] += 1 + merged_counts.loc[ + merged_counts["circExplorer_bwa_read_count"] >= args.minreads, "ntools" + ] += 1 + if args.findcirc: + merged_counts.loc[ + merged_counts["findcirc_read_count"] >= args.minreads, "ntools" + ] += 1 + if args.dcc: + merged_counts.loc[ + merged_counts["dcc_read_count"] >= args.minreads, "ntools" + ] += 1 + if args.mapsplice: + merged_counts.loc[ + merged_counts["mapsplice_read_count"] >= args.minreads, "ntools" + ] += 1 + if args.nclscan and includenclscan: + merged_counts.loc[ + merged_counts["nclscan_read_count"] >= args.minreads, "ntools" + ] += 1 + if args.circrnafinder: + merged_counts.loc[ + merged_counts["circrnafinder_read_count"] >= args.minreads, "ntools" + ] += 1 + merged_counts[["chrom", "start", "end"]] = merged_counts["circRNA_id"].str.split( + "##", expand=True + ) + + merged_counts = _df_setcol_as_int(merged_counts, ["start", "end", "ntools"]) + merged_counts = _df_setcol_as_str(merged_counts, ["chrom"]) # adding flanking sites - merged_counts['flanking_sites_+']="-1" - merged_counts['flanking_sites_-']="-1" + merged_counts["flanking_sites_+"] = "-1" + merged_counts["flanking_sites_-"] = "-1" - sequences = dict((s[1], s[0]) for s in HTSeq.FastaReader(args.reffa, raw_iterator=True)) + sequences = dict( + (s[1], s[0]) for s in HTSeq.FastaReader(args.reffa, raw_iterator=True) + ) for index, row in merged_counts.iterrows(): - bsj = BSJ(chrom=row['chrom'],start=row['start'],end=row['end']) + bsj = BSJ(chrom=row["chrom"], start=row["start"], end=row["end"]) bsj.add_flanks(sequences) plus_flank, minus_flank = bsj.get_flanks() - merged_counts.loc[index, 'flanking_sites_+'] = plus_flank - merged_counts.loc[index, 'flanking_sites_-'] = minus_flank + 
merged_counts.loc[index, "flanking_sites_+"] = plus_flank + merged_counts.loc[index, "flanking_sites_-"] = minus_flank # add samplename - merged_counts['sample_name'] = args.samplename - merged_counts=_df_setcol_as_str(merged_counts,['sample_name','flanking_sites_+','flanking_sites_-']) + merged_counts["sample_name"] = args.samplename + merged_counts = _df_setcol_as_str( + merged_counts, ["sample_name", "flanking_sites_+", "flanking_sites_-"] + ) print(merged_counts.columns) # prepare output ... reorder columns - outcols=['chrom', 'start', 'end'] + outcols = ["chrom", "start", "end"] outcols.extend(strand_cols) - outcols.extend(['flanking_sites_+','flanking_sites_-', 'sample_name', 'ntools', 'HQ']) + outcols.extend( + ["flanking_sites_+", "flanking_sites_-", "sample_name", "ntools", "HQ"] + ) # add circExplorer columns - outcols.extend(['circExplorer_read_count', - 'circExplorer_found_BSJcounts', - 'circExplorer_found_linear_BSJ_+_counts', - 'circExplorer_found_linear_spliced_BSJ_+_counts', - 'circExplorer_found_linear_BSJ_-_counts', - 'circExplorer_found_linear_spliced_BSJ_-_counts', - 'circExplorer_found_linear_BSJ_._counts', - 'circExplorer_found_linear_spliced_BSJ_._counts']) + outcols.extend( + [ + "circExplorer_read_count", + "circExplorer_found_BSJcounts", + "circExplorer_found_linear_BSJ_+_counts", + "circExplorer_found_linear_spliced_BSJ_+_counts", + "circExplorer_found_linear_BSJ_-_counts", + "circExplorer_found_linear_spliced_BSJ_-_counts", + "circExplorer_found_linear_BSJ_._counts", + "circExplorer_found_linear_spliced_BSJ_._counts", + ] + ) # add ciri columns - outcols.extend(['ciri_read_count', - 'ciri_linear_read_count']) + outcols.extend(["ciri_read_count", "ciri_linear_read_count"]) # add circExplorer_BWA columns - outcols.extend(['circExplorer_bwa_read_count']) + outcols.extend(["circExplorer_bwa_read_count"]) # add find_circ columns if args.findcirc: - outcols.extend(['findcirc_read_count']) + outcols.extend(["findcirc_read_count"]) # add DCC 
columns - if args.dcc: - outcols.extend(['dcc_read_count', - 'dcc_linear_read_count']) + if args.dcc: + outcols.extend(["dcc_read_count", "dcc_linear_read_count"]) # add MapSplice columns - if args.mapsplice: outcols.append('mapsplice_read_count') + if args.mapsplice: + outcols.append("mapsplice_read_count") # add NCLscan columns - if args.nclscan and includenclscan: outcols.append('nclscan_read_count') + if args.nclscan and includenclscan: + outcols.append("nclscan_read_count") # add circRNAfinder columns - if args.circrnafinder: outcols.append('circrnafinder_read_count') + if args.circrnafinder: + outcols.append("circrnafinder_read_count") - outcols.extend(['hqcounts','nonhqcounts']) + outcols.extend(["hqcounts", "nonhqcounts"]) # add annotation columns outcols.extend(annotation_cols) merged_counts = merged_counts[outcols] - merged_counts.to_csv(args.outfile,sep="\t",header=True,index=False,compression='gzip') + merged_counts.to_csv( + args.outfile, sep="\t", header=True, index=False, compression="gzip" + ) if __name__ == "__main__": diff --git a/workflow/scripts/_multifasta2separatefastas.sh b/workflow/scripts/_multifasta2separatefastas.sh index e14b198..4f26015 100755 --- a/workflow/scripts/_multifasta2separatefastas.sh +++ b/workflow/scripts/_multifasta2separatefastas.sh @@ -9,4 +9,4 @@ cat $fasta | awk '{ if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fa")} print $0 >> filename close(filename) - }' \ No newline at end of file + }' diff --git a/workflow/scripts/_process_bamtobed.py b/workflow/scripts/_process_bamtobed.py index 222b7ae..a096148 100755 --- a/workflow/scripts/_process_bamtobed.py +++ b/workflow/scripts/_process_bamtobed.py @@ -3,83 +3,104 @@ import argparse import gzip + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( - ) + parser = argparse.ArgumentParser() # INPUTs - parser.add_argument("-i","--inbed",dest="inbed",required=True,type=str, - help="Input bamtobed bed file") + parser.add_argument( + "-i", + 
"--inbed", + dest="inbed", + required=True, + type=str, + help="Input bamtobed bed file", + ) # OUTPUTs - parser.add_argument('-o',"--outbed",dest="outbed",required=True,type=str, - help="Output bed file") - parser.add_argument('-l',"--linear",dest="linear",required=True,type=str, - help="gzip-ed list of linear readids") - parser.add_argument('-s',"--spliced",dest="spliced",required=True,type=str, - help="gzip-ed list of spliced readids") + parser.add_argument( + "-o", "--outbed", dest="outbed", required=True, type=str, help="Output bed file" + ) + parser.add_argument( + "-l", + "--linear", + dest="linear", + required=True, + type=str, + help="gzip-ed list of linear readids", + ) + parser.add_argument( + "-s", + "--spliced", + dest="spliced", + required=True, + type=str, + help="gzip-ed list of spliced readids", + ) args = parser.parse_args() - outbed = open(args.outbed,'w') + outbed = open(args.outbed, "w") pairtest = 0 paired = 0 readname_counts = dict() - with open(args.inbed,'r') as inbed: + with open(args.inbed, "r") as inbed: for l in inbed: - l=l.strip().split("\t") - l1=[] - l2=[] + l = l.strip().split("\t") + l1 = [] + l2 = [] l1.append(l[0]) l2.append(l[0]) l1.append(l[1]) l1.append(l[1]) l2.append(l[2]) l2.append(l[2]) - if "/" in l[3]: # paired end + if "/" in l[3]: # paired end if pairtest == 0: - pairtest=1 - paired=1 - x=l[3].split("/") - readname=x[0] - if x[1]=="1": # pick the strand of mate1 as the read strand - strand=l[5] - else: # if it is mate2 then reverse the strand - if l[5]=="-": - strand="+" - elif l[5]=="+": - strand="-" - else: # if neither + or - is provided the use whatever is provided - strand=l[5] - else: # single end - readname=l[3] - strand=l[5] + pairtest = 1 + paired = 1 + x = l[3].split("/") + readname = x[0] + if x[1] == "1": # pick the strand of mate1 as the read strand + strand = l[5] + else: # if it is mate2 then reverse the strand + if l[5] == "-": + strand = "+" + elif l[5] == "+": + strand = "-" + else: # if neither + or 
- is provided the use whatever is provided + strand = l[5] + else: # single end + readname = l[3] + strand = l[5] if readname in readname_counts: - readname_counts[readname]+=1 + readname_counts[readname] += 1 else: - readname_counts[readname]=1 - readname+="##"+strand + readname_counts[readname] = 1 + readname += "##" + strand l1.append(readname) l2.append(readname) l1.append(".") l2.append(".") l1.append(strand) l2.append(strand) - outbed.write("\t".join(l1)+"\n") - outbed.write("\t".join(l2)+"\n") + outbed.write("\t".join(l1) + "\n") + outbed.write("\t".join(l2) + "\n") inbed.close() outbed.close() # linear = open(args.linear,'w') # spliced = open(args.spliced,'w') limit = 1 - if paired==1: limit=2 - with gzip.open(args.spliced,'wt') as spliced: - with gzip.open(args.linear,'wt') as linear: - for rid,count in readname_counts.items(): - if count>limit: - spliced.write("%s\n"%rid) + if paired == 1: + limit = 2 + with gzip.open(args.spliced, "wt") as spliced: + with gzip.open(args.linear, "wt") as linear: + for rid, count in readname_counts.items(): + if count > limit: + spliced.write("%s\n" % rid) else: - linear.write("%s\n"%rid) + linear.write("%s\n" % rid) spliced.close() linear.close() + if __name__ == "__main__": main() diff --git a/workflow/scripts/annotate_clear_quant.py b/workflow/scripts/annotate_clear_quant.py index 39e3df9..f2238b5 100755 --- a/workflow/scripts/annotate_clear_quant.py +++ b/workflow/scripts/annotate_clear_quant.py @@ -4,16 +4,46 @@ import pandas import sys -indexcol=sys.argv[3] # hg38 or mm39 +indexcol = sys.argv[3] # hg38 or mm39 -lookupfile=sys.argv[1] -annotations=pandas.read_csv(lookupfile,sep="\t",header=0) -annotations.set_index([indexcol],inplace=True) +lookupfile = sys.argv[1] +annotations = pandas.read_csv(lookupfile, sep="\t", header=0) +annotations.set_index([indexcol], inplace=True) -quantfile=sys.argv[2] 
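`_process_bamtobed.py` above encodes two small rules: for paired-end reads named `READID/1` or `READID/2`, mate 1's strand is taken as the fragment strand and mate 2's strand is flipped; and a read is classified as spliced when it produces more BED intervals than one alignment per mate can explain (more than 1 for single-end, more than 2 for paired-end). A compact sketch of both rules, with hypothetical read names:

```python
def fragment_strand(name, strand):
    """Resolve a bamtobed read name to (read_id, fragment_strand).
    Mate 2 aligns opposite the fragment, so its strand is flipped;
    anything other than +/- is passed through unchanged."""
    if "/" in name:  # paired-end naming, e.g. "read7/2"
        rid, mate = name.split("/")
        if mate != "1":
            strand = {"+": "-", "-": "+"}.get(strand, strand)
        return rid, strand
    return name, strand  # single-end: the name is already the read id


def split_spliced_linear(counts, paired):
    """Reads spanning more intervals than expected (count > limit)
    are classified as spliced; the rest are linear."""
    limit = 2 if paired else 1
    spliced = sorted(r for r, c in counts.items() if c > limit)
    linear = sorted(r for r, c in counts.items() if c <= limit)
    return spliced, linear


print(fragment_strand("read7/2", "+"))
print(split_spliced_linear({"read7": 3, "read8": 2}, paired=True))
```

The dict-based strand flip is behaviorally the same as the script's if/elif chain, just condensed.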
-quant=pandas.read_csv(quantfile,sep="\t",header=None,names=["quant_chrom","quant_start","quant_end","quant_name","quant_score","quant_quant_strand","quant_thickStart","quant_thickEnd","quant_itemRgb","quant_exonCount","quant_exonSizes","quant_exonOffsets","quant_readNumber","quant_circType","quant_geneName","quant_isoformName","quant_index","quant_flankIntron","quant_FPBcirc","quant_FPBlinear","quant_CIRCscore"]) -quant[indexcol]=quant.apply(lambda row: row.quant_chrom+":"+str(row.quant_start)+"-"+str(row.quant_end),axis=1) -quant.set_index([indexcol],inplace=True) +quantfile = sys.argv[2] +quant = pandas.read_csv( + quantfile, + sep="\t", + header=None, + names=[ + "quant_chrom", + "quant_start", + "quant_end", + "quant_name", + "quant_score", + "quant_quant_strand", + "quant_thickStart", + "quant_thickEnd", + "quant_itemRgb", + "quant_exonCount", + "quant_exonSizes", + "quant_exonOffsets", + "quant_readNumber", + "quant_circType", + "quant_geneName", + "quant_isoformName", + "quant_index", + "quant_flankIntron", + "quant_FPBcirc", + "quant_FPBlinear", + "quant_CIRCscore", + ], +) +quant[indexcol] = quant.apply( + lambda row: row.quant_chrom + ":" + str(row.quant_start) + "-" + str(row.quant_end), + axis=1, +) +quant.set_index([indexcol], inplace=True) -x=quant.join(annotations) -x.to_csv(quantfile+'.annotated',sep="\t",header=True) +x = quant.join(annotations) +x.to_csv(quantfile + ".annotated", sep="\t", header=True) diff --git a/workflow/scripts/apply_junction_filters.py b/workflow/scripts/apply_junction_filters.py index f9173ea..47e29f9 100755 --- a/workflow/scripts/apply_junction_filters.py +++ b/workflow/scripts/apply_junction_filters.py @@ -2,63 +2,106 @@ import argparse import os + def my_bool(s): - return s != 'False' + return s != "False" + -parser = argparse.ArgumentParser(description='apply junction filters, input stdin and output stdout') -parser.add_argument('--regions', dest='regions', type=str, required=True, metavar="absolute path to regions 
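`annotate_clear_quant.py` above builds a `chrom:start-end` key on the CLEAR quant table and joins it against an annotation lookup indexed on the same key. A toy version of that join (coordinates and the `gene` column are made up for illustration):

```python
import pandas as pd

indexcol = "hg38"  # sys.argv[3] in the script

quant = pd.DataFrame({
    "quant_chrom": ["chr1", "chr2"],
    "quant_start": [100, 500],
    "quant_end": [200, 900],
})
# Same key construction as the script: "chrom:start-end".
quant[indexcol] = quant.apply(
    lambda r: r.quant_chrom + ":" + str(r.quant_start) + "-" + str(r.quant_end),
    axis=1,
)
quant.set_index([indexcol], inplace=True)

annotations = pd.DataFrame(
    {indexcol: ["chr1:100-200"], "gene": ["GENE_A"]}
).set_index(indexcol)

# DataFrame.join matches on index; unmatched quant rows keep NaN,
# i.e. this is a left join, as in the script.
annotated = quant.join(annotations)
print(annotated["gene"].tolist())
```

Because `join` is a left join on the index, every quant row survives annotation; rows without a lookup hit simply carry NaN in the annotation columns.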
file", - help='regions file') -parser.add_argument('--filter1regions', dest='filter1regions', type=str, required=True, metavar="eg. \"hg38,ERCC,rRNA\"", - help='comma separated list of regions to apply filter1 on ... filter2 is applied to all other regions') -parser.add_argument('--filter1_noncanonical', dest='filter1_noncanonical', default=True, type=my_bool, required=True, metavar="\"True/False\"", - help='apply canonical filter on filter1') -parser.add_argument('--filter1_unannotated', dest='filter1_unannotated', default=True, type=my_bool, required=True, metavar="\"True/False\"", - help='apply unannotated filter on filter1') -parser.add_argument('--filter2_noncanonical', dest='filter2_noncanonical', default=False, type=my_bool, required=True, metavar="\"True/False\"", - help='apply canonical filter on filter2') -parser.add_argument('--filter2_unannotated', dest='filter2_unannotated', default=False, type=my_bool, required=True, metavar="\"True/False\"", - help='apply unannotated filter on filter2') +parser = argparse.ArgumentParser( + description="apply junction filters, input stdin and output stdout" +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + metavar="absolute path to regions file", + help="regions file", +) +parser.add_argument( + "--filter1regions", + dest="filter1regions", + type=str, + required=True, + metavar='eg. "hg38,ERCC,rRNA"', + help="comma separated list of regions to apply filter1 on ... 
filter2 is applied to all other regions", +) +parser.add_argument( + "--filter1_noncanonical", + dest="filter1_noncanonical", + default=True, + type=my_bool, + required=True, + metavar='"True/False"', + help="apply canonical filter on filter1", +) +parser.add_argument( + "--filter1_unannotated", + dest="filter1_unannotated", + default=True, + type=my_bool, + required=True, + metavar='"True/False"', + help="apply unannotated filter on filter1", +) +parser.add_argument( + "--filter2_noncanonical", + dest="filter2_noncanonical", + default=False, + type=my_bool, + required=True, + metavar='"True/False"', + help="apply canonical filter on filter2", +) +parser.add_argument( + "--filter2_unannotated", + dest="filter2_unannotated", + default=False, + type=my_bool, + required=True, + metavar='"True/False"', + help="apply unannotated filter on filter2", +) args = parser.parse_args() -chr2region=dict() -regions=list() +chr2region = dict() +regions = list() x = open(args.regions) for r in x.readlines(): - r = r.strip().split("\t") - regions.append(r[0]) - for c in r[1].split(): - chr2region[c]=r[0] + r = r.strip().split("\t") + regions.append(r[0]) + for c in r[1].split(): + chr2region[c] = r[0] x.close() -region2filter=dict() +region2filter = dict() for x in regions: - region2filter[x]=2 # apply filter2 to everything + region2filter[x] = 2 # apply filter2 to everything -filter1regions=args.filter1regions +filter1regions = args.filter1regions for f in filter1regions.split(","): - f = f.strip() - if not f in region2filter: - exit("Region "+f+" not defined!") - region2filter[f]=1 # change filter from filter2 to filter1 + f = f.strip() + if not f in region2filter: + exit("Region " + f + " not defined!") + region2filter[f] = 1 # change filter from filter2 to filter1 # cat {input} |sort|uniq|awk -F \"\\t\" '{{if ($5>0 && $6==1) {{print}}}}'|cut -f1-4|sort -k1,1 -k2,2n|uniq > {output.pass1sjtab} for line in sys.stdin: - l=line.split("\t") - f=region2filter[chr2region[l[0]]] - if 
f==1: - if args.filter1_noncanonical: - if not int(l[4])>0: - continue - if args.filter1_unannotated: - if not int(l[5])==1: - continue - elif f==2: - if args.filter2_noncanonical: - if not int(l[4])>0: - continue - if args.filter2_unannotated: - if not int(l[5])==1: - continue - sys.stdout.write(line) - # exit() - + l = line.split("\t") + f = region2filter[chr2region[l[0]]] + if f == 1: + if args.filter1_noncanonical: + if not int(l[4]) > 0: + continue + if args.filter1_unannotated: + if not int(l[5]) == 1: + continue + elif f == 2: + if args.filter2_noncanonical: + if not int(l[4]) > 0: + continue + if args.filter2_unannotated: + if not int(l[5]) == 1: + continue + sys.stdout.write(line) + # exit() diff --git a/workflow/scripts/bam_get_max_readlen.py b/workflow/scripts/bam_get_max_readlen.py index d16edef..25e3001 100755 --- a/workflow/scripts/bam_get_max_readlen.py +++ b/workflow/scripts/bam_get_max_readlen.py @@ -8,17 +8,19 @@ def main(): parser = argparse.ArgumentParser( description="Print out the maximum aligned read length in the input BAM" ) - parser.add_argument("-i","--bam",dest="inbam",required=True,type=str, - help="Input BAM file") + parser.add_argument( + "-i", "--bam", dest="inbam", required=True, type=str, help="Input BAM file" + ) args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") - maxrl=0 + maxrl = 0 for read in samfile.fetch(): rl = int(read.query_length) - if rl > maxrl: maxrl=rl + if rl > maxrl: + maxrl = rl samfile.close() print(maxrl) if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/bam_split_by_regions.py b/workflow/scripts/bam_split_by_regions.py index 9da14e2..abd1ba5 100755 --- a/workflow/scripts/bam_split_by_regions.py +++ b/workflow/scripts/bam_split_by_regions.py @@ -3,46 +3,51 @@ import os import time + def get_ctime(): return time.ctime(time.time()) -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - 
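The stdin loop in `apply_junction_filters.py` maps each chromosome to a region, picks that region's filter, and then tests two fields of the STAR `SJ.out.tab` record: column 5 (`l[4]`, the intron motif, where 0 means non-canonical) and column 6 (`l[5]`, the annotation flag, where 1 means annotated). The per-line decision can be sketched as a standalone function (field values below are illustrative):

```python
def junction_passes(fields, require_canonical, require_annotated):
    """One STAR SJ.out.tab record as a list of strings: fields[4] is the
    intron motif (0 = non-canonical) and fields[5] the annotation flag
    (1 = annotated). Mirrors the script's filter1/filter2 checks."""
    if require_canonical and not int(fields[4]) > 0:
        return False
    if require_annotated and not int(fields[5]) == 1:
        return False
    return True


# filter1 defaults are strict (both checks on); filter2 defaults are lenient.
sj_canonical_annotated = ["chr1", "100", "200", "1", "2", "1", "5", "0", "20"]
sj_noncanonical = ["chr1", "300", "400", "0", "0", "0", "2", "0", "15"]

print(junction_passes(sj_canonical_annotated, True, True))
print(junction_passes(sj_noncanonical, True, True))
print(junction_passes(sj_noncanonical, False, False))
```

This matches the awk one-liner quoted in the script's comment (`$5>0 && $6==1`) when both checks are enabled.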
additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." 
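`read_regions` above parses a two-column regions file (region name, then whitespace-separated sequence names) and labels each region as host, additive, or virus according to the comma-separated CLI lists. A miniature in-memory version of the same classification; the region names (`hg38`, `ERCC`, `KSHV`) and sequence names are illustrative, with `KSHV` as a hypothetical label for the `NC_009333.1` viral sequence mentioned in the script's help text:

```python
def classify_regions(lines, host, additives, viruses):
    """lines: iterable of 'region<TAB>seq1 seq2 ...' records, as in
    ref.fa.regions. Returns the same nested dict shape as read_regions."""
    host = host.split(",")
    additives = additives.split(",")
    viruses = viruses.split(",")
    regions = {}
    for l in lines:
        name, seqs = l.strip().split("\t")
        if name in host:
            kind = "host"
        elif name in additives:
            kind = "additive"
        elif name in viruses:
            kind = "virus"
        else:
            raise SystemExit("%s has unknown region" % name)
        regions[name] = {
            "host_additive_virus": kind,
            "sequences": {s: 1 for s in seqs.split()},
        }
    return regions


regions = classify_regions(
    ["hg38\tchr1 chr2", "ERCC\tERCC-1", "KSHV\tNC_009333.1"],
    host="hg38", additives="ERCC", viruses="KSHV",
)
print(regions["KSHV"]["host_additive_virus"])
```

The per-sequence dict (`sequences[s] = 1`) is what the downstream `_get_host_additive_virus` / `_get_regionname_from_seqname` helpers scan to map a BAM sequence name back to its region.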
% (seqname)) -def _get_regionname_from_seqname(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: + +def _get_regionname_from_seqname(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: return k else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) + def main(): # debug = True @@ -51,66 +56,113 @@ def main(): description="""Extracts PE BSJs from STAR2p output Chimeric BAM file. It also adds unique read group IDs to each read. This RID is of the format #### where the chrom, start and end represent the BSJ the read is depicting. - ## UPDATE: works for all BAM files ... not just BSJ only + ## UPDATE: works for all BAM files ... not just BSJ only """ ) - #INPUTs - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input BAM file") - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name: SM for RG') - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list') - parser.add_argument('--prefix', dest='prefix', type=str, required=True, - help='outfile prefix ... 
like "linear" or "linear_spliced" etc.') - #OUTPUTs - parser.add_argument("--outdir",dest="outdir",required=False,type=str, - help="Output folder for the individual BAM files.") + # INPUTs + parser.add_argument( + "-i", "--inbam", dest="inbam", required=True, type=str, help="Input BAM file" + ) + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name: SM for RG", + ) + parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", + ) + parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value", + ) + parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", + ) + parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list", + ) + parser.add_argument( + "--prefix", + dest="prefix", + type=str, + required=True, + help='outfile prefix ... 
like "linear" or "linear_spliced" etc.', + ) + # OUTPUTs + parser.add_argument( + "--outdir", + dest="outdir", + required=False, + type=str, + help="Output folder for the individual BAM files.", + ) args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") sequences = list() samheader = samfile.header.to_dict() - for v in samheader['SQ']: - sequences.append(v['SN']) - - seqname2regionname=dict() - hosts=set() - viruses=set() - - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + for v in samheader["SQ"]: + sequences.append(v["SN"]) + + seqname2regionname = dict() + hosts = set() + viruses = set() + + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) + hav = _get_host_additive_virus(regions, s) if hav == "host": - hostname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=hostname + hostname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = hostname hosts.add(hostname) if hav == "virus": - virusname = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=virusname + virusname = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = virusname viruses.add(virusname) if hav == "additive": - additive = _get_regionname_from_seqname(regions,s) - seqname2regionname[s]=additive - + additive = _get_regionname_from_seqname(regions, s) + seqname2regionname[s] = additive + outputbams = dict() for h in hosts: - outbamname = os.path.join(args.outdir,args.samplename+"."+args.prefix+"."+h+".bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) + outbamname = os.path.join( + args.outdir, args.samplename + "." + args.prefix + "." 
+ h + ".bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) for h in viruses: - outbamname = os.path.join(args.outdir,args.samplename+"."+args.prefix+"."+h+".bam") - outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header = samheader) - + outbamname = os.path.join( + args.outdir, args.samplename + "." + args.prefix + "." + h + ".bam" + ) + outputbams[h] = pysam.AlignmentFile(outbamname, "wb", header=samheader) + for read in samfile.fetch(): - chrom=read.reference_name - regionname=seqname2regionname[chrom] + chrom = read.reference_name + regionname = seqname2regionname[chrom] if regionname in hosts or regionname in viruses: outputbams[regionname].write(read) samfile.close() @@ -118,6 +170,5 @@ def main(): o.close() - if __name__ == "__main__": main() diff --git a/workflow/scripts/bam_to_bigwig.sh b/workflow/scripts/bam_to_bigwig.sh index 31c8b2f..9446f16 100755 --- a/workflow/scripts/bam_to_bigwig.sh +++ b/workflow/scripts/bam_to_bigwig.sh @@ -25,4 +25,4 @@ if [ "$(wc -l ${tmpdir}/${bdg}|awk '{print $1}')" != "0" ];then samtools view -H $bam | grep ^@SQ | cut -f2,3 | sed "s/SN://g" | sed "s/LN://g" > ${tmpdir}/${sizes} bedGraphToBigWig ${tmpdir}/${bdg} ${tmpdir}/${sizes} $bw fi -rm -f ${tmpdir}/${bdg} ${tmpdir}/${sizes} \ No newline at end of file +rm -f ${tmpdir}/${bdg} ${tmpdir}/${sizes} diff --git a/workflow/scripts/circExplorer_get_annotated_counts_per_sample.py b/workflow/scripts/circExplorer_get_annotated_counts_per_sample.py index c5ccb8c..60cb734 100755 --- a/workflow/scripts/circExplorer_get_annotated_counts_per_sample.py +++ b/workflow/scripts/circExplorer_get_annotated_counts_per_sample.py @@ -1,134 +1,274 @@ import argparse + class BSJ: - def __init__(self,chrom="",start=-1,end=-1,strand=".",known_novel="novel",read_count=-1,counted=-1): - self.chrom=chrom - self.start=start - self.end=end - self.strand=strand - self.known_novel=known_novel - self.read_count=read_count - self.counted=counted + def __init__( + self, + 
chrom="", + start=-1, + end=-1, + strand=".", + known_novel="novel", + read_count=-1, + counted=-1, + ): + self.chrom = chrom + self.start = start + self.end = end + self.strand = strand + self.known_novel = known_novel + self.read_count = read_count + self.counted = counted + def __str__(self): # id="##".join([self.chrom,str(self.start),str(self.end),self.strand]) - return "%s\t%d\t%d\t%s\t%d\t%s\n"%(self.chrom,self.start,self.end,self.strand,self.read_count,self.known_novel) - -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + return "%s\t%d\t%d\t%s\t%d\t%s\n" % ( + self.chrom, + self.start, + self.end, + self.strand, + self.read_count, + self.known_novel, + ) + + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. 
Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] + +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) + -def read_BSJs(filename,regions,host_min,host_max,virus_min,virus_max,known_novel="novel",counted=-1,threshold=0): - infile=open(filename,'r') - BSJdict=dict() +def read_BSJs( + filename, + regions, + host_min, + host_max, + virus_min, + virus_max, + known_novel="novel", + counted=-1, + threshold=0, +): + infile = open(filename, "r") + BSJdict = dict() for l in infile.readlines(): - l=l.strip().split("\t") - chrom=l[0] - start=int(l[1]) - end=int(l[2]) - strand=l[5] - circid="##".join([chrom,str(start),str(end)]) + l = l.strip().split("\t") + chrom = l[0] + start = int(l[1]) + end = int(l[2]) + strand = l[5] + circid = "##".join([chrom, str(start), str(end)]) # count=int(l[3].split("/")[1]) - count=int(l[3]) + count = int(l[3]) if count < threshold: continue - host_additive_virus=_get_host_additive_virus(regions=regions,seqname=chrom) + host_additive_virus = _get_host_additive_virus(regions=regions, seqname=chrom) # if host_additive_virus == "additive": continue - size = end-start + size = end - start if host_additive_virus == "host" or host_additive_virus == "additive": - if size < host_min: continue - if size > host_max: continue + if size < host_min: + continue + if size > host_max: + continue if host_additive_virus == "virus": - if size < virus_min : continue - if size > virus_max : continue - 
BSJdict[circid]=BSJ(chrom=chrom,start=start,end=end,strand=strand,known_novel=known_novel,read_count=count,counted=counted) - return(BSJdict) + if size < virus_min: + continue + if size > virus_max: + continue + BSJdict[circid] = BSJ( + chrom=chrom, + start=start, + end=end, + strand=strand, + known_novel=known_novel, + read_count=count, + counted=counted, + ) + return BSJdict + -parser = argparse.ArgumentParser(description='Create CircExplorer2 Per Sample Counts Table') +parser = argparse.ArgumentParser( + description="Create CircExplorer2 Per Sample Counts Table" +) # INPUTS -parser.add_argument('--back_spliced_bed', dest='bsb', type=str, required=True, - help='back_spliced.bed') -parser.add_argument('--back_spliced_min_reads', dest='back_spliced_min_reads', type=int, required=True, - help='back_spliced minimum read threshold') # in addition to "known" and "low-conf" circRNAs identified by circexplorer, we also include those found in back_spliced.bed file but not classified as known/low-conf only if the number of reads supporting the BSJ call is greater than this number -parser.add_argument('--circularRNA_known', dest='ck', type=str, required=True, - help='circularRNA_known.txt') -parser.add_argument('--low_conf', dest='lc', type=str, required=False, - help='low_conf.circularRNA_known.txt') -parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only') -parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') -parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... 
comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') -parser.add_argument('--host_filter_min', dest='host_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for host') -parser.add_argument('--virus_filter_min', dest='virus_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for virus') -parser.add_argument('--host_filter_max', dest='host_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for host') -parser.add_argument('--virus_filter_max', dest='virus_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for virus') -parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') +parser.add_argument( + "--back_spliced_bed", dest="bsb", type=str, required=True, help="back_spliced.bed" +) +parser.add_argument( + "--back_spliced_min_reads", + dest="back_spliced_min_reads", + type=int, + required=True, + help="back_spliced minimum read threshold", +) # in addition to "known" and "low-conf" circRNAs identified by circexplorer, we also include those found in back_spliced.bed file but not classified as known/low-conf only if the number of reads supporting the BSJ call is greater than this number +parser.add_argument( + "--circularRNA_known", + dest="ck", + type=str, + required=True, + help="circularRNA_known.txt", +) +parser.add_argument( + "--low_conf", + dest="lc", + type=str, + required=False, + help="low_conf.circularRNA_known.txt", +) +parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only", +) +parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... 
all BSJs in this region are filtered out", +) +parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only", +) +parser.add_argument( + "--host_filter_min", + dest="host_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_min", + dest="virus_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for virus", +) +parser.add_argument( + "--host_filter_max", + dest="host_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_max", + dest="virus_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for virus", +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. 
ref.fa.regions", +) # OUTPUTS -parser.add_argument('-o',dest='outfile',required=True,help='counts TSV table') +parser.add_argument("-o", dest="outfile", required=True, help="counts TSV table") args = parser.parse_args() -regions=read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) -o=open(args.outfile,'w') +regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, +) +o = open(args.outfile, "w") o.write("#chrom\tstart\tend\tstrand\tread_count\tknown_novel\n") -all_BSJs=read_BSJs(args.bsb,counted=0,threshold=args.back_spliced_min_reads,regions=regions,host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) +all_BSJs = read_BSJs( + args.bsb, + counted=0, + threshold=args.back_spliced_min_reads, + regions=regions, + host_min=args.host_filter_min, + host_max=args.host_filter_max, + virus_min=args.virus_filter_min, + virus_max=args.virus_filter_max, +) -known_BSJs=read_BSJs(args.ck,known_novel="known",counted=0,threshold=args.back_spliced_min_reads,regions=regions,host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) +known_BSJs = read_BSJs( + args.ck, + known_novel="known", + counted=0, + threshold=args.back_spliced_min_reads, + regions=regions, + host_min=args.host_filter_min, + host_max=args.host_filter_max, + virus_min=args.virus_filter_min, + virus_max=args.virus_filter_max, +) if args.lc: - low_conf_BSJs=read_BSJs(args.lc,known_novel="known",counted=0,threshold=args.back_spliced_min_reads,regions=regions,host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) - for k,v in all_BSJs.items(): + low_conf_BSJs = read_BSJs( + args.lc, + known_novel="known", + counted=0, + threshold=args.back_spliced_min_reads, + regions=regions, + 
host_min=args.host_filter_min, + host_max=args.host_filter_max, + virus_min=args.virus_filter_min, + virus_max=args.virus_filter_max, + ) + for k, v in all_BSJs.items(): if k in low_conf_BSJs: - all_BSJs[k].known_novel="low_conf" - all_BSJs[k].strand=v.strand - all_BSJs[k].counted=1 - low_conf_BSJs[k].counted=1 + all_BSJs[k].known_novel = "low_conf" + all_BSJs[k].strand = v.strand + all_BSJs[k].counted = 1 + low_conf_BSJs[k].counted = 1 -for k,v in all_BSJs.items(): +for k, v in all_BSJs.items(): if k in known_BSJs: - all_BSJs[k].known_novel="known" - all_BSJs[k].strand=known_BSJs[k].strand - all_BSJs[k].counted=1 - known_BSJs[k].counted=1 + all_BSJs[k].known_novel = "known" + all_BSJs[k].strand = known_BSJs[k].strand + all_BSJs[k].counted = 1 + known_BSJs[k].counted = 1 o.write(str(all_BSJs[k])) -lst=[known_BSJs] +lst = [known_BSJs] if args.lc: lst.append(low_conf_BSJs) for l in lst: - for k,v in l.items(): - if l[k].counted!=1: + for k, v in l.items(): + if l[k].counted != 1: o.write(str(v)) o.close() diff --git a/workflow/scripts/create_circExplorer_linear_bam.py b/workflow/scripts/create_circExplorer_linear_bam.py index e3fb5f7..eaf956f 100755 --- a/workflow/scripts/create_circExplorer_linear_bam.py +++ b/workflow/scripts/create_circExplorer_linear_bam.py @@ -6,96 +6,102 @@ pp = pprint.PrettyPrinter(indent=4) -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() + +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - 
regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." 
% (seqname)) + class JUNCTION: - def __init__(self,jid,chrom="",start=-1,end=-1): - self.jid=jid - self.chrom=chrom - self.start=int(start) - self.end=int(end) - self.score=0 - self.rids=set() - self.refcoords=dict() - self.keeprids=set() - - def append_rid_refcoords(self,rid,coords): - if not rid in self.rids: self.refcoords[rid]=dict() + def __init__(self, jid, chrom="", start=-1, end=-1): + self.jid = jid + self.chrom = chrom + self.start = int(start) + self.end = int(end) + self.score = 0 + self.rids = set() + self.refcoords = dict() + self.keeprids = set() + + def append_rid_refcoords(self, rid, coords): + if not rid in self.rids: + self.refcoords[rid] = dict() self.rids.add(rid) for c in coords: - if not c in self.refcoords[rid]: self.refcoords[rid][c]=1 + if not c in self.refcoords[rid]: + self.refcoords[rid][c] = 1 - def append_keeprid(self,rid): + def append_keeprid(self, rid): self.keeprids.add(rid) - - def set_chrom_start_end(self,chrom,start,end): - self.chrom=chrom - self.start=int(start) - self.end=int(end) + + def set_chrom_start_end(self, chrom, start, end): + self.chrom = chrom + self.start = int(start) + self.end = int(end) + class BSJ: def __init__(self): - self.chrom="" - self.start="" - self.end="" - self.score=0 - self.name="." - self.strand="U" - self.bitids=set() - self.rids=set() - + self.chrom = "" + self.start = "" + self.end = "" + self.score = 0 + self.name = "." 
+ self.strand = "U" + self.bitids = set() + self.rids = set() + def plusone(self): - self.score+=1 - - def set_strand(self,strand): - self.strand=strand - - def set_chrom(self,chrom): - self.chrom=chrom - - def set_start(self,start): - self.start=start - - def set_end(self,end): - self.end=end - - def append_bitid(self,bitid): + self.score += 1 + + def set_strand(self, strand): + self.strand = strand + + def set_chrom(self, chrom): + self.chrom = chrom + + def set_start(self, start): + self.start = start + + def set_end(self, end): + self.end = end + + def append_bitid(self, bitid): self.bitids.add(bitid) - def append_rid(self,rid): + def append_rid(self, rid): self.rids.add(rid) - - def write_out_BSJ(self,outbed): - t=[] + + def write_out_BSJ(self, outbed): + t = [] t.append(self.chrom) t.append(str(self.start)) t.append(str(self.end)) @@ -104,145 +110,150 @@ def write_out_BSJ(self,outbed): t.append(self.strand) t.append(",".join(self.bitids)) t.append(",".join(self.rids)) - outbed.write("\t".join(t)+"\n") + outbed.write("\t".join(t) + "\n") - def update_score_and_found_count(self,junctions_found): + def update_score_and_found_count(self, junctions_found): self.score = len(self.rids) - jid = self.chrom + "##" + str(self.start) + "##" + str(int(self.end)-1) - junctions_found[jid]+=self.score + jid = self.chrom + "##" + str(self.start) + "##" + str(int(self.end) - 1) + junctions_found[jid] += self.score + - class Readinfo: - def __init__(self,readid,rname): - self.readid=readid - self.refname=rname - self.bitflags=list() - self.bitid="" - self.strand="." - self.start=-1 - self.end=-1 - self.refcoordinates=dict() - self.isread1=dict() - self.isreverse=dict() - self.issecondary=dict() - self.issupplementary=dict() - + def __init__(self, readid, rname): + self.readid = readid + self.refname = rname + self.bitflags = list() + self.bitid = "" + self.strand = "." 
+ self.start = -1 + self.end = -1 + self.refcoordinates = dict() + self.isread1 = dict() + self.isreverse = dict() + self.issecondary = dict() + self.issupplementary = dict() + def __str__(self): - s = "readid: %s"%(self.readid) - s = "%s\tbitflags: %s"%(s,self.bitflags) - s = "%s\tbitid: %s"%(s,self.bitid) + s = "readid: %s" % (self.readid) + s = "%s\tbitflags: %s" % (s, self.bitflags) + s = "%s\tbitid: %s" % (s, self.bitid) for bf in self.bitflags: - s = "%s\t%s\trefcoordinates: %s"%(s,bf,", ".join(list(map(lambda x:str(x),self.refcoordinates[bf])))) + s = "%s\t%s\trefcoordinates: %s" % ( + s, + bf, + ", ".join(list(map(lambda x: str(x), self.refcoordinates[bf]))), + ) return s - def set_refcoordinates(self,bitflag,refpos): - self.refcoordinates[bitflag]=refpos - - def set_read1_reverse_secondary_supplementary(self,bitflag,read): + def set_refcoordinates(self, bitflag, refpos): + self.refcoordinates[bitflag] = refpos + + def set_read1_reverse_secondary_supplementary(self, bitflag, read): if read.is_read1: - self.isread1[bitflag]="Y" + self.isread1[bitflag] = "Y" else: - self.isread1[bitflag]="N" + self.isread1[bitflag] = "N" if read.is_reverse: - self.isreverse[bitflag]="Y" + self.isreverse[bitflag] = "Y" else: - self.isreverse[bitflag]="N" + self.isreverse[bitflag] = "N" if read.is_secondary: - self.issecondary[bitflag]="Y" + self.issecondary[bitflag] = "Y" else: - self.issecondary[bitflag]="N" + self.issecondary[bitflag] = "N" if read.is_supplementary: - self.issupplementary[bitflag]="Y" + self.issupplementary[bitflag] = "Y" else: - self.issupplementary[bitflag]="N" - - def append_alignment(self,read): + self.issupplementary[bitflag] = "N" + + def append_alignment(self, read): self.alignments.append(read) - - def append_bitflag(self,bf): + + def append_bitflag(self, bf): self.bitflags.append(bf) - + # def extend_ref_positions(self,refcoords): # self.refcoordinates.extend(refcoords) - + def generate_bitid(self): - bitlist=sorted(self.bitflags) - 
self.bitid="##".join(list(map(lambda x:str(x),bitlist))) -# self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) - + bitlist = sorted(self.bitflags) + self.bitid = "##".join(list(map(lambda x: str(x), bitlist))) + + # self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) + def get_strand(self): - if self.bitid=="83##163##2129": - self.strand="+" - elif self.bitid=="339##419##2385": - self.strand="+" - elif self.bitid=="83##163##2209": - self.strand="+" - elif self.bitid=="339##419##2465": - self.strand="+" - elif self.bitid=="99##147##2193": - self.strand="-" - elif self.bitid=="355##403##2449": - self.strand="-" - elif self.bitid=="99##147##2145": - self.strand="-" - elif self.bitid=="355##403##2401": - self.strand="-" - elif self.bitid=="16##2064": - self.strand="+" - elif self.bitid=="272##2320": - self.strand="+" - elif self.bitid=="0##2048": - self.strand="-" - elif self.bitid=="256##2304": - self.strand="-" - elif self.bitid=="153##2201": - self.strand="-" + if self.bitid == "83##163##2129": + self.strand = "+" + elif self.bitid == "339##419##2385": + self.strand = "+" + elif self.bitid == "83##163##2209": + self.strand = "+" + elif self.bitid == "339##419##2465": + self.strand = "+" + elif self.bitid == "99##147##2193": + self.strand = "-" + elif self.bitid == "355##403##2449": + self.strand = "-" + elif self.bitid == "99##147##2145": + self.strand = "-" + elif self.bitid == "355##403##2401": + self.strand = "-" + elif self.bitid == "16##2064": + self.strand = "+" + elif self.bitid == "272##2320": + self.strand = "+" + elif self.bitid == "0##2048": + self.strand = "-" + elif self.bitid == "256##2304": + self.strand = "-" + elif self.bitid == "153##2201": + self.strand = "-" else: - self.strand="U" + self.strand = "U" - def validate_BSJ_read(self,junctions): + def validate_BSJ_read(self, junctions): """ Checks if read is truly a BSJ originitor. 
* Defines left, right and middle alignments * Left and right alignments should not overlap * Middle alignment should be between left and right alignments """ - if len(self.bitid.split("##"))==3: - left=-1 - right=-1 - middle=-1 - if self.bitid=="83##163##2129": - left=2129 - right=83 - middle=163 - if self.bitid=="339##419##2385": - left=2385 - right=339 - middle=419 - if self.bitid=="83##163##2209": - left=163 - right=2209 - middle=83 - if self.bitid=="339##419##2465": - left=419 - right=2465 - middle=339 - if self.bitid=="99##147##2145": - left=99 - right=2145 - middle=147 - if self.bitid=="355##403##2401": - left=355 - right=2401 - middle=403 - if self.bitid=="99##147##2193": - left=2193 - right=147 - middle=99 - if self.bitid=="355##403##2449": - left=2449 - right=403 - middle=355 + if len(self.bitid.split("##")) == 3: + left = -1 + right = -1 + middle = -1 + if self.bitid == "83##163##2129": + left = 2129 + right = 83 + middle = 163 + if self.bitid == "339##419##2385": + left = 2385 + right = 339 + middle = 419 + if self.bitid == "83##163##2209": + left = 163 + right = 2209 + middle = 83 + if self.bitid == "339##419##2465": + left = 419 + right = 2465 + middle = 339 + if self.bitid == "99##147##2145": + left = 99 + right = 2145 + middle = 147 + if self.bitid == "355##403##2401": + left = 355 + right = 2401 + middle = 403 + if self.bitid == "99##147##2193": + left = 2193 + right = 147 + middle = 99 + if self.bitid == "355##403##2449": + left = 2449 + right = 403 + middle = 355 # print(left,right,middle) if left == -1 or right == -1 or middle == -1: return False @@ -253,46 +264,46 @@ def validate_BSJ_read(self,junctions): # print("validate_BSJ_read",self.readid,self.refcoordinates[middle][0],self.refcoordinates[middle][-1]) leftmost = str(self.refcoordinates[left][0]) rightmost = str(self.refcoordinates[right][-1]) - possiblejid = chrom+"##"+leftmost+"##"+rightmost + possiblejid = chrom + "##" + leftmost + "##" + rightmost # 
print("validate_BSJ_read",self.readid,possiblejid) if possiblejid in junctions: self.start = leftmost - self.end = str(int(rightmost) + 1) # this will be added to the BED file + self.end = str(int(rightmost) + 1) # this will be added to the BED file return True else: return False - - - + def get_bsjid(self): - t=[] + t = [] t.append(self.refname) t.append(self.start) t.append(self.end) t.append(self.strand) return "##".join(t) - - def write_out_reads(self,outbam): + + def write_out_reads(self, outbam): for r in self.alignments: outbam.write(r) - - + + def get_uniq_readid(r): - rname=r.query_name - hi=r.get_tag("HI") - rid=rname+"##"+str(hi) + rname = r.query_name + hi = r.get_tag("HI") + rid = rname + "##" + str(hi) return rid + def get_bitflag(r): - bitflag=str(r).split("\t")[1] + bitflag = str(r).split("\t")[1] return int(bitflag) + def _bsjid2jid(bsjid): - x=bsjid.split("##") - chrom=x[0] - start=x[1] - end=str(int(x[2])-1) - return "##".join([chrom,start,end]) + x = bsjid.split("##") + chrom = x[0] + start = x[1] + end = str(int(x[2]) - 1) + return "##".join([chrom, start, end]) def main(): @@ -304,153 +315,279 @@ def main(): where the chrom, start and end represent the BSJ the read is depicting. 
""" ) - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input NON-Chimeric-only STAR2p BAM file") - parser.add_argument('-t','--sample_counts_table', dest='countstable', type=str, required=True, - help='circExplore per-sample counts table') # get coordinates of the circRNA - parser.add_argument("-s",'--sample_name', dest='samplename', type=str, required=False, default = 'sample1', - help='Sample Name: SM for RG') - parser.add_argument("-l",'--library', dest='library', type=str, required=False, default = 'lib1', - help='Sample Name: LB for RG') - parser.add_argument("-f",'--platform', dest='platform', type=str, required=False, default = 'illumina', - help='Sample Name: PL for RG') - parser.add_argument("-u",'--unit', dest='unit', type=str, required=False, default = 'unit1', - help='Sample Name: PU for RG') - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=argparse.FileType('w'), - help="Output bam file ... both strands") - parser.add_argument("-p","--plusbam",dest="plusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-m","--minusbam",dest="minusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-b","--bed",dest="bed",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output BSJ bed file (with strand info)") - parser.add_argument("-j","--junctionsfound",dest="junctionsfound",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output TSV file with counts of junctions expected vs found") - parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') - parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... 
single value...host_filter_min/host_filter_max filters are applied to this region only') - parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') - parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') - args = parser.parse_args() + parser.add_argument( + "-i", + "--inbam", + dest="inbam", + required=True, + type=str, + help="Input NON-Chimeric-only STAR2p BAM file", + ) + parser.add_argument( + "-t", + "--sample_counts_table", + dest="countstable", + type=str, + required=True, + help="circExplore per-sample counts table", + ) # get coordinates of the circRNA + parser.add_argument( + "-s", + "--sample_name", + dest="samplename", + type=str, + required=False, + default="sample1", + help="Sample Name: SM for RG", + ) + parser.add_argument( + "-l", + "--library", + dest="library", + type=str, + required=False, + default="lib1", + help="Sample Name: LB for RG", + ) + parser.add_argument( + "-f", + "--platform", + dest="platform", + type=str, + required=False, + default="illumina", + help="Sample Name: PL for RG", + ) + parser.add_argument( + "-u", + "--unit", + dest="unit", + type=str, + required=False, + default="unit1", + help="Sample Name: PU for RG", + ) + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=argparse.FileType("w"), + help="Output bam file ... 
both strands",
+    )
+    parser.add_argument(
+        "-p",
+        "--plusbam",
+        dest="plusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output plus strand bam file",
+    )
+    parser.add_argument(
+        "-m",
+        "--minusbam",
+        dest="minusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output minus strand bam file",
+    )
+    parser.add_argument(
+        "-b",
+        "--bed",
+        dest="bed",
+        required=True,
+        type=argparse.FileType("w", encoding="UTF-8"),
+        help="Output BSJ bed file (with strand info)",
+    )
+    parser.add_argument(
+        "-j",
+        "--junctionsfound",
+        dest="junctionsfound",
+        required=True,
+        type=argparse.FileType("w", encoding="UTF-8"),
+        help="Output TSV file with counts of junctions expected vs found",
+    )
+    parser.add_argument(
+        "--regions",
+        dest="regions",
+        type=str,
+        required=True,
+        help="regions file eg. ref.fa.regions",
+    )
+    parser.add_argument(
+        "--host",
+        dest="host",
+        type=str,
+        required=True,
+        help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only",
+    )
+    parser.add_argument(
+        "--additives",
+        dest="additives",
+        type=str,
+        required=True,
+        help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out",
+    )
+    parser.add_argument(
+        "--viruses",
+        dest="viruses",
+        type=str,
+        required=True,
+        help="virus name(s) eg.NC_009333.1... 
comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only",
+    )
+    args = parser.parse_args()
     samfile = pysam.AlignmentFile(args.inbam, "rb")
     samheader = samfile.header.to_dict()
-    samheader['RG']=list()
-#    bsjfile = open(args.bed,"w")
-    junctionsfile = open(args.countstable,'r')
-    junctions=dict()
-    junction_chroms=set()
+    samheader["RG"] = list()
+    #    bsjfile = open(args.bed,"w")
+    junctionsfile = open(args.countstable, "r")
+    junctions = dict()
+    junction_chroms = set()
     print("Reading...junctions!...")
     for l in junctionsfile.readlines():
-        if "read_count" in l: continue
+        if "read_count" in l:
+            continue
         l = l.strip().split("\t")
         chrom = l[0]
         junction_chroms.add(chrom)
         start = l[1]
-        end = str(int(l[2])-1)
-        jid = chrom+"##"+start+"##"+end # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching!
-        samheader['RG'].append({'ID':jid, 'LB':args.library, 'PL':args.platform, 'PU':args.unit,'SM':args.samplename})
-        junctions[jid] = JUNCTION(jid,chrom=chrom,start=start,end=end)
+        end = str(int(l[2]) - 1)
+        jid = (
+            chrom + "##" + start + "##" + end
+        )  # create a unique junction ID for each line in the BSJ junction file and make it the dict key ... easy for searching!
+        samheader["RG"].append(
+            {
+                "ID": jid,
+                "LB": args.library,
+                "PL": args.platform,
+                "PU": args.unit,
+                "SM": args.samplename,
+            }
+        )
+        junctions[jid] = JUNCTION(jid, chrom=chrom, start=start, end=end)
     junctionsfile.close()
     sequences = set()
-    for v in samheader['SQ']:
-        sequences.add(v['SN'])
+    for v in samheader["SQ"]:
+        sequences.add(v["SN"])
     # pp.pprint(junctions)
     # print(sequences)
     if not junction_chroms.issubset(sequences):
-        print("Junction file has junction on chromosome which are NOT part of the supplied BAM file!!!")
+        print(
+            "Junction file has junctions on chromosomes which are NOT part of the supplied BAM file!!!"
+ ) exit() - print("Done reading %d junctions."%(len(junctions))) + print("Done reading %d junctions." % (len(junctions))) print("Reading...regions file!...") host_virus_sequences = set() - regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) + regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, + ) for s in sequences: - hav = _get_host_additive_virus(regions,s) - if hav == "host": host_virus_sequences.add(s) - if hav == "virus": host_virus_sequences.add(s) + hav = _get_host_additive_virus(regions, s) + if hav == "host": + host_virus_sequences.add(s) + if hav == "virus": + host_virus_sequences.add(s) # print(host_virus_sequences) host_virus_sequences = host_virus_sequences.intersection(junction_chroms) # print(host_virus_sequences) - rid2jid=dict() - jid2rid=dict() - for jid,junc in junctions.items(): + rid2jid = dict() + jid2rid = dict() + for jid, junc in junctions.items(): # print(jid) - for read in samfile.fetch(junc.chrom,junc.start-2,junc.end+2): - if read.reference_id != read.next_reference_id: continue # only works for PE ... for SE read.next_reference_id is -1 - if ( not read.is_proper_pair ) or read.is_secondary or read.is_supplementary or read.is_unmapped : continue - rid=get_uniq_readid(read) - rid2jid[rid]=jid - if not jid in jid2rid: jid2rid[jid]=set() + for read in samfile.fetch(junc.chrom, junc.start - 2, junc.end + 2): + if read.reference_id != read.next_reference_id: + continue # only works for PE ... 
for SE read.next_reference_id is -1 + if ( + (not read.is_proper_pair) + or read.is_secondary + or read.is_supplementary + or read.is_unmapped + ): + continue + rid = get_uniq_readid(read) + rid2jid[rid] = jid + if not jid in jid2rid: + jid2rid[jid] = set() jid2rid[jid].add(rid) samfile.reset() - - outfile = pysam.AlignmentFile(args.outbam, "wb", header = samheader) + + outfile = pysam.AlignmentFile(args.outbam, "wb", header=samheader) for read in samfile.fetch(): - rid=get_uniq_readid(read) + rid = get_uniq_readid(read) if rid in rid2jid: read.set_tag("RG", jid, value_type="Z") outbam.write(read) outbam.close() samfile.close() args.junctionsfound.write("#chrom\tstart\tend\tfound_linear_reads\n") - for jid,junc in junctions.items(): - args.junctionsfound.write("%s\t%d\t%d\t%d\n"%(junc.chrom,junc.start,junc.end,len(jid2rid[jid]))) + for jid, junc in junctions.items(): + args.junctionsfound.write( + "%s\t%d\t%d\t%d\n" % (junc.chrom, junc.start, junc.end, len(jid2rid[jid])) + ) args.junctionsfound.close() exit() + # # print("rid",rid) + # # print("junctions[jid].rids",junctions[jid].rids) + # # print("junctions[jid].refcoords:") + # # pp.pprint(junctions[jid].refcoords) + # junctions[jid].append_rid_refcoords(rid,read.get_reference_positions()) + # print("junctions[jid].rids",junctions[jid].rids) + # print("junctions[jid].refcoords:") + # pp.pprint(junctions[jid].refcoords) + # for rid in junctions[jid].rids: + # print(rid) + # # if junc.start in junctions[jid].refcoords[rid] and junc.end in junctions[jid].refcoords[rid]: + # if junc.start in junctions[jid].refcoords[rid] or junc.end in junctions[jid].refcoords[rid]: + # junctions[jid].append_keeprid(rid) + # print(len(junctions[jid].rids)) + # print(len(junctions[jid].keeprids)) + # print(junctions[jid].keeprids) + # exit() - - # # print("rid",rid) - # # print("junctions[jid].rids",junctions[jid].rids) - # # print("junctions[jid].refcoords:") - # # pp.pprint(junctions[jid].refcoords) - # 
junctions[jid].append_rid_refcoords(rid,read.get_reference_positions()) - # print("junctions[jid].rids",junctions[jid].rids) - # print("junctions[jid].refcoords:") - # pp.pprint(junctions[jid].refcoords) - # for rid in junctions[jid].rids: - # print(rid) - # # if junc.start in junctions[jid].refcoords[rid] and junc.end in junctions[jid].refcoords[rid]: - # if junc.start in junctions[jid].refcoords[rid] or junc.end in junctions[jid].refcoords[rid]: - # junctions[jid].append_keeprid(rid) - # print(len(junctions[jid].rids)) - # print(len(junctions[jid].keeprids)) - # print(junctions[jid].keeprids) - # exit() - - - - - - - - bigdict=dict() + bigdict = dict() # print("Opening...") # print(args.inbam) print("Reading...alignments!...") - count=0 - count2=0 + count = 0 + count2 = 0 for read in samfile.fetch(): - count+=1 - if debug: print(read,read.reference_id,read.next_reference_id) - if read.reference_id != read.next_reference_id: continue # only works for PE ... for SE read.next_reference_id is -1 - count2+=1 - rid=get_uniq_readid(read) # add the HI number to the readid - if debug:print(rid) + count += 1 + if debug: + print(read, read.reference_id, read.next_reference_id) + if read.reference_id != read.next_reference_id: + continue # only works for PE ... for SE read.next_reference_id is -1 + count2 += 1 + rid = get_uniq_readid(read) # add the HI number to the readid + if debug: + print(rid) if not rid in bigdict: - bigdict[rid]=Readinfo(rid,read.reference_name) + bigdict[rid] = Readinfo(rid, read.reference_name) # bigdict[rid].append_alignment(read) # since rid has HI number included ... this separates alignment by HI - bitflag=get_bitflag(read) - if debug:print(bitflag) - bigdict[rid].append_bitflag(bitflag) # each rid can have upto 3 lines in the BAM with each having its own bitflag ... 
collect all bigflags in a list here
-        refpos=list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True)))
-        bigdict[rid].set_refcoordinates(bitflag,refpos) # maintain a list of reference coordinated that are "aligned" for each bitflag in each rid alignment
+        bitflag = get_bitflag(read)
+        if debug:
+            print(bitflag)
+        bigdict[rid].append_bitflag(
+            bitflag
+        )  # each rid can have up to 3 lines in the BAM with each having its own bitflag ... collect all bitflags in a list here
+        refpos = list(
+            filter(lambda x: x is not None, read.get_reference_positions(full_length=True))
+        )
+        bigdict[rid].set_refcoordinates(
+            bitflag, refpos
+        )  # maintain a list of reference coordinates that are "aligned" for each bitflag in each rid alignment
         # bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag,read)
-        if debug:print(bigdict[rid])
-        print("Done reading %d chimeric alignments. [%d same chrom chimeras]"%(count,count2))
+        if debug:
+            print(bigdict[rid])
+        print(
+            "Done reading %d chimeric alignments. 
[%d same chrom chimeras]" + % (count, count2) + ) # samfile.close() # print("Closed") # print("Reopening") @@ -460,49 +597,57 @@ def main(): samfile.reset() print("Writing BAMs") print("Re-Reading...alignments!...") - plusfile = pysam.AlignmentFile(args.plusbam, "wb", header = samheader) - minusfile = pysam.AlignmentFile(args.minusbam, "wb", header = samheader) - outfile = pysam.AlignmentFile(args.outbam, "wb", header = samheader) - bsjdict=dict() - bitid_counts=dict() + plusfile = pysam.AlignmentFile(args.plusbam, "wb", header=samheader) + minusfile = pysam.AlignmentFile(args.minusbam, "wb", header=samheader) + outfile = pysam.AlignmentFile(args.outbam, "wb", header=samheader) + bsjdict = dict() + bitid_counts = dict() for read in samfile.fetch(): - if read.reference_id != read.next_reference_id: continue - rid=get_uniq_readid(read) + if read.reference_id != read.next_reference_id: + continue + rid = get_uniq_readid(read) if rid in bigdict: - bigdict[rid].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted - if debug:print(bigdict[rid]) - bigdict[rid].get_strand() # use the unique aggregated bitid to extract the strand information ... all possible cases are explicitly covered - if not bigdict[rid].validate_BSJ_read(junctions=junctions): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. Also add start and end to the BSJ object + bigdict[ + rid + ].generate_bitid() # separate all bitflags for the same rid with ## and create a unique single bitflag ... bitflags are pre-sorted + if debug: + print(bigdict[rid]) + bigdict[ + rid + ].get_strand() # use the unique aggregated bitid to extract the strand information ... 
all possible cases are explicitly covered + if not bigdict[rid].validate_BSJ_read( + junctions=junctions + ): # ensure that the read alignments leftmost and rightmost coordinates match with one of the BSJ junctions... if yes then that rid represents a BSJ. Also add start and end to the BSJ object continue # bigdict[rid].get_start_end() # print(bigdict[rid]) - bsjid=bigdict[rid].get_bsjid() - jid=_bsjid2jid(bsjid) + bsjid = bigdict[rid].get_bsjid() + jid = _bsjid2jid(bsjid) read.set_tag("RG", jid, value_type="Z") - if bigdict[rid].strand=="+": + if bigdict[rid].strand == "+": plusfile.write(read) - if bigdict[rid].strand=="-": + if bigdict[rid].strand == "-": minusfile.write(read) outfile.write(read) if not bsjid in bsjdict: - bsjdict[bsjid]=BSJ() + bsjdict[bsjid] = BSJ() bsjdict[bsjid].set_chrom(bigdict[rid].refname) bsjdict[bsjid].set_start(bigdict[rid].start) bsjdict[bsjid].set_end(bigdict[rid].end) bsjdict[bsjid].set_strand(bigdict[rid].strand) bsjdict[bsjid].append_bitid(bigdict[rid].bitid) if not bigdict[rid].bitid in bitid_counts: - bitid_counts[bigdict[rid].bitid]=0 - bitid_counts[bigdict[rid].bitid]+=1 + bitid_counts[bigdict[rid].bitid] = 0 + bitid_counts[bigdict[rid].bitid] += 1 bsjdict[bsjid].append_rid(rid) - print("Done!") + print("Done!") for b in bitid_counts.keys(): - print(b,bitid_counts[b]) + print(b, bitid_counts[b]) print("Writing BED") for bsjid in bsjdict.keys(): bsjdict[bsjid].update_score_and_found_count(junctions_found) bsjdict[bsjid].write_out_BSJ(args.bed) - + plusfile.close() minusfile.close() samfile.close() @@ -510,16 +655,17 @@ def main(): args.bed.close() args.junctionsfound.write("#chrom\tstart\tend\texpected_counts\tfound_counts\n") for jid in junctions.keys(): - x=jid.split("##") - chrom=x[0] - start=int(x[1]) - end=int(x[2])+1 - args.junctionsfound.write("%s\t%d\t%d\t%d\t%d\n"%(chrom,start,end,junctions[jid],junctions_found[jid])) + x = jid.split("##") + chrom = x[0] + start = int(x[1]) + end = int(x[2]) + 1 + 
args.junctionsfound.write( + "%s\t%d\t%d\t%d\t%d\n" + % (chrom, start, end, junctions[jid], junctions_found[jid]) + ) args.junctionsfound.close() print("ALL Done!") - + if __name__ == "__main__": main() - - diff --git a/workflow/scripts/create_circExplorer_per_sample_counts_table.py b/workflow/scripts/create_circExplorer_per_sample_counts_table.py index a60e321..50c04e9 100755 --- a/workflow/scripts/create_circExplorer_per_sample_counts_table.py +++ b/workflow/scripts/create_circExplorer_per_sample_counts_table.py @@ -16,41 +16,60 @@ # 11 linear_spliced_BSJ_reads_opposite_strand -def _df_setcol_as_int(df,collist): +def _df_setcol_as_int(df, collist): for c in collist: - df[[c]]=df[[c]].astype(int) + df[[c]] = df[[c]].astype(int) return df -def _df_setcol_as_str(df,collist): + +def _df_setcol_as_str(df, collist): for c in collist: - df[[c]]=df[[c]].astype(str) + df[[c]] = df[[c]].astype(str) return df + def main(): # debug = True debug = False - parser = argparse.ArgumentParser( + parser = argparse.ArgumentParser() + parser.add_argument( + "--annotationcounts", + dest="annotationcounts", + required=True, + type=str, + help="annotated_counts.tsv counts file", + ) + parser.add_argument( + "--allfoundcounts", + dest="allfoundcounts", + required=True, + type=str, + help="readcounts.tsv", + ) + parser.add_argument( + "--countstable", + dest="mergedcounts", + required=True, + type=str, + help="merged counts_table.tsv file", ) - parser.add_argument("--annotationcounts",dest="annotationcounts",required=True,type=str, - help="annotated_counts.tsv counts file") - parser.add_argument("--allfoundcounts",dest="allfoundcounts",required=True,type=str, - help="readcounts.tsv") - parser.add_argument("--countstable",dest="mergedcounts",required=True,type=str, - help="merged counts_table.tsv file") args = parser.parse_args() - bcounts = pandas.read_csv(args.annotationcounts,header=0,sep="\t") - lcounts = pandas.read_csv(args.allfoundcounts,header=0,sep="\t") + bcounts = 
pandas.read_csv(args.annotationcounts, header=0, sep="\t") + lcounts = pandas.read_csv(args.allfoundcounts, header=0, sep="\t") # print(bcounts.head()) # print(lcounts.head()) - mcounts = bcounts.merge(lcounts,how='outer',on=["#chrom","start","end","strand"]) - mcounts.fillna(value=0,inplace=True) - strcols = [ '#chrom', 'strand', 'known_novel' ] - intcols = list ( set(mcounts.columns) - set(strcols) ) - mcounts = _df_setcol_as_str(mcounts,strcols) - mcounts = _df_setcol_as_int(mcounts,intcols) - mcounts.drop(["read_count"],axis=1,inplace=True) - mcounts.to_csv(args.mergedcounts,index=False,doublequote=False,sep="\t") + mcounts = bcounts.merge( + lcounts, how="outer", on=["#chrom", "start", "end", "strand"] + ) + mcounts.fillna(value=0, inplace=True) + strcols = ["#chrom", "strand", "known_novel"] + intcols = list(set(mcounts.columns) - set(strcols)) + mcounts = _df_setcol_as_str(mcounts, strcols) + mcounts = _df_setcol_as_int(mcounts, intcols) + mcounts.drop(["read_count"], axis=1, inplace=True) + mcounts.to_csv(args.mergedcounts, index=False, doublequote=False, sep="\t") + if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/create_dcc_per_sample_counts_table.py b/workflow/scripts/create_dcc_per_sample_counts_table.py index b6af479..e791a33 100755 --- a/workflow/scripts/create_dcc_per_sample_counts_table.py +++ b/workflow/scripts/create_dcc_per_sample_counts_table.py @@ -1,21 +1,33 @@ import argparse import pandas -parser = argparse.ArgumentParser(description='Merge information from CircCoordinates and CircRNACount files generated by DCC') -parser.add_argument('--CircCoordinates', dest='CircCoordinates', type=str, required=True, - help='CircCoordinates file from DCC') -parser.add_argument('--CircRNALinearCount', dest='CircRNACount', type=str, required=True, - help='CircRNACount + LinearCount output file from DCC') +parser = argparse.ArgumentParser( + description="Merge information from CircCoordinates and 
CircRNACount files generated by DCC" +) +parser.add_argument( + "--CircCoordinates", + dest="CircCoordinates", + type=str, + required=True, + help="CircCoordinates file from DCC", +) +parser.add_argument( + "--CircRNALinearCount", + dest="CircRNACount", + type=str, + required=True, + help="CircRNACount + LinearCount output file from DCC", +) # parser.add_argument('--samplename', dest='samplename', type=str, required=True, # help='Sample Name') -parser.add_argument('-o',dest='outfile',required=True,help='merged table') +parser.add_argument("-o", dest="outfile", required=True, help="merged table") args = parser.parse_args() # sn=args.samplename # load files -CircCoordinates=pandas.read_csv(args.CircCoordinates,sep="\t",header=0) -CircRNACount=pandas.read_csv(args.CircRNACount,sep="\t",header=0) +CircCoordinates = pandas.read_csv(args.CircCoordinates, sep="\t", header=0) +CircRNACount = pandas.read_csv(args.CircRNACount, sep="\t", header=0) # CircRNACount columns are: # | # | ColName | @@ -53,37 +65,85 @@ # | 7 | Start-End | # | 8 | OverallRegion | -old_names = CircCoordinates.columns -new_names = ['chr', 'start', 'end', 'gene', 'junction_type', 'strand2', 'start_end_region', 'overall_region'] +old_names = CircCoordinates.columns +new_names = [ + "chr", + "start", + "end", + "gene", + "junction_type", + "strand2", + "start_end_region", + "overall_region", +] CircCoordinates.rename(columns=dict(zip(old_names, new_names)), inplace=True) -CircCoordinates[['junction_type']]=CircCoordinates[['junction_type']].astype(str) -CircCoordinates.loc[CircCoordinates['junction_type']=="0",'junction_type']="Non-canonical" -CircCoordinates.loc[CircCoordinates['junction_type']=="1",'junction_type']="GT/AG" -CircCoordinates.loc[CircCoordinates['junction_type']=="2",'junction_type']="CT/AC" -CircCoordinates.loc[CircCoordinates['junction_type']=="3",'junction_type']="GC/AG" -CircCoordinates.loc[CircCoordinates['junction_type']=="4",'junction_type']="CT/GC" 
-CircCoordinates.loc[CircCoordinates['junction_type']=="5",'junction_type']="AT/AC" -CircCoordinates.loc[CircCoordinates['junction_type']=="6",'junction_type']="GT/AT" +CircCoordinates[["junction_type"]] = CircCoordinates[["junction_type"]].astype(str) +CircCoordinates.loc[ + CircCoordinates["junction_type"] == "0", "junction_type" +] = "Non-canonical" +CircCoordinates.loc[CircCoordinates["junction_type"] == "1", "junction_type"] = "GT/AG" +CircCoordinates.loc[CircCoordinates["junction_type"] == "2", "junction_type"] = "CT/AC" +CircCoordinates.loc[CircCoordinates["junction_type"] == "3", "junction_type"] = "GC/AG" +CircCoordinates.loc[CircCoordinates["junction_type"] == "4", "junction_type"] = "CT/GC" +CircCoordinates.loc[CircCoordinates["junction_type"] == "5", "junction_type"] = "AT/AC" +CircCoordinates.loc[CircCoordinates["junction_type"] == "6", "junction_type"] = "GT/AT" # strand is flipped in CircCoordinates file ... flipping it back -CircCoordinates['strand']="." -CircCoordinates.loc[CircCoordinates['strand2']=="-",'strand']="+" -CircCoordinates.loc[CircCoordinates['strand2']=="+",'strand']="-" +CircCoordinates["strand"] = "." 
+CircCoordinates.loc[CircCoordinates["strand2"] == "-", "strand"] = "+" +CircCoordinates.loc[CircCoordinates["strand2"] == "+", "strand"] = "-" -CircCoordinates['dcc_annotation']=CircCoordinates['gene'].astype(str)+"##"+CircCoordinates['junction_type'].astype(str)+"##"+CircCoordinates['start_end_region'].astype(str) +CircCoordinates["dcc_annotation"] = ( + CircCoordinates["gene"].astype(str) + + "##" + + CircCoordinates["junction_type"].astype(str) + + "##" + + CircCoordinates["start_end_region"].astype(str) +) -CircCoordinates['circRNA_id']=CircCoordinates['chr'].astype(str)+"##"+CircCoordinates['start'].astype(str)+"##"+CircCoordinates['end'].astype(str)+"##"+CircCoordinates['strand'].astype(str) -CircCoordinates.drop(['chr', 'start', 'end', 'strand', 'strand2', 'gene','junction_type','start_end_region','overall_region'],axis=1,inplace=True) -CircCoordinates.set_index(['circRNA_id'],inplace=True) +CircCoordinates["circRNA_id"] = ( + CircCoordinates["chr"].astype(str) + + "##" + + CircCoordinates["start"].astype(str) + + "##" + + CircCoordinates["end"].astype(str) + + "##" + + CircCoordinates["strand"].astype(str) +) +CircCoordinates.drop( + [ + "chr", + "start", + "end", + "strand", + "strand2", + "gene", + "junction_type", + "start_end_region", + "overall_region", + ], + axis=1, + inplace=True, +) +CircCoordinates.set_index(["circRNA_id"], inplace=True) # CircCoordinates.to_csv("tmp",sep="\t",header=True,index=True) -old_names = CircRNACount.columns -new_names = ['chr', 'start', 'end', 'strand', 'read_count', 'linear_read_count'] +old_names = CircRNACount.columns +new_names = ["chr", "start", "end", "strand", "read_count", "linear_read_count"] CircRNACount.rename(columns=dict(zip(old_names, new_names)), inplace=True) -CircRNACount['circRNA_id']=CircRNACount['chr'].astype(str)+"##"+CircRNACount['start'].astype(str)+"##"+CircRNACount['end'].astype(str)+"##"+CircRNACount['strand'].astype(str) -CircRNACount.set_index(['circRNA_id'],inplace=True) 
+CircRNACount["circRNA_id"] = ( + CircRNACount["chr"].astype(str) + + "##" + + CircRNACount["start"].astype(str) + + "##" + + CircRNACount["end"].astype(str) + + "##" + + CircRNACount["strand"].astype(str) +) +CircRNACount.set_index(["circRNA_id"], inplace=True) # CircRNACount.to_csv("tmp2",sep="\t",header=True,index=True) -CircRNACount=CircRNACount.merge(CircCoordinates,left_index=True,right_index=True,how="left",sort=False) -CircRNACount.fillna("0",inplace=True) -CircRNACount.to_csv(args.outfile,sep="\t",header=True,index=False) - +CircRNACount = CircRNACount.merge( + CircCoordinates, left_index=True, right_index=True, how="left", sort=False +) +CircRNACount.fillna("0", inplace=True) +CircRNACount.to_csv(args.outfile, sep="\t", header=True, index=False) diff --git a/workflow/scripts/create_mapsplice_per_sample_counts_table.py b/workflow/scripts/create_mapsplice_per_sample_counts_table.py index 0af0b79..487b7da 100755 --- a/workflow/scripts/create_mapsplice_per_sample_counts_table.py +++ b/workflow/scripts/create_mapsplice_per_sample_counts_table.py @@ -3,29 +3,87 @@ # pandas.options.mode.chained_assignment = None -parser = argparse.ArgumentParser(description='Create per sample Counts Table from MapSplice Outputs') -parser.add_argument('--circularRNAstxt', dest='circularRNAstxt', type=str, required=True, - help='circular_RNAs.txt file from MapSplice') -parser.add_argument('--back_spliced_min_reads', dest='back_spliced_min_reads', type=int, required=True, - help='back_spliced minimum read threshold') -parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only') -parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... 
all BSJs in this region are filtered out') -parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') -parser.add_argument('--host_filter_min', dest='host_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for host') -parser.add_argument('--virus_filter_min', dest='virus_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for virus') -parser.add_argument('--host_filter_max', dest='host_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for host') -parser.add_argument('--virus_filter_max', dest='virus_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for virus') -parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') -parser.add_argument('-o',dest='outfile',required=True,help='output table') -parser.add_argument('-fo',dest='filteredoutfile',required=True,help='filtered output table') +parser = argparse.ArgumentParser( + description="Create per sample Counts Table from MapSplice Outputs" +) +parser.add_argument( + "--circularRNAstxt", + dest="circularRNAstxt", + type=str, + required=True, + help="circular_RNAs.txt file from MapSplice", +) +parser.add_argument( + "--back_spliced_min_reads", + dest="back_spliced_min_reads", + type=int, + required=True, + help="back_spliced minimum read threshold", +) +parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only", +) +parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... 
all BSJs in this region are filtered out", +) +parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only", +) +parser.add_argument( + "--host_filter_min", + dest="host_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_min", + dest="virus_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for virus", +) +parser.add_argument( + "--host_filter_max", + dest="host_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_max", + dest="virus_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for virus", +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. ref.fa.regions", +) +parser.add_argument("-o", dest="outfile", required=True, help="output table") +parser.add_argument( + "-fo", dest="filteredoutfile", required=True, help="filtered output table" +) args = parser.parse_args() # sn=args.samplename @@ -38,7 +96,7 @@ regions["host"] = list() regions["additive"] = list() regions["virus"] = list() -r = open(args.regions,'r') +r = open(args.regions, "r") rlines = r.readlines() r.close() allseqs = list() @@ -54,70 +112,214 @@ host_additive_virus = "virus" regions[host_additive_virus].extend(seq) allseqs.extend(seq) -regions["additive"] = list((set(allseqs)-set(regions["host"]))-set(regions["virus"])) +regions["additive"] = list( + (set(allseqs) - set(regions["host"])) - set(regions["virus"]) +) # load files -circularRNAstxt=pandas.read_csv(args.circularRNAstxt,sep="\t",header=None) +circularRNAstxt = pandas.read_csv(args.circularRNAstxt, sep="\t", header=None) # file has no column lables ... 
add them # ref: https://github.com/Aufiero/circRNAprofiler/blob/master/R/importFilesPredictionTool.R -circularRNAstxt.columns=["chrom", "donor_end", "acceptor_start", "id", "coverage", "strand", "rgb", "block_count", "block_size", "block_distance", "entropy", "flank_case", "flank_string", "min_mismatch", "max_mismatch", "ave_mismatch", "max_min_suffix", "max_min_prefix", "min_anchor_difference", "unique_read_count", "multi_read_count", "paired_read_count", "left_paired_read_count", "right_paired_read_count", "multiple_paired_read_count", "unique_paired_read_count", "single_read_count", "encompassing_read", "doner_start", "acceptor_end", "doner_iosforms", "acceptor_isoforms", "obsolete1", "obsolete2", "obsolete3", "obsolete4", "minimal_doner_isoform_length", "maximal_doner_isoform_length", "minimal_acceptor_isoform_length", "maximal_acceptor_isoform_length", "paired_reads_entropy", "mismatch_per_bp", "anchor_score", "max_doner_fragment", "max_acceptor_fragment", "max_cur_fragment", "min_cur_fragment", "ave_cur_fragment", "doner_encompass_unique", "doner_encompass_multiple", "acceptor_encompass_unique", "acceptor_encompass_multiple", "doner_match_to_normal", "acceptor_match_to_normal", "doner_seq", "acceptor_seq", "match_gene_strand", "annotated_type", "fusion_type", "gene_strand", "annotated_gene_donor", "annotated_gene_acceptor", "dummy"] +circularRNAstxt.columns = [ + "chrom", + "donor_end", + "acceptor_start", + "id", + "coverage", + "strand", + "rgb", + "block_count", + "block_size", + "block_distance", + "entropy", + "flank_case", + "flank_string", + "min_mismatch", + "max_mismatch", + "ave_mismatch", + "max_min_suffix", + "max_min_prefix", + "min_anchor_difference", + "unique_read_count", + "multi_read_count", + "paired_read_count", + "left_paired_read_count", + "right_paired_read_count", + "multiple_paired_read_count", + "unique_paired_read_count", + "single_read_count", + "encompassing_read", + "doner_start", + "acceptor_end", + "doner_iosforms", + 
"acceptor_isoforms", + "obsolete1", + "obsolete2", + "obsolete3", + "obsolete4", + "minimal_doner_isoform_length", + "maximal_doner_isoform_length", + "minimal_acceptor_isoform_length", + "maximal_acceptor_isoform_length", + "paired_reads_entropy", + "mismatch_per_bp", + "anchor_score", + "max_doner_fragment", + "max_acceptor_fragment", + "max_cur_fragment", + "min_cur_fragment", + "ave_cur_fragment", + "doner_encompass_unique", + "doner_encompass_multiple", + "acceptor_encompass_unique", + "acceptor_encompass_multiple", + "doner_match_to_normal", + "acceptor_match_to_normal", + "doner_seq", + "acceptor_seq", + "match_gene_strand", + "annotated_type", + "fusion_type", + "gene_strand", + "annotated_gene_donor", + "annotated_gene_acceptor", + "dummy", +] # 'chrom' is in the format 'donor_chr~acceptor_chr' ... hence needs to be split -circularRNAstxt[['Donor', 'Acceptor']] = circularRNAstxt['chrom'].str.split('~', expand=True) +circularRNAstxt[["Donor", "Acceptor"]] = circularRNAstxt["chrom"].str.split( + "~", expand=True +) # only select rows with ++ or -- strand -circularRNAstxtnew = pandas.concat([circularRNAstxt[circularRNAstxt['strand'] == '++' ],circularRNAstxt[circularRNAstxt['strand'] == '--' ]],ignore_index=True,sort=False) +circularRNAstxtnew = pandas.concat( + [ + circularRNAstxt[circularRNAstxt["strand"] == "++"], + circularRNAstxt[circularRNAstxt["strand"] == "--"], + ], + ignore_index=True, + sort=False, +) # strand is either ++ or -- .. 
needs to be fixed to + or - -circularRNAstxtnew.replace('++','+',inplace=True) -circularRNAstxtnew.replace('--','-',inplace=True) +circularRNAstxtnew.replace("++", "+", inplace=True) +circularRNAstxtnew.replace("--", "-", inplace=True) # subset columns and rename them and fix start/end order -circularRNAstxtnew=circularRNAstxtnew[['Acceptor', 'donor_end', 'acceptor_start', 'strand', 'coverage', 'fusion_type', 'entropy']] +circularRNAstxtnew = circularRNAstxtnew[ + [ + "Acceptor", + "donor_end", + "acceptor_start", + "strand", + "coverage", + "fusion_type", + "entropy", + ] +] -plus_strand = circularRNAstxtnew[circularRNAstxtnew['strand']=='+'] -plus_strand.columns = ['chrom','end','start','strand','read_count','fusion_type', 'entropy'] # start and end need to be switched! +plus_strand = circularRNAstxtnew[circularRNAstxtnew["strand"] == "+"] +plus_strand.columns = [ + "chrom", + "end", + "start", + "strand", + "read_count", + "fusion_type", + "entropy", +] # start and end need to be switched! 
-minus_strand = circularRNAstxtnew[circularRNAstxtnew['strand']=='-'] -minus_strand.columns = ['chrom','start','end','strand','read_count','fusion_type', 'entropy'] +minus_strand = circularRNAstxtnew[circularRNAstxtnew["strand"] == "-"] +minus_strand.columns = [ + "chrom", + "start", + "end", + "strand", + "read_count", + "fusion_type", + "entropy", +] -circularRNAstxtnew = pandas.concat([plus_strand,minus_strand],ignore_index=True,sort=False) +circularRNAstxtnew = pandas.concat( + [plus_strand, minus_strand], ignore_index=True, sort=False +) # circularRNAstxtnew.columns=['chrom','start','end','strand','read_count','fusion_type', 'entropy'] # create mapsplice_annotation column to include "fusion_type" along with "entropy" -circularRNAstxtnew['mapsplice_annotation']=circularRNAstxtnew['fusion_type'].astype(str)+"##"+circularRNAstxtnew['entropy'].astype(str) -circularRNAstxtnew.drop(['fusion_type', 'entropy'],axis=1,inplace=True) -circularRNAstxtnew.fillna(value="-11",inplace=True) -circularRNAstxtnew = circularRNAstxtnew.astype({"chrom": str, "start": int, "end": int, "strand": str, "read_count": int, "mapsplice_annotation": str}) +circularRNAstxtnew["mapsplice_annotation"] = ( + circularRNAstxtnew["fusion_type"].astype(str) + + "##" + + circularRNAstxtnew["entropy"].astype(str) +) +circularRNAstxtnew.drop(["fusion_type", "entropy"], axis=1, inplace=True) +circularRNAstxtnew.fillna(value="-11", inplace=True) +circularRNAstxtnew = circularRNAstxtnew.astype( + { + "chrom": str, + "start": int, + "end": int, + "strand": str, + "read_count": int, + "mapsplice_annotation": str, + } +) # create index -circularRNAstxtnew['circRNA_id']=circularRNAstxtnew['chrom'].astype(str)+"##"+circularRNAstxtnew['start'].astype(str)+"##"+circularRNAstxtnew['end'].astype(str)+"##"+circularRNAstxtnew['strand'].astype(str) -circularRNAstxtnew.set_index(['circRNA_id'],inplace=True) +circularRNAstxtnew["circRNA_id"] = ( + circularRNAstxtnew["chrom"].astype(str) + + "##" + + 
circularRNAstxtnew["start"].astype(str) + + "##" + + circularRNAstxtnew["end"].astype(str) + + "##" + + circularRNAstxtnew["strand"].astype(str) +) +circularRNAstxtnew.set_index(["circRNA_id"], inplace=True) # sort and write out -circularRNAstxtnew.sort_values(by=['chrom','start'],inplace=True) -circularRNAstxtnew.to_csv(args.outfile,sep="\t",header=True,index=False) +circularRNAstxtnew.sort_values(by=["chrom", "start"], inplace=True) +circularRNAstxtnew.to_csv(args.outfile, sep="\t", header=True, index=False) -# filter +# filter # nreads filter -circularRNAstxtnew = circularRNAstxtnew[~circularRNAstxtnew["chrom"].isin(regions["additive"])] -circularRNAstxtnew = circularRNAstxtnew[circularRNAstxtnew["read_count"] >= args.back_spliced_min_reads] +circularRNAstxtnew = circularRNAstxtnew[ + ~circularRNAstxtnew["chrom"].isin(regions["additive"]) +] +circularRNAstxtnew = circularRNAstxtnew[ + circularRNAstxtnew["read_count"] >= args.back_spliced_min_reads +] # host distance/size filter -circularRNAstxtnew_host = circularRNAstxtnew[circularRNAstxtnew["chrom"].isin(regions["host"])] -circularRNAstxtnew_host["dist"] = abs(circularRNAstxtnew_host["start"] - circularRNAstxtnew_host["end"]) -circularRNAstxtnew_host = circularRNAstxtnew_host[circularRNAstxtnew_host["dist"] > args.host_filter_min] -circularRNAstxtnew_host = circularRNAstxtnew_host[circularRNAstxtnew_host["dist"] < args.host_filter_max] -circularRNAstxtnew_host.drop(["dist"],axis=1,inplace=True) +circularRNAstxtnew_host = circularRNAstxtnew[ + circularRNAstxtnew["chrom"].isin(regions["host"]) +] +circularRNAstxtnew_host["dist"] = abs( + circularRNAstxtnew_host["start"] - circularRNAstxtnew_host["end"] +) +circularRNAstxtnew_host = circularRNAstxtnew_host[ + circularRNAstxtnew_host["dist"] > args.host_filter_min +] +circularRNAstxtnew_host = circularRNAstxtnew_host[ + circularRNAstxtnew_host["dist"] < args.host_filter_max +] +circularRNAstxtnew_host.drop(["dist"], axis=1, inplace=True) # virus distance/size 
filter -circularRNAstxtnew_virus = circularRNAstxtnew[circularRNAstxtnew["chrom"].isin(regions["virus"])] -circularRNAstxtnew_virus["dist"] = abs(circularRNAstxtnew_virus["start"] - circularRNAstxtnew_virus["end"]) -circularRNAstxtnew_virus = circularRNAstxtnew_virus[circularRNAstxtnew_virus["dist"] > args.virus_filter_min] -circularRNAstxtnew_virus = circularRNAstxtnew_virus[circularRNAstxtnew_virus["dist"] < args.virus_filter_max] -circularRNAstxtnew_virus.drop(["dist"],axis=1,inplace=True) +circularRNAstxtnew_virus = circularRNAstxtnew[ + circularRNAstxtnew["chrom"].isin(regions["virus"]) +] +circularRNAstxtnew_virus["dist"] = abs( + circularRNAstxtnew_virus["start"] - circularRNAstxtnew_virus["end"] +) +circularRNAstxtnew_virus = circularRNAstxtnew_virus[ + circularRNAstxtnew_virus["dist"] > args.virus_filter_min +] +circularRNAstxtnew_virus = circularRNAstxtnew_virus[ + circularRNAstxtnew_virus["dist"] < args.virus_filter_max +] +circularRNAstxtnew_virus.drop(["dist"], axis=1, inplace=True) -circularRNAstxtnew = pandas.concat([circularRNAstxtnew_host,circularRNAstxtnew_virus]) +circularRNAstxtnew = pandas.concat([circularRNAstxtnew_host, circularRNAstxtnew_virus]) # sort and write out -circularRNAstxtnew.sort_values(by=['chrom','start'],inplace=True) -circularRNAstxtnew.to_csv(args.filteredoutfile,sep="\t",header=True,index=False) \ No newline at end of file +circularRNAstxtnew.sort_values(by=["chrom", "start"], inplace=True) +circularRNAstxtnew.to_csv(args.filteredoutfile, sep="\t", header=True, index=False) diff --git a/workflow/scripts/create_nclscan_per_sample_counts_table.py b/workflow/scripts/create_nclscan_per_sample_counts_table.py index 91560e6..1fca24b 100755 --- a/workflow/scripts/create_nclscan_per_sample_counts_table.py +++ b/workflow/scripts/create_nclscan_per_sample_counts_table.py @@ -1,39 +1,99 @@ import argparse import pandas + def _annotation_int2str(i): - if i==0: + if i == 0: return "Intergenic" - elif i==1: + elif i == 1: return 
"Intragenic" else: return "Unknown" + # pandas.options.mode.chained_assignment = None -parser = argparse.ArgumentParser(description='Create per sample Counts Table from NCLscan Outputs') -parser.add_argument('--result', dest='resultsfile', type=str, required=True, - help='.result file from NCLscan') -parser.add_argument('--back_spliced_min_reads', dest='back_spliced_min_reads', type=int, required=True, - help='back_spliced minimum read threshold') -parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only') -parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') -parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') -parser.add_argument('--host_filter_min', dest='host_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for host') -parser.add_argument('--virus_filter_min', dest='virus_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for virus') -parser.add_argument('--host_filter_max', dest='host_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for host') -parser.add_argument('--virus_filter_max', dest='virus_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for virus') -parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. 
ref.fa.regions') -parser.add_argument('-o',dest='outfile',required=True,help='output table') -parser.add_argument('-fo',dest='filteredoutfile',required=True,help='filtered output table') +parser = argparse.ArgumentParser( + description="Create per sample Counts Table from NCLscan Outputs" +) +parser.add_argument( + "--result", + dest="resultsfile", + type=str, + required=True, + help=".result file from NCLscan", +) +parser.add_argument( + "--back_spliced_min_reads", + dest="back_spliced_min_reads", + type=int, + required=True, + help="back_spliced minimum read threshold", +) +parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only", +) +parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", +) +parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only", +) +parser.add_argument( + "--host_filter_min", + dest="host_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_min", + dest="virus_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for virus", +) +parser.add_argument( + "--host_filter_max", + dest="host_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_max", + dest="virus_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for virus", +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. 
ref.fa.regions", +) +parser.add_argument("-o", dest="outfile", required=True, help="output table") +parser.add_argument( + "-fo", dest="filteredoutfile", required=True, help="filtered output table" +) args = parser.parse_args() # sn=args.samplename @@ -46,7 +106,7 @@ def _annotation_int2str(i): regions["host"] = list() regions["additive"] = list() regions["virus"] = list() -r = open(args.regions,'r') +r = open(args.regions, "r") rlines = r.readlines() r.close() allseqs = list() @@ -62,17 +122,18 @@ def _annotation_int2str(i): host_additive_virus = "virus" regions[host_additive_virus].extend(seq) allseqs.extend(seq) -regions["additive"] = list((set(allseqs)-set(regions["host"]))-set(regions["virus"])) - +regions["additive"] = list( + (set(allseqs) - set(regions["host"])) - set(regions["virus"]) +) # load files -resultsfile=pandas.read_csv(args.resultsfile,sep="\t",header=None) +resultsfile = pandas.read_csv(args.resultsfile, sep="\t", header=None) # file has no column lables ... add them ... 
the file format is: # ref: https://github.com/TreesLab/NCLscan # | # | Description | ColName -# |----|--------------------------------------------|------------ +# |----|--------------------------------------------|------------ # | 1 | Chromosome name of the donor side (5'ss) | chrd # | 2 | Junction coordinate of the donor side | coordd # | 3 | Strand of the donor side | strandd @@ -86,33 +147,81 @@ def _annotation_int2str(i): # | 11 | Total number of junc-reads | jreads # | 12 | Total number of span-reads | sreads -resultsfile.columns=["chrd", "coordd", "strandd", "chra", "coorda", "stranda", "gened", "genea", "case", "reads", "jreads", "sreads"] +resultsfile.columns = [ + "chrd", + "coordd", + "strandd", + "chra", + "coorda", + "stranda", + "gened", + "genea", + "case", + "reads", + "jreads", + "sreads", +] resultsfile = resultsfile[resultsfile["chrd"] == resultsfile["chra"]] resultsfile = resultsfile[resultsfile["strandd"] == resultsfile["stranda"]] -plus_strand = resultsfile[resultsfile['strandd']=='+'] -plus_strand = plus_strand[["chrd", "coorda", "coordd", "strandd", "reads", "case"]] # start and end need to be switched! -plus_strand.columns = ['chrom','end','start','strand','read_count', 'nclscan_annotation'] - -minus_strand = resultsfile[resultsfile['strandd']=='+'] -minus_strand = minus_strand[["chrd", "coordd", "coorda", "strandd", "reads", "case"]] -minus_strand.columns = ['chrom','end','start','strand','read_count', 'nclscan_annotation'] - -outdf = pandas.concat([plus_strand,minus_strand],ignore_index=True,sort=False) -outdf["nclscan_annotation"] = outdf["nclscan_annotation"] + 1 #change 1 to 2 and 0 to 1 ... as 0 is for no annotation +plus_strand = resultsfile[resultsfile["strandd"] == "+"] +plus_strand = plus_strand[ + ["chrd", "coorda", "coordd", "strandd", "reads", "case"] +] # start and end need to be switched! 
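The relabel-in-swapped-order idiom used here (select columns in one order, then assign names so start/end exchange places without an explicit swap) and the `##`-delimited composite key built further down can be sketched with a hypothetical one-row `.result` frame:

```python
import pandas as pd

# Hypothetical NCLscan-style row; coordd/coorda are the donor/acceptor
# junction coordinates from the .result file.
res = pd.DataFrame({"chrd": ["chr1"], "coordd": [500], "coorda": [100],
                    "strandd": ["+"], "reads": [7], "case": [1]})

# select (chrd, coorda, coordd, ...) but name them (chrom, end, start, ...):
# coorda becomes 'end' and coordd becomes 'start' purely by position
plus = res[["chrd", "coorda", "coordd", "strandd", "reads", "case"]].copy()
plus.columns = ["chrom", "end", "start", "strand", "read_count", "nclscan_annotation"]

# the '##'-delimited composite key, as built later in the script
plus["circRNA_id"] = (plus["chrom"].astype(str) + "##" + plus["start"].astype(str)
                      + "##" + plus["end"].astype(str) + "##" + plus["strand"].astype(str))
```

Because the swap is positional, any future reordering of the selected columns silently changes which field is labeled `start`; an explicit `rename(columns=...)` mapping would be more robust.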
+plus_strand.columns = [ + "chrom", + "end", + "start", + "strand", + "read_count", + "nclscan_annotation", +] + +minus_strand = resultsfile[resultsfile["strandd"] == "-"] +minus_strand = minus_strand[["chrd", "coordd", "coorda", "strandd", "reads", "case"]] +minus_strand.columns = [ + "chrom", + "start", + "end", + "strand", + "read_count", + "nclscan_annotation", +] + +outdf = pandas.concat([plus_strand, minus_strand], ignore_index=True, sort=False) +outdf["nclscan_annotation"] = ( + outdf["nclscan_annotation"] + 1 +) # change 1 to 2 and 0 to 1 ... as 0 is for no annotation outdf["nclscan_annotation"] = outdf["nclscan_annotation"].apply(_annotation_int2str) -outdf = outdf.astype({"chrom": str, "start": int, "end": int, "strand": str, "read_count": int, "nclscan_annotation": str}) +outdf = outdf.astype( + { + "chrom": str, + "start": int, + "end": int, + "strand": str, + "read_count": int, + "nclscan_annotation": str, + } +) # create index -outdf['circRNA_id']=outdf['chrom'].astype(str)+"##"+outdf['start'].astype(str)+"##"+outdf['end'].astype(str)+"##"+outdf['strand'].astype(str) -outdf.set_index(['circRNA_id'],inplace=True) +outdf["circRNA_id"] = ( + outdf["chrom"].astype(str) + + "##" + + outdf["start"].astype(str) + + "##" + + outdf["end"].astype(str) + + "##" + + outdf["strand"].astype(str) +) +outdf.set_index(["circRNA_id"], inplace=True) # sort and write out -outdf.sort_values(by=['chrom','start'],inplace=True) -outdf.to_csv(args.outfile,sep="\t",header=True,index=False) +outdf.sort_values(by=["chrom", "start"], inplace=True) +outdf.to_csv(args.outfile, sep="\t", header=True, index=False) -# filter +# filter # nreads filter outdf = outdf[~outdf["chrom"].isin(regions["additive"])] outdf = outdf[outdf["read_count"] >= args.back_spliced_min_reads] @@ -122,16 +231,16 @@ def _annotation_int2str(i): outdf_host["dist"] = abs(outdf_host["start"] - outdf_host["end"]) outdf_host = outdf_host[outdf_host["dist"] > args.host_filter_min] outdf_host = 
outdf_host[outdf_host["dist"] < args.host_filter_max] -outdf_host.drop(["dist"],axis=1,inplace=True) +outdf_host.drop(["dist"], axis=1, inplace=True) # virus distance/size filter outdf_virus = outdf[outdf["chrom"].isin(regions["virus"])] outdf_virus["dist"] = abs(outdf_virus["start"] - outdf_virus["end"]) outdf_virus = outdf_virus[outdf_virus["dist"] > args.virus_filter_min] outdf_virus = outdf_virus[outdf_virus["dist"] < args.virus_filter_max] -outdf_virus.drop(["dist"],axis=1,inplace=True) +outdf_virus.drop(["dist"], axis=1, inplace=True) -outdf = pandas.concat([outdf_host,outdf_virus]) +outdf = pandas.concat([outdf_host, outdf_virus]) # sort and write out -outdf.sort_values(by=['chrom','start'],inplace=True) -outdf.to_csv(args.filteredoutfile,sep="\t",header=True,index=False) +outdf.sort_values(by=["chrom", "start"], inplace=True) +outdf.to_csv(args.filteredoutfile, sep="\t", header=True, index=False) diff --git a/workflow/scripts/filter_bam.py b/workflow/scripts/filter_bam.py index 5f02829..1c08e8e 100755 --- a/workflow/scripts/filter_bam.py +++ b/workflow/scripts/filter_bam.py @@ -1,27 +1,45 @@ import pysam import argparse + def main(): parser = argparse.ArgumentParser( description="Remove all non-proper-pair, chimeric, secondary, supplementary, unmapped alignments from input BAM file" ) - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=str, - help="Input BAM file") - parser.add_argument("-o","--outbam",dest="outbam",required=True,type=str, - help="Output primary alignment only BAM file") - parser.add_argument('-p',"--pe",dest="pe",required=False,action='store_true', default=False, - help="set this if BAM is paired end") - args = parser.parse_args() + parser.add_argument( + "-i", "--inbam", dest="inbam", required=True, type=str, help="Input BAM file" + ) + parser.add_argument( + "-o", + "--outbam", + dest="outbam", + required=True, + type=str, + help="Output primary alignment only BAM file", + ) + parser.add_argument( + "-p", + "--pe", + 
dest="pe", + required=False, + action="store_true", + default=False, + help="set this if BAM is paired end", + ) + args = parser.parse_args() samfile = pysam.AlignmentFile(args.inbam, "rb") outfile = pysam.AlignmentFile(args.outbam, "wb", template=samfile) for read in samfile.fetch(): - if args.pe and ( read.reference_id != read.next_reference_id ): continue # only works for PE ... for SE read.next_reference_id is -1 - if args.pe and ( not read.is_proper_pair ): continue - if read.is_secondary or read.is_supplementary or read.is_unmapped : continue + if args.pe and (read.reference_id != read.next_reference_id): + continue # only works for PE ... for SE read.next_reference_id is -1 + if args.pe and (not read.is_proper_pair): + continue + if read.is_secondary or read.is_supplementary or read.is_unmapped: + continue outfile.write(read) samfile.close() outfile.close() if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/workflow/scripts/filter_bam_by_readids.py b/workflow/scripts/filter_bam_by_readids.py index b03f426..178ec83 100755 --- a/workflow/scripts/filter_bam_by_readids.py +++ b/workflow/scripts/filter_bam_by_readids.py @@ -3,8 +3,9 @@ import argparse import os import gzip + # """ -# Script takes a BAM file with a list of readids, then +# Script takes a BAM file with a list of readids, then # filters the input BAM for those readids and outputs # only those readid alignments into a new BAM file # @Params: @@ -18,42 +19,53 @@ # path to output BAM file # """ -parser = argparse.ArgumentParser(description='Filter BAM by readids') -parser.add_argument('--inputBAM', dest='inputBAM', type=str, required=True, - help='input BAM file') -parser.add_argument('--outputBAM', dest='outputBAM', type=str, required=True, - help='filtered output BAM file') -parser.add_argument('--readids', dest='readids', type=str, required=True, - help='file with readids to keep (one readid per line)') +parser = argparse.ArgumentParser(description="Filter BAM by 
readids") +parser.add_argument( + "--inputBAM", dest="inputBAM", type=str, required=True, help="input BAM file" +) +parser.add_argument( + "--outputBAM", + dest="outputBAM", + type=str, + required=True, + help="filtered output BAM file", +) +parser.add_argument( + "--readids", + dest="readids", + type=str, + required=True, + help="file with readids to keep (one readid per line)", +) args = parser.parse_args() -split_tup = os.path.splitext(args.readids) +split_tup = os.path.splitext(args.readids) # extract the file name and extension file_name = split_tup[0] file_extension = split_tup[1] -rids_dict=dict() -if file_extension==".gz": - rids=list() - with gzip.open(args.readids,'rt') as readids: - for l in readids: - l = l.strip() - rids_dict[l]=1 +rids_dict = dict() +if file_extension == ".gz": + rids = list() + with gzip.open(args.readids, "rt") as readids: + for l in readids: + l = l.strip() + rids_dict[l] = 1 else: - rids = list(map(lambda x:x.strip(),open(args.readids,'r').readlines())) - rids = list(set(rids)) - for rid in rids: - rids_dict[rid]=1 + rids = list(map(lambda x: x.strip(), open(args.readids, "r").readlines())) + rids = list(set(rids)) + for rid in rids: + rids_dict[rid] = 1 inBAM = pysam.AlignmentFile(args.inputBAM, "rb") outBAM = pysam.AlignmentFile(args.outputBAM, "wb", template=inBAM) -count=0 +count = 0 for read in inBAM.fetch(): - count+=1 - if count%1000000 == 0: - print("%d reads read!"%(count)) - qn=read.query_name - if qn in rids_dict: - outBAM.write(read) + count += 1 + if count % 1000000 == 0: + print("%d reads read!" 
% (count)) + qn = read.query_name + if qn in rids_dict: + outBAM.write(read) inBAM.close() outBAM.close() diff --git a/workflow/scripts/filter_bam_for_BSJs.py b/workflow/scripts/filter_bam_for_BSJs.py index d0473c7..5186bf0 100755 --- a/workflow/scripts/filter_bam_for_BSJs.py +++ b/workflow/scripts/filter_bam_for_BSJs.py @@ -7,7 +7,7 @@ # """ # input is a BAM file containing all BSJ alignments along with some chimeric alignments -# this script filters out the non-BSJ alignments and outputs the BSJ-only alignments to a +# this script filters out the non-BSJ alignments and outputs the BSJ-only alignments to a # new BAM file. # @Params: # @Inputs: @@ -20,38 +20,54 @@ # path to output BAM file # """ + def split_text(s): for k, g in groupby(s, str.isalpha): - yield ''.join(g) + yield "".join(g) + def get_alt_cigars(c): - alt_cigars=[] - x=list(split_text(c)) - if x[1]=="H": - alt_cigars.append("".join(x[2:])) - if x[-1]=="H": - alt_cigars.append("".join(x[:-2])) - if x[1]=="H" and x[-1]=="H": - alt_cigars.append("".join(x[2:-2])) - return alt_cigars + alt_cigars = [] + x = list(split_text(c)) + if x[1] == "H": + alt_cigars.append("".join(x[2:])) + if x[-1] == "H": + alt_cigars.append("".join(x[:-2])) + if x[1] == "H" and x[-1] == "H": + alt_cigars.append("".join(x[2:-2])) + return alt_cigars + pp = pprint.PrettyPrinter(indent=4) -parser = argparse.ArgumentParser(description='Filter readid filtered BAM file for BSJ-only alignments') -parser.add_argument('--inputBAM', dest='inputBAM', type=str, required=True, - help='input BAM file') -parser.add_argument('--outputBAM', dest='outputBAM', type=str, required=True, - help='filtered output BAM file') -parser.add_argument('--readids', dest='readids', type=str, required=True, - help='file with readids to keep (tab-delimited with columns:readid,chrom,strand,site1,site2,cigarlist)') +parser = argparse.ArgumentParser( + description="Filter readid filtered BAM file for BSJ-only alignments" +) +parser.add_argument( + "--inputBAM", 
dest="inputBAM", type=str, required=True, help="input BAM file" +) +parser.add_argument( + "--outputBAM", + dest="outputBAM", + type=str, + required=True, + help="filtered output BAM file", +) +parser.add_argument( + "--readids", + dest="readids", + type=str, + required=True, + help="file with readids to keep (tab-delimited with columns:readid,chrom,strand,site1,site2,cigarlist)", +) args = parser.parse_args() -rids=dict() +rids = dict() inBAM = pysam.AlignmentFile(args.inputBAM, "rb") outBAM = pysam.AlignmentFile(args.outputBAM, "wb", template=inBAM) -# multiple alignments of a read are grouped together by +# multiple alignments of a read are grouped together by # HI i Query hit index ... eg. HI:i:1, HI:i:2 etc. --> See https://samtools.github.io/hts-specs/SAMtags.pdf -# each HI represents a different alignent for the pair and +# each HI represents a different alignent for the pair and # generally contains 3 lines in the alignment file eg: # SRR1731877.10077876 163 chr16 16699505 1 30S53M = 16699513 53 CTACCGTTTCCTGTGATAAGTGCTACTTCTTGAGGCTCTGTTCCATCTTTGTCCCTTTCCAGAGATTTAATCTCTCTCTCTCT ;DDDDHBFHHDG@AAFHHGEHHIIIIIIIIIBDGIEH3DDHGC4?09?BBB0999B?8)./>FH>GHG>==CE@@A>>AE?;; NH:i:4 HI:i:1 AS:i:97 nM:i:0 NM:i:0 SA:Z:chr16,16700448,+,30M53H,1,0; # SRR1731877.10077876 83 chr16 16699513 1 45M = 16699505 -53 TGTTCCATCTTTGTCCCTTTCCAGAGATTTAATCTCTCTCTCTCT DGD>B@?;B@GFC88ECADCFHEE@C<@C:2A>",rids[readid][hi]['sites']) - # print(site in rids[readid][hi]['sites']) - # print(cigars,"====>>",rids[readid][hi]['cigars']) - # print(rids[readid][hi]['cigars'] == cigars) - if site in rids[readid][hi]['sites']: -## TEST #2 -## we know that site is present in sites of this alignment -## next we ensure that all 3 alignments of this HI value are on the same chromosome/reference - references=[] - for read in rids[readid][hi]['alignments']: - references.append(read.reference_name) - if len(list(set(references)))!=1: # same HI but different aligning to different chromosomes - continue - 
rids[readid][hi]['alignments']=list(set(rids[readid][hi]['alignments'])) -## TEST #3.1 -## we know that site is in 'sites' and all 3 alignment from the HI value are on the same chromosome -## next we check if the CIGAR scores of the 3 alignments are the same as the CIGAR scores from the readids file - if rids[readid][hi]['cigars'] == cigars: # lists are sorted before comparison - for read in rids[readid][hi]['alignments']: - outBAM.write(read) - else: -## TEST #3.2 -## some alignments are missed because of extra soft clipping in one of the 3 reported alignments in a single HI value -## eg. -# SRR1731877.16929220 83 chr7 99416198 255 5S48M1S = 99416198 -48 GGAAGTCCACCACCAGAAAACCCGCTACATCTTCGACCTCTTTTACAAGCGGAC FEHC>HHE@GC=GCIIJIGDIJJIJJIJJJJJJIHGJJIIIHHGHHFFFFFCC@ NH:i:1 HI:i:1 AS:i:95 nM:i:0 NM:i:0 -# SRR1731877.16929220 163 chr7 99416198 255 42S48M1S = 99416198 48 CAGAAAACCCGCTACATCTGCGACCTCTTTTACAAGCGGAAATCCACCACCAGAAAACCCGCTACATCTTCGACCTCTTTTACAAGCGGAC @@BFFFFFHHHHHJJJJJJHIJJJJJJJJJJIJJJIIIIGJJCFHHGJHGEHEFFFFEDCDDDDDDDDDDEDDDDDDDDDDDDCDDDDDDD NH:i:1 HI:i:1 AS:i:95 nM:i:0 NM:i:0 SA:Z:chr7,99416206,+,42M49H,255,1; -# SRR1731877.16929220 2209 chr7 99416206 255 42M49H = 99416198 0 CAGAAAACCCGCTACATCTGCGACCTCTTTTACAAGCGGAAA @@BFFFFFHHHHHJJJJJJHIJJJJJJJJJJIJJJIIIIGJJ NH:i:1 HI:i:1 AS:i:39 nM:i:1 NM:i:1 SA:Z:chr7,99416198,+,42S48M1S,255,0; -# the readids file contains -# SRR1731877.16929220 chr7 - 99416197 99416248 42H48M,48M1H,42M49H -# cigars from readids file --> 42H48M,48M1H,42M49H -# cigars from bam file --> 42H48M,5H48M1H,42M49H -# this is fix to include these alignments in the output BAM -# recompare cigars after removing softclippings at the ends of the CIGAR of non-matching cigar string - aminusb=list(set(rids[readid][hi]['cigars'])-set(cigars)) - if len(aminusb)==1: - restcigars=list(set(rids[readid][hi]['cigars'])-set(aminusb)) - altcigars=get_alt_cigars(aminusb[0]) - for ac in altcigars: - newcigars=[] - newcigars.extend(restcigars) - newcigars.append(ac) - 
newcigars.sort() - if newcigars == cigars: - for read in rids[readid][hi]['alignments']: - outBAM.write(read) - break -## TEST #3.3 -## similar to 3.2 some alignments are missed because of extra soft clipping in 2 of the 3 reported alignments in a single HI value -# this is fix for that scenario - if len(aminusb)==2: - commoncigar=list(set(rids[readid][hi]['cigars'])-set(aminusb)) - altcigars1=get_alt_cigars(aminusb[0]) - altcigars2=get_alt_cigars(aminusb[1]) - found=0 - for ac1 in altcigars1: - if found!=0: - break - tmpcigars=[] - tmpcigars.extend(commoncigar) - tmpcigars.append(ac1) - for ac2 in altcigars2: - newcigars=[] - newcigars.extend(tmpcigars) - newcigars.append(ac2) - newcigars.sort() - if newcigars == cigars: - for read in rids[readid][hi]['alignments']: - outBAM.write(read) - found=1 - break + line = line.strip().split("\t") + # print(line) + ## SRR1731877.10077876 chr16 - 16699504 16700478 30H53M,45M,30M53H + ## columns:readid,chrom,strand,site1,site2,cigarlist + ## this is generated by junctions2readids.py from the .junction file from STAR2p + readid = line[0] + chrom = line[1] + strand = line[2] + site1 = line[3] + site2 = line[4] + cigars = line[5].split(",") + cigars.sort() + ## as we are searching for the alignment which represents this occurance + ## (which of the multiple HI values should we report in the output BAM) + ## of this readid in the readids file, + ## TEST #1 + ## We first compare site (or coordinate) + ## If strand is -ve, then site1 is expected to be in the reported alignment + ## but if the strand is +ve, the site2 is expected to be in the 'sites' list + ## note: we have to add 1 to switch from 0-based to 1-based + if strand == "-": + site = int(site1) + 1 + else: + site = int(site2) + 1 + ## readid will always be part of rids... 
but just in case + if not readid in rids: + continue + for hi in rids[readid].keys(): + # print(readid,hi,site) + # print(site,"===>>",rids[readid][hi]['sites']) + # print(site in rids[readid][hi]['sites']) + # print(cigars,"====>>",rids[readid][hi]['cigars']) + # print(rids[readid][hi]['cigars'] == cigars) + if site in rids[readid][hi]["sites"]: + ## TEST #2 + ## we know that site is present in sites of this alignment + ## next we ensure that all 3 alignments of this HI value are on the same chromosome/reference + references = [] + for read in rids[readid][hi]["alignments"]: + references.append(read.reference_name) + if ( + len(list(set(references))) != 1 + ): # same HI but different aligning to different chromosomes + continue + rids[readid][hi]["alignments"] = list(set(rids[readid][hi]["alignments"])) + ## TEST #3.1 + ## we know that site is in 'sites' and all 3 alignment from the HI value are on the same chromosome + ## next we check if the CIGAR scores of the 3 alignments are the same as the CIGAR scores from the readids file + if ( + rids[readid][hi]["cigars"] == cigars + ): # lists are sorted before comparison + for read in rids[readid][hi]["alignments"]: + outBAM.write(read) + else: + ## TEST #3.2 + ## some alignments are missed because of extra soft clipping in one of the 3 reported alignments in a single HI value + ## eg. 
+ # SRR1731877.16929220 83 chr7 99416198 255 5S48M1S = 99416198 -48 GGAAGTCCACCACCAGAAAACCCGCTACATCTTCGACCTCTTTTACAAGCGGAC FEHC>HHE@GC=GCIIJIGDIJJIJJIJJJJJJIHGJJIIIHHGHHFFFFFCC@ NH:i:1 HI:i:1 AS:i:95 nM:i:0 NM:i:0 + # SRR1731877.16929220 163 chr7 99416198 255 42S48M1S = 99416198 48 CAGAAAACCCGCTACATCTGCGACCTCTTTTACAAGCGGAAATCCACCACCAGAAAACCCGCTACATCTTCGACCTCTTTTACAAGCGGAC @@BFFFFFHHHHHJJJJJJHIJJJJJJJJJJIJJJIIIIGJJCFHHGJHGEHEFFFFEDCDDDDDDDDDDEDDDDDDDDDDDDCDDDDDDD NH:i:1 HI:i:1 AS:i:95 nM:i:0 NM:i:0 SA:Z:chr7,99416206,+,42M49H,255,1; + # SRR1731877.16929220 2209 chr7 99416206 255 42M49H = 99416198 0 CAGAAAACCCGCTACATCTGCGACCTCTTTTACAAGCGGAAA @@BFFFFFHHHHHJJJJJJHIJJJJJJJJJJIJJJIIIIGJJ NH:i:1 HI:i:1 AS:i:39 nM:i:1 NM:i:1 SA:Z:chr7,99416198,+,42S48M1S,255,0; + # the readids file contains + # SRR1731877.16929220 chr7 - 99416197 99416248 42H48M,48M1H,42M49H + # cigars from readids file --> 42H48M,48M1H,42M49H + # cigars from bam file --> 42H48M,5H48M1H,42M49H + # this is fix to include these alignments in the output BAM + # recompare cigars after removing softclippings at the ends of the CIGAR of non-matching cigar string + aminusb = list(set(rids[readid][hi]["cigars"]) - set(cigars)) + if len(aminusb) == 1: + restcigars = list(set(rids[readid][hi]["cigars"]) - set(aminusb)) + altcigars = get_alt_cigars(aminusb[0]) + for ac in altcigars: + newcigars = [] + newcigars.extend(restcigars) + newcigars.append(ac) + newcigars.sort() + if newcigars == cigars: + for read in rids[readid][hi]["alignments"]: + outBAM.write(read) + break + ## TEST #3.3 + ## similar to 3.2 some alignments are missed because of extra soft clipping in 2 of the 3 reported alignments in a single HI value + # this is fix for that scenario + if len(aminusb) == 2: + commoncigar = list(set(rids[readid][hi]["cigars"]) - set(aminusb)) + altcigars1 = get_alt_cigars(aminusb[0]) + altcigars2 = get_alt_cigars(aminusb[1]) + found = 0 + for ac1 in altcigars1: + if found != 0: + break + tmpcigars = [] + 
tmpcigars.extend(commoncigar) + tmpcigars.append(ac1) + for ac2 in altcigars2: + newcigars = [] + newcigars.extend(tmpcigars) + newcigars.append(ac2) + newcigars.sort() + if newcigars == cigars: + for read in rids[readid][hi]["alignments"]: + outBAM.write(read) + found = 1 + break outBAM.close() diff --git a/workflow/scripts/filter_bam_for_linear_reads.py b/workflow/scripts/filter_bam_for_linear_reads.py index c8bf79a..085be9e 100755 --- a/workflow/scripts/filter_bam_for_linear_reads.py +++ b/workflow/scripts/filter_bam_for_linear_reads.py @@ -21,33 +21,42 @@ # """ # if readid is NOT in the junctions file then it is a read with no Junction ... aka LINEAR! - + + class Read: def __init__(self): - self.alignments=list() - self.read1exists=False - self.read2exists=False - - def append_alignment(self,alignment): + self.alignments = list() + self.read1exists = False + self.read2exists = False + + def append_alignment(self, alignment): self.alignments.append(alignment) if alignment.is_read1: - self.read1exists=True + self.read1exists = True if alignment.is_read2: - self.read2exists=True - - def is_valid_read(self): - return(self.read1exists and self.read2exists) - + self.read2exists = True + def is_valid_read(self): + return self.read1exists and self.read2exists -parser = argparse.ArgumentParser(description='Filter BAM to exclude BSJs and other chimeric alignments') -parser.add_argument('--inputBAM', dest='inputBAM', type=str, required=True, - help='input BAM file') -parser.add_argument('--outputBAM', dest='outputBAM', type=str, required=True, - help='filtered output BAM file') -parser.add_argument('-j',dest='junctions',required=True,help='chimeric junctions file') -parser.add_argument('-p',dest='paired', help='bam is paired', action='store_true') +parser = argparse.ArgumentParser( + description="Filter BAM to exclude BSJs and other chimeric alignments" +) +parser.add_argument( + "--inputBAM", dest="inputBAM", type=str, required=True, help="input BAM file" +) 
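These BAM-filtering scripts all reduce to a keep/drop predicate on alignment flags plus an O(1) readid-membership test (the dict/set lookups the comments call out). A minimal pysam-free stand-in, with hypothetical names and no real BAM, makes that logic testable in isolation:

```python
# Hypothetical chimeric readids, standing in for the ids collected from the
# 10th column of the Chimeric.out.junction file.
chimeric_ids = {"SRR1.1", "SRR1.7"}

def keep_linear(qname, is_proper_pair, is_secondary,
                is_supplementary, is_unmapped, paired=True):
    """Mirror of the filter pattern above: drop improper pairs (PE only),
    secondary, supplementary, and unmapped alignments, then drop any
    alignment whose readid was seen in the chimeric junctions file."""
    if paired and not is_proper_pair:
        return False
    if is_secondary or is_supplementary or is_unmapped:
        return False
    # set/dict membership is O(1); the equivalent 'in list' scan is O(n)
    return qname not in chimeric_ids
```

In the real scripts the boolean arguments correspond to pysam's `read.is_proper_pair`, `read.is_secondary`, `read.is_supplementary`, and `read.is_unmapped` properties, and `qname` to `read.query_name`.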
+parser.add_argument(
+    "--outputBAM",
+    dest="outputBAM",
+    type=str,
+    required=True,
+    help="filtered output BAM file",
+)
+parser.add_argument(
+    "-j", dest="junctions", required=True, help="chimeric junctions file"
+)
+parser.add_argument("-p", dest="paired", help="bam is paired", action="store_true")
 args = parser.parse_args()
 # rids=list()
 inBAM = pysam.AlignmentFile(args.inputBAM, "rb")
@@ -98,31 +107,36 @@ def is_valid_read(self):
 # get a list of the chimeric readids
-rids_dict=dict()
-with open(args.junctions, 'r') as junc_f:
+rids_dict = dict()
+with open(args.junctions, "r") as junc_f:
     for line in junc_f:
         if "junction_type" in line:
            continue
-        readid=line.split()[9] # 10th column is read-name
-        rids_dict[readid]=1
+        readid = line.split()[9]  # 10th column is read-name
+        rids_dict[readid] = 1
 print(f"Total chimeric readids:{len(rids_dict)}")
-if args.paired: # paired-end
+if args.paired:  # paired-end
     for read in inBAM.fetch(until_eof=True):
-        if not read.is_proper_pair or read.is_secondary or read.is_supplementary or read.is_unmapped:
+        if (
+            not read.is_proper_pair
+            or read.is_secondary
+            or read.is_supplementary
+            or read.is_unmapped
+        ):
             continue
         qname = read.query_name
-        if qname in rids_dict: # "in" dict is much faster than "in" list
-            continue # if readid is in dict then it is a junction read ... so ignore it!
+        if qname in rids_dict:  # "in" dict is much faster than "in" list
+            continue  # if readid is in dict then it is a junction read ... so ignore it!
         else:
             outBAM.write(read)
-else: # single-end
-    incount=0
-    outcount=0
+else:  # single-end
+    incount = 0
+    outcount = 0
     for read in inBAM.fetch(until_eof=True):
-        incount+=1
-        if incount%1000==0:
+        incount += 1
+        if incount % 1000 == 0:
             print(f"{incount/1000000:.4f}m reads read in")
             print(f"{outcount/1000000:.4f}m reads written out")
         if read.is_secondary or read.is_supplementary or read.is_unmapped:
@@ -131,11 +145,10 @@ def is_valid_read(self):
         if qname in rids_dict:
             continue
         else:
-            outcount+=1
+            outcount += 1
             outBAM.write(read)
-
 inBAM.close()
 outBAM.close()
diff --git a/workflow/scripts/filter_bam_for_splice_reads.py b/workflow/scripts/filter_bam_for_splice_reads.py
index ac04f76..51eee92 100755
--- a/workflow/scripts/filter_bam_for_splice_reads.py
+++ b/workflow/scripts/filter_bam_for_splice_reads.py
@@ -1,6 +1,7 @@
 import pysam
 import sys
 import argparse
+
 # """
 # Script takes a STAR 2p BAM file and tab-delimited file with splice junctions in the first 3 columns,
 # and outputs spliced-only alignments
@@ -15,87 +16,96 @@
 # path to output BAM file
 # """
-parser = argparse.ArgumentParser(description='extract spliced reads from bam file')
-parser.add_argument('--inbam',dest='inbam',required=True,help='STAR bam file with index')
-parser.add_argument('--tab',dest='tab',required=True,help='tab file with splice junctions in the first 3 columns')
-parser.add_argument('--outbam',dest='outbam',required=True,help='Output bam filename')
-args=parser.parse_args()
+parser = argparse.ArgumentParser(description="extract spliced reads from bam file")
+parser.add_argument(
+    "--inbam", dest="inbam", required=True, help="STAR bam file with index"
+)
+parser.add_argument(
+    "--tab",
+    dest="tab",
+    required=True,
+    help="tab file with splice junctions in the first 3 columns",
+)
+parser.add_argument(
+    "--outbam", dest="outbam", required=True, help="Output bam filename"
+)
+args = parser.parse_args()
 
-inbam = pysam.AlignmentFile(args.inbam, "rb" )
-outbam = pysam.AlignmentFile(args.outbam, "wb", template=inbam )
+inbam = pysam.AlignmentFile(args.inbam, "rb")
+outbam = pysam.AlignmentFile(args.outbam, "wb", template=inbam)
 tab = open(args.tab)
 junctions = tab.readlines()
 junctions.pop(0)
 tab.close()
-count=0
-threshold=0
-incr=5
+count = 0
+threshold = 0
+incr = 5
 for l in junctions:
-    count+=1
-    if count*100/len(junctions)>threshold:
-        print("%d %% complete!"% (threshold))
-        threshold+=incr
-    l=l.strip().split("\t")
-    c=l[0]
-    s=int(l[1])
-    e=int(l[2])
-# get chromosome name, start and end positions for the junction
-# and fetch reads aligning to this region using "fetch"
-# ref: https://pysam.readthedocs.io/en/latest/api.html#pysam.FastaFile.fetch
-    for read in inbam.fetch(c,s-200,e+200):
-        # for read in inbam.fetch(c):
-# get cigarstring to replace softclips
-        cigar=read.cigarstring
-# replace softclips with hardclip
-        cigar=cigar.replace("S","H")
-        cigart=read.cigartuples
+    count += 1
+    if count * 100 / len(junctions) > threshold:
+        print("%d %% complete!" % (threshold))
+        threshold += incr
+    l = l.strip().split("\t")
+    c = l[0]
+    s = int(l[1])
+    e = int(l[2])
+    # get chromosome name, start and end positions for the junction
+    # and fetch reads aligning to this region using "fetch"
+    # ref: https://pysam.readthedocs.io/en/latest/api.html#pysam.FastaFile.fetch
+    for read in inbam.fetch(c, s - 200, e + 200):
+        # for read in inbam.fetch(c):
+        # get cigarstring to replace softclips
+        cigar = read.cigarstring
+        # replace softclips with hardclip
+        cigar = cigar.replace("S", "H")
+        cigart = read.cigartuples
 
-# if cigartuple contains
-# N BAM_CREF_SKIP 3
-# then it is a spliced read!
+        # if cigartuple contains
+        # N BAM_CREF_SKIP 3
+        # then it is a spliced read!
-# ref: https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment.cigartuples -# cigartuples operation list is -# M BAM_CMATCH 0 -# I BAM_CINS 1 -# D BAM_CDEL 2 -# N BAM_CREF_SKIP 3 -# S BAM_CSOFT_CLIP 4 -# H BAM_CHARD_CLIP 5 -# P BAM_CPAD 6 -# = BAM_CEQUAL 7 -# X BAM_CDIFF 8 -# B BAM_CBACK 9 -# cigartuples returns a list of tuples of (operation, length) -# eg. 30M is returned as [(0, 30)] -# N in CIGAR score is index 3 in tuple represents BAM_CREF_SKIP indicative of spliced read - if 3 in list(map(lambda z:z[0],cigart)): -# cigart[list(map(lambda z:z[0],cigart)).index(0):] -# get the first item of each tuple in the list of tuples -# first item will be the operation from the above table -# if 3 is in the new list ... means that there was a BAM_CREF_SKIP -# BAM_CREF_SKIP in CIGAR score right after a match (BAM_CMATCH) -# suggests spliced alignment aka spliced read - cigart=cigart[list(map(lambda z:z[0],cigart)).index(0):] - if cigart[0][0]==0 and cigart[1][0]==3: -# CIGAR has match ... followed by skip ... aka spliced read -# so gather start and end coordinates - start=read.reference_start+cigart[0][1]+1 - end=start+cigart[1][1]-1 + # ref: https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment.cigartuples + # cigartuples operation list is + # M BAM_CMATCH 0 + # I BAM_CINS 1 + # D BAM_CDEL 2 + # N BAM_CREF_SKIP 3 + # S BAM_CSOFT_CLIP 4 + # H BAM_CHARD_CLIP 5 + # P BAM_CPAD 6 + # = BAM_CEQUAL 7 + # X BAM_CDIFF 8 + # B BAM_CBACK 9 + # cigartuples returns a list of tuples of (operation, length) + # eg. 30M is returned as [(0, 30)] + # N in CIGAR score is index 3 in tuple represents BAM_CREF_SKIP indicative of spliced read + if 3 in list(map(lambda z: z[0], cigart)): + # cigart[list(map(lambda z:z[0],cigart)).index(0):] + # get the first item of each tuple in the list of tuples + # first item will be the operation from the above table + # if 3 is in the new list ... 
means that there was a BAM_CREF_SKIP + # BAM_CREF_SKIP in CIGAR score right after a match (BAM_CMATCH) + # suggests spliced alignment aka spliced read + cigart = cigart[list(map(lambda z: z[0], cigart)).index(0) :] + if cigart[0][0] == 0 and cigart[1][0] == 3: + # CIGAR has match ... followed by skip ... aka spliced read + # so gather start and end coordinates + start = read.reference_start + cigart[0][1] + 1 + end = start + cigart[1][1] - 1 # print(read) # print(cigart) # print(c+"##"+str(s)+"##"+str(e),start-s,end-e,read.get_reference_positions(full_length=True),read) - if start==s and end==e: -# check if start and end are in the junctions file -# if yes then write to output file + if start == s and end == e: + # check if start and end are in the junctions file + # if yes then write to output file # print(read) # print(cigart) # print(start,end) # print(s,e) # exit() outbam.write(read) - #print(read.query_name,c,s,e,start,end) - #print(read) + # print(read.query_name,c,s,e,start,end) + # print(read) inbam.close() outbam.close() -exit() \ No newline at end of file +exit() diff --git a/workflow/scripts/filter_ciriout.py b/workflow/scripts/filter_ciriout.py index 686ddac..830d128 100755 --- a/workflow/scripts/filter_ciriout.py +++ b/workflow/scripts/filter_ciriout.py @@ -2,6 +2,7 @@ import argparse import inspect + # CIRI2 output file has following columns: # | # | colName | Description | # |----|----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -19,105 +20,183 @@ # | 12 | junction_reads_ID | all of the 
circular junction read IDs (split by ",") | # ref: https://ciri-cookbook.readthedocs.io/en/latest/CIRI2.html#an-example-of-running-ciri2 class CIRIOUT: - def __init__(self,entry,chrom="",start=0,end=0,nreads=0,size=0,host_additive_virus="additive",filter_out=False): - self.entry=entry - l=entry.strip().split('\t') - self.chrom=l[1] - self.start=int(l[2])-1 - self.end=int(l[3]) - self.nreads=int(l[4]) - self.size=self.start-self.end - self.filter_out=False - + def __init__( + self, + entry, + chrom="", + start=0, + end=0, + nreads=0, + size=0, + host_additive_virus="additive", + filter_out=False, + ): + self.entry = entry + l = entry.strip().split("\t") + self.chrom = l[1] + self.start = int(l[2]) - 1 + self.end = int(l[3]) + self.nreads = int(l[4]) + self.size = self.start - self.end + self.filter_out = False + # @classmethod - def set_host_additive_virus(self,regions): - self.host_additive_virus=_get_host_additive_virus(regions=regions,seqname=self.chrom) - + def set_host_additive_virus(self, regions): + self.host_additive_virus = _get_host_additive_virus( + regions=regions, seqname=self.chrom + ) + # @classmethod - def filter_by_nreads(self,minreads): - if self.nreads < minreads: self.filter_out=True - + def filter_by_nreads(self, minreads): + if self.nreads < minreads: + self.filter_out = True + # @classmethod - def filter_by_size(self,host_min,host_max,virus_min,virus_max): - if self.host_additive_virus=="host": - if self.size < host_min : self.filter_out=True - if self.size > host_max : self.filter_out=True - elif self.host_additive_virus=="virus": - if self.size < virus_min : self.filter_out=True - if self.size > virus_max : self.filter_out=True + def filter_by_size(self, host_min, host_max, virus_min, virus_max): + if self.host_additive_virus == "host": + if self.size < host_min: + self.filter_out = True + if self.size > host_max: + self.filter_out = True + elif self.host_additive_virus == "virus": + if self.size < virus_min: + self.filter_out = True + if 
self.size > virus_max: + self.filter_out = True else: - self.filter_out=True + self.filter_out = True + - -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") + viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." 
% (seqname)) -parser = argparse.ArgumentParser(description='Filter CIRI2 Per Sample Counts Table') -parser.add_argument('--ciriout', dest='ciriout', type=str, required=True, - help='ciri out file') -parser.add_argument('--back_spliced_min_reads', dest='back_spliced_min_reads', type=int, required=True, - help='back_spliced minimum read threshold') -parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only') -parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') -parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') -parser.add_argument('--host_filter_min', dest='host_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for host') -parser.add_argument('--virus_filter_min', dest='virus_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for virus') -parser.add_argument('--host_filter_max', dest='host_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for host') -parser.add_argument('--virus_filter_max', dest='virus_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for virus') -parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. 
ref.fa.regions') -parser.add_argument('-o',dest='outfile',required=True,help='filtered ciriout file') +parser = argparse.ArgumentParser(description="Filter CIRI2 Per Sample Counts Table") +parser.add_argument( + "--ciriout", dest="ciriout", type=str, required=True, help="ciri out file" +) +parser.add_argument( + "--back_spliced_min_reads", + dest="back_spliced_min_reads", + type=int, + required=True, + help="back_spliced minimum read threshold", +) +parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only", +) +parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out", +) +parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only", +) +parser.add_argument( + "--host_filter_min", + dest="host_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_min", + dest="virus_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for virus", +) +parser.add_argument( + "--host_filter_max", + dest="host_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_max", + dest="virus_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for virus", +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. 
ref.fa.regions", +) +parser.add_argument("-o", dest="outfile", required=True, help="filtered ciriout file") args = parser.parse_args() -regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) -outfile = open(args.outfile,'w') -infile = open(args.ciriout,'r') +regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, +) +outfile = open(args.outfile, "w") +infile = open(args.ciriout, "r") alllines = infile.readlines() header = alllines.pop(0) -outfile.write("%s"%(header)) +outfile.write("%s" % (header)) infile.close() for l in alllines: out = CIRIOUT(entry=l) out.set_host_additive_virus(regions=regions) out.filter_by_nreads(args.back_spliced_min_reads) if out.filter_out == False: - out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) + out.filter_by_size( + host_min=args.host_filter_min, + host_max=args.host_filter_max, + virus_min=args.virus_filter_min, + virus_max=args.virus_filter_max, + ) if out.filter_out == True: outfile.write(l) outfile.close() diff --git a/workflow/scripts/filter_dcc.py b/workflow/scripts/filter_dcc.py index f518d6a..a9e1707 100755 --- a/workflow/scripts/filter_dcc.py +++ b/workflow/scripts/filter_dcc.py @@ -2,6 +2,7 @@ import argparse import inspect + # DCC counts table input/output file has following columns: # | # | colName | Description | # |----|----------------------|-----------------------------------------------------------------| @@ -12,105 +13,188 @@ # | 5 | read_count | | # | 6 | dcc_annotation | this is JunctionType##Start-End Region from CircCoordinates file| class DCC: - def __init__(self,entry,chrom="",start=0,end=0,nreads=0,size=0,host_additive_virus="additive",filter_out=False): - self.entry=entry - l=entry.strip().split('\t') - self.chrom=l[0] - self.start=int(l[1]) - self.end=int(l[2]) - self.nreads=int(l[4]) - 
self.size=self.start-self.end - self.filter_out=False - + def __init__( + self, + entry, + chrom="", + start=0, + end=0, + nreads=0, + size=0, + host_additive_virus="additive", + filter_out=False, + ): + self.entry = entry + l = entry.strip().split("\t") + self.chrom = l[0] + self.start = int(l[1]) + self.end = int(l[2]) + self.nreads = int(l[4]) + self.size = self.start - self.end + self.filter_out = False + # @classmethod - def set_host_additive_virus(self,regions): - self.host_additive_virus=_get_host_additive_virus(regions=regions,seqname=self.chrom) - + def set_host_additive_virus(self, regions): + self.host_additive_virus = _get_host_additive_virus( + regions=regions, seqname=self.chrom + ) + # @classmethod - def filter_by_nreads(self,minreads): - if self.nreads < minreads: self.filter_out=True - + def filter_by_nreads(self, minreads): + if self.nreads < minreads: + self.filter_out = True + # @classmethod - def filter_by_size(self,host_min,host_max,virus_min,virus_max): - if self.host_additive_virus=="host": - if self.size < host_min : self.filter_out=True - if self.size > host_max : self.filter_out=True - elif self.host_additive_virus=="virus": - if self.size < virus_min : self.filter_out=True - if self.size > virus_max : self.filter_out=True + def filter_by_size(self, host_min, host_max, virus_min, virus_max): + if self.host_additive_virus == "host": + if self.size < host_min: + self.filter_out = True + if self.size > host_max: + self.filter_out = True + elif self.host_additive_virus == "virus": + if self.size < virus_min: + self.filter_out = True + if self.size > virus_max: + self.filter_out = True else: - self.filter_out=True + self.filter_out = True + - -def read_regions(regionsfile,host,additives,viruses): - host=host.split(",") - additives=additives.split(",") - viruses=viruses.split(",") - infile=open(regionsfile,'r') - regions=dict() +def read_regions(regionsfile, host, additives, viruses): + host = host.split(",") + additives = additives.split(",") 
+ viruses = viruses.split(",") + infile = open(regionsfile, "r") + regions = dict() for l in infile.readlines(): l = l.strip().split("\t") - region_name=l[0] - regions[region_name]=dict() - regions[region_name]['sequences']=dict() + region_name = l[0] + regions[region_name] = dict() + regions[region_name]["sequences"] = dict() if region_name in host: - regions[region_name]['host_additive_virus']="host" + regions[region_name]["host_additive_virus"] = "host" elif region_name in additives: - regions[region_name]['host_additive_virus']="additive" + regions[region_name]["host_additive_virus"] = "additive" elif region_name in viruses: - regions[region_name]['host_additive_virus']="virus" + regions[region_name]["host_additive_virus"] = "virus" else: exit("%s has unknown region. Its not a host or a additive or a virus!!") - sequence_names=l[1].split() + sequence_names = l[1].split() for s in sequence_names: - regions[region_name]['sequences'][s]=1 - return regions + regions[region_name]["sequences"][s] = 1 + return regions + -def _get_host_additive_virus(regions,seqname): - for k,v in regions.items(): - if seqname in v['sequences']: - return v['host_additive_virus'] +def _get_host_additive_virus(regions, seqname): + for k, v in regions.items(): + if seqname in v["sequences"]: + return v["host_additive_virus"] else: - exit("Sequence: %s does not have a region."%(seqname)) + exit("Sequence: %s does not have a region." % (seqname)) -parser = argparse.ArgumentParser(description='Filter DCC Per Sample Counts Table') -parser.add_argument('--in_dcc_counts_table', dest='intable', type=str, required=True, - help='DCC in file') -parser.add_argument('--back_spliced_min_reads', dest='back_spliced_min_reads', type=int, required=True, - help='back_spliced minimum read threshold') -parser.add_argument('--host', dest='host', type=str, required=True, - help='host name eg.hg38... 
single value...host_filter_min/host_filter_max filters are applied to this region only') -parser.add_argument('--additives', dest='additives', type=str, required=True, - help='additive name(s) eg.ERCC... comma-separated list... all BSJs in this region are filtered out') -parser.add_argument('--viruses', dest='viruses', type=str, required=True, - help='virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only') -parser.add_argument('--host_filter_min', dest='host_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for host') -parser.add_argument('--virus_filter_min', dest='virus_filter_min', type=int, required=False, default=150, - help='min BSJ size filter for virus') -parser.add_argument('--host_filter_max', dest='host_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for host') -parser.add_argument('--virus_filter_max', dest='virus_filter_max', type=int, required=False, default=5000, - help='max BSJ size filter for virus') -parser.add_argument('--regions', dest='regions', type=str, required=True, - help='regions file eg. ref.fa.regions') -parser.add_argument('--out_dcc_filtered_counts_table',dest='outfile',required=True,help='filtered DCC out file') +parser = argparse.ArgumentParser(description="Filter DCC Per Sample Counts Table") +parser.add_argument( + "--in_dcc_counts_table", dest="intable", type=str, required=True, help="DCC in file" +) +parser.add_argument( + "--back_spliced_min_reads", + dest="back_spliced_min_reads", + type=int, + required=True, + help="back_spliced minimum read threshold", +) +parser.add_argument( + "--host", + dest="host", + type=str, + required=True, + help="host name eg.hg38... single value...host_filter_min/host_filter_max filters are applied to this region only", +) +parser.add_argument( + "--additives", + dest="additives", + type=str, + required=True, + help="additive name(s) eg.ERCC... 
comma-separated list... all BSJs in this region are filtered out", +) +parser.add_argument( + "--viruses", + dest="viruses", + type=str, + required=True, + help="virus name(s) eg.NC_009333.1... comma-separated list...virus_filter_min/virus_filter_max filters are applied to this region only", +) +parser.add_argument( + "--host_filter_min", + dest="host_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_min", + dest="virus_filter_min", + type=int, + required=False, + default=150, + help="min BSJ size filter for virus", +) +parser.add_argument( + "--host_filter_max", + dest="host_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for host", +) +parser.add_argument( + "--virus_filter_max", + dest="virus_filter_max", + type=int, + required=False, + default=5000, + help="max BSJ size filter for virus", +) +parser.add_argument( + "--regions", + dest="regions", + type=str, + required=True, + help="regions file eg. 
ref.fa.regions", +) +parser.add_argument( + "--out_dcc_filtered_counts_table", + dest="outfile", + required=True, + help="filtered DCC out file", +) args = parser.parse_args() -regions = read_regions(regionsfile=args.regions,host=args.host,additives=args.additives,viruses=args.viruses) -outfile = open(args.outfile,'w') -infile = open(args.intable,'r') +regions = read_regions( + regionsfile=args.regions, + host=args.host, + additives=args.additives, + viruses=args.viruses, +) +outfile = open(args.outfile, "w") +infile = open(args.intable, "r") alllines = infile.readlines() header = alllines.pop(0) -outfile.write("%s"%(header)) +outfile.write("%s" % (header)) infile.close() for l in alllines: out = DCC(entry=l) out.set_host_additive_virus(regions=regions) out.filter_by_nreads(args.back_spliced_min_reads) if out.filter_out == False: - out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) + out.filter_by_size( + host_min=args.host_filter_min, + host_max=args.host_filter_max, + virus_min=args.virus_filter_min, + virus_max=args.virus_filter_max, + ) if out.filter_out == True: outfile.write(l) outfile.close() diff --git a/workflow/scripts/filter_junction.py b/workflow/scripts/filter_junction.py index 7a276c3..31acac6 100755 --- a/workflow/scripts/filter_junction.py +++ b/workflow/scripts/filter_junction.py @@ -1,6 +1,6 @@ import sys -for i in open(sys.argv[1]).readlines(): - j=i.split("\t") - if (j[0]=="chrKSHV") and (j[3]=="chrKSHV") : - print(i.strip()) +for i in open(sys.argv[1]).readlines(): + j = i.split("\t") + if (j[0] == "chrKSHV") and (j[3] == "chrKSHV"): + print(i.strip()) diff --git a/workflow/scripts/filter_junction_human.py b/workflow/scripts/filter_junction_human.py index c8e38cd..d315ae9 100755 --- a/workflow/scripts/filter_junction_human.py +++ b/workflow/scripts/filter_junction_human.py @@ -1,6 +1,6 @@ import sys -for i in open(sys.argv[1]).readlines(): - 
j=i.split("\t") - if (j[0]!="chrKSHV") and (j[3]==j[0]) : - print(i.strip()) +for i in open(sys.argv[1]).readlines(): + j = i.split("\t") + if (j[0] != "chrKSHV") and (j[3] == j[0]): + print(i.strip()) diff --git a/workflow/scripts/fix_gtfs.py b/workflow/scripts/fix_gtfs.py index 3eee716..5f5615e 100755 --- a/workflow/scripts/fix_gtfs.py +++ b/workflow/scripts/fix_gtfs.py @@ -1,86 +1,98 @@ import argparse import pandas -debug=0 +debug = 0 + def get_attributes(attstr): - att = dict() - attlist = attstr.strip().split(";") - if debug==1: print(attstr) - if debug==1: print(attlist) - for item in attlist: - x = item.strip() - if debug==1: print(x) - x = x.replace("\"","") - if debug==1: print(x) - x = x.split() - if debug==1: print(x) - if len(x)!=2: continue - key = x.pop(0) - key = key.replace(":","") - value = " ".join(x) - value = value.replace(":","_") - att[key] = value - return att + att = dict() + attlist = attstr.strip().split(";") + if debug == 1: + print(attstr) + if debug == 1: + print(attlist) + for item in attlist: + x = item.strip() + if debug == 1: + print(x) + x = x.replace('"', "") + if debug == 1: + print(x) + x = x.split() + if debug == 1: + print(x) + if len(x) != 2: + continue + key = x.pop(0) + key = key.replace(":", "") + value = " ".join(x) + value = value.replace(":", "_") + att[key] = value + return att + def get_attstr(att): - strlist=[] - for k,v in att.items(): - s = "%s \"%s\""%(k,v) - strlist.append(s) - attstr = "; ".join(strlist) - return attstr+";" + strlist = [] + for k, v in att.items(): + s = '%s "%s"' % (k, v) + strlist.append(s) + attstr = "; ".join(strlist) + return attstr + ";" -parser = argparse.ArgumentParser(description='fix gtf file') -parser.add_argument('--ingtf', dest='ingtf', type=str, required=True, - help='input gtf file') -parser.add_argument('--outgtf', dest='outgtf', type=str, required=True, - help='output gtf file') + +parser = argparse.ArgumentParser(description="fix gtf file") +parser.add_argument( + "--ingtf", 
dest="ingtf", type=str, required=True, help="input gtf file"
+)
+parser.add_argument(
+    "--outgtf", dest="outgtf", type=str, required=True, help="output gtf file"
+)
 args = parser.parse_args()
 gene_id_2_gene_name = dict()
-with open(args.ingtf, 'r') as ingtf:
-    for line in ingtf:
-        if line.startswith("#"): continue
-        line = line.strip()
-        line = line.split("\t")
-        if len(line) != 9:
-            print(line)
-            exit("ERROR ... line does not have 9 items!")
-        attributes = get_attributes(line[8])
-        if debug==1: print(line)
-        if debug==1: print(attributes)
-        if not attributes["gene_id"] in gene_id_2_gene_name:
-            if "gene_name" in attributes:
-                gene_id_2_gene_name[attributes["gene_id"]] = attributes["gene_name"]
-            else:
-                gene_id_2_gene_name["gene_id"] = attributes["gene_id"]
+with open(args.ingtf, "r") as ingtf:
+    for line in ingtf:
+        if line.startswith("#"):
+            continue
+        line = line.strip()
+        line = line.split("\t")
+        if len(line) != 9:
+            print(line)
+            exit("ERROR ... line does not have 9 items!")
+        attributes = get_attributes(line[8])
+        if debug == 1:
+            print(line)
+        if debug == 1:
+            print(attributes)
+        if not attributes["gene_id"] in gene_id_2_gene_name:
+            if "gene_name" in attributes:
+                gene_id_2_gene_name[attributes["gene_id"]] = attributes["gene_name"]
+            else:
+                gene_id_2_gene_name["gene_id"] = attributes["gene_id"]
-with open(args.ingtf,'r') as ingtf, open(args.outgtf,'w') as outgtf:
-    for line in ingtf:
-        if line.startswith("#"):
-            outgtf.write(line)
-            continue
-        line = line.strip()
-        line = line.split("\t")
-        attributes = get_attributes(line[8])
-        if not "gene_name" in attributes:
-            if not "gene_id" in attributes:
-                print(line)
-                print(attributes)
-                exit("ERROR in this line!")
-            if not attributes["gene_id"] in gene_id_2_gene_name:
-                print(line)
-                print(attributes)
-                print(attributes["gene_id"])
-                exit("ERROR2 in this line!")
-            attributes["gene_name"] = gene_id_2_gene_name[attributes["gene_id"]]
-            line[8] = get_attstr(attributes)
-        outgtf.write("\t".join(line)+"\n")
+with open("gene_id_2_gene_name.tsv", "w") as tmp:
+    for k, v in gene_id_2_gene_name.items():
+        tmp.write("%s\t%s\n" % (k, v))
-
+with open(args.ingtf, "r") as ingtf, open(args.outgtf, "w") as outgtf:
+    for line in ingtf:
+        if line.startswith("#"):
+            outgtf.write(line)
+            continue
+        line = line.strip()
+        line = line.split("\t")
+        attributes = get_attributes(line[8])
+        if not "gene_name" in attributes:
+            if not "gene_id" in attributes:
+                print(line)
+                print(attributes)
+                exit("ERROR in this line!")
+            if not attributes["gene_id"] in gene_id_2_gene_name:
+                print(line)
+                print(attributes)
+                print(attributes["gene_id"])
+                exit("ERROR2 in this line!")
+            attributes["gene_name"] = gene_id_2_gene_name[attributes["gene_id"]]
+            line[8] = get_attstr(attributes)
+        outgtf.write("\t".join(line) + "\n")
diff --git a/workflow/scripts/fix_refseq_gtf.py b/workflow/scripts/fix_refseq_gtf.py
index 130fe2a..4e06fca 100755
--- a/workflow/scripts/fix_refseq_gtf.py
+++ b/workflow/scripts/fix_refseq_gtf.py
@@ -3,266 +3,293 @@
 # Date: Aug, 2020
 
-import sys,copy,argparse
+import sys, copy, argparse
 
 parser = argparse.ArgumentParser()
-parser.add_argument('-i',dest='ingtf', required=True, type=str, help="Input RefSeq GTF ..downloaded from NCBI ftp server")
-parser.add_argument('-o',dest='outgtf', required=True, type=str, help="Modified Output RefSeq GTF")
+parser.add_argument(
+    "-i",
+    dest="ingtf",
+    required=True,
+    type=str,
+    help="Input RefSeq GTF ..downloaded from NCBI ftp server",
+)
+parser.add_argument(
+    "-o", dest="outgtf", required=True, type=str, help="Modified Output RefSeq GTF"
+)
 args = parser.parse_args()
 
+
 def get_gene_id(column9):
-    x=column9.strip().split()
-    for i,value in enumerate(x):
-        if value=="gene_id":
-            gene_id_index=i+1
+    x = column9.strip().split()
+    for i, value in enumerate(x):
+        if value == "gene_id":
+            gene_id_index = i + 1
             break
-    gene_id=x[gene_id_index]
+    gene_id = x[gene_id_index]
     return gene_id
 
+
 def get_gene_biotype(column9):
-    x=column9.strip().split()
-    found=0
-    for i,value in enumerate(x):
-        if value=="gene_type" or value=="gene_biotype":
-            gene_biotype_index=i+1
-            found=1
+    x = column9.strip().split()
+    found = 0
+    for i, value in enumerate(x):
+        if value == "gene_type" or value == "gene_biotype":
+            gene_biotype_index = i + 1
+            found = 1
             break
-    if found==0:
+    if found == 0:
         return '"unknown";'
-    gene_biotype=x[gene_biotype_index]
+    gene_biotype = x[gene_biotype_index]
     return gene_biotype
 
+
 def get_gene_name(column9):
-    x=column9.strip().split()
-    found=0
-    for i,value in enumerate(x):
-        if value=="gene" or value=="gene_name":
-            gene_index=i+1
-            found=1
+    x = column9.strip().split()
+    found = 0
+    for i, value in enumerate(x):
+        if value == "gene" or value == "gene_name":
+            gene_index = i + 1
+            found = 1
             break
-    if found==0:
+    if found == 0:
         return ""
-    gene_name=x[gene_index]
+    gene_name = x[gene_index]
     return gene_name
 
+
 def get_transcript_id(column9):
-    x=column9.strip().split()
-    found=0
-    for i,value in enumerate(x):
-        if value=="transcript_id":
-            transcript_id_index=i+1
-            found=1
+    x = column9.strip().split()
+    found = 0
+    for i, value in enumerate(x):
+        if value == "transcript_id":
+            transcript_id_index = i + 1
+            found = 1
             break
-    if found==0:
+    if found == 0:
         return '"transcript_id_unknown";'
-    transcript_id=x[transcript_id_index]
+    transcript_id = x[transcript_id_index]
     return transcript_id
 
-def fix_transcript_id(column9,g):
-    x=column9.strip().split()
-    found=0
-    for i,value in enumerate(x):
-        if value=="transcript_id":
-            transcript_id_index=i+1
-            found=1
+
+def fix_transcript_id(column9, g):
+    x = column9.strip().split()
+    found = 0
+    for i, value in enumerate(x):
+        if value == "transcript_id":
+            transcript_id_index = i + 1
+            found = 1
             break
-    x[transcript_id_index]=g
-    if found==0:
+    x[transcript_id_index] = g
+    if found == 0:
         x.append("transcript_id")
         x.append(g)
-    x=" ".join(x)
-    return x
+    x = " ".join(x)
+    return x
 
-def create_new_transript_id(g,i):
-    n=g.split('"')
-    n[-2]+="_transcript_"+str(i)
-    n='"'.join(n)
+
+def create_new_transript_id(g, i):
+    n = g.split('"')
+    n[-2] += "_transcript_" + str(i)
+    n = '"'.join(n)
     return n
 
+
 def are_exons_present(transcript_lines):
     for l in transcript_lines:
-        l_split=l.strip().split("\t")
-        if l_split[2]=="exon":
+        l_split = l.strip().split("\t")
+        if l_split[2] == "exon":
             return True
     else:
         return False
 
-#create genelist
-genelist=[]
-gene_coords=dict()
-all_gtflines=list(filter(lambda x:not x.startswith("#"),open(args.ingtf).readlines()))
-blank_gene_id_lines=[]
+
+# create genelist
+genelist = []
+gene_coords = dict()
+all_gtflines = list(
+    filter(lambda x: not x.startswith("#"), open(args.ingtf).readlines())
+)
+blank_gene_id_lines = []
 for f in all_gtflines:
-    its_a_gene=0
-    if f.strip().split("\t")[2]=="gene":
-        its_a_gene=1
-    gene_id=get_gene_id(f.strip().split("\t")[8])
-    if gene_id=='"";':
+    its_a_gene = 0
+    if f.strip().split("\t")[2] == "gene":
+        its_a_gene = 1
+    gene_id = get_gene_id(f.strip().split("\t")[8])
+    if gene_id == '"";':
         blank_gene_id_lines.append(f)
         continue
     genelist.append(gene_id)
-    if its_a_gene==1 and not gene_id in gene_coords:
-        gene_coords[gene_id]=(int(f.strip().split("\t")[3]),int(f.strip().split("\t")[4]))
-genelist=list(set(genelist))
+    if its_a_gene == 1 and not gene_id in gene_coords:
+        gene_coords[gene_id] = (
+            int(f.strip().split("\t")[3]),
+            int(f.strip().split("\t")[4]),
+        )
+genelist = list(set(genelist))
 # print(genelist)
 # print(len(blank_gene_id_lines))
 
-#get genes2transcripts ... this is only for verifying that every gene has only 1 transript... this is the assumption
-gene_id_2_transcript_ids=dict()
+# get genes2transcripts ... this is only for verifying that every gene has only 1 transript... this is the assumption
+gene_id_2_transcript_ids = dict()
 for g in genelist:
     if not g in gene_id_2_transcript_ids:
-        gene_id_2_transcript_ids[g]=list()
-    lines_with_gene_id=list(filter(lambda x: g in x,all_gtflines))
-    non_gene_lines=list(filter(lambda x:x.split("\t")[2]!="gene",lines_with_gene_id))
+        gene_id_2_transcript_ids[g] = list()
+    lines_with_gene_id = list(filter(lambda x: g in x, all_gtflines))
+    non_gene_lines = list(
+        filter(lambda x: x.split("\t")[2] != "gene", lines_with_gene_id)
+    )
     for l in non_gene_lines:
-        t_id=get_transcript_id(l.strip().split("\t")[8])
-        if t_id!='"transcript_id_unknown";':
+        t_id = get_transcript_id(l.strip().split("\t")[8])
+        if t_id != '"transcript_id_unknown";':
             gene_id_2_transcript_ids[g].append(t_id)
-    gene_id_2_transcript_ids[g]=list(set(gene_id_2_transcript_ids[g]))
+    gene_id_2_transcript_ids[g] = list(set(gene_id_2_transcript_ids[g]))
 
-geneid2transcriptidfile=open(args.ingtf+".geneid2transcriptid",'w')
-for k,v in gene_id_2_transcript_ids.items():
-    geneid2transcriptidfile.write("%s\t%s\n"%(k,v))
+geneid2transcriptidfile = open(args.ingtf + ".geneid2transcriptid", "w")
+for k, v in gene_id_2_transcript_ids.items():
+    geneid2transcriptidfile.write("%s\t%s\n" % (k, v))
 geneid2transcriptidfile.close()
 
-#get genenames
-gene_id_2_gene_name=dict()
+# get genenames
+gene_id_2_gene_name = dict()
 for g in genelist:
     if not g in gene_id_2_gene_name:
-        gene_id_2_gene_name[g]=list()
-    lines_with_gene_id=list(filter(lambda x: g in x,all_gtflines))
-    gene_line=list(filter(lambda x:x.split("\t")[2]=="gene",lines_with_gene_id))
+        gene_id_2_gene_name[g] = list()
+    lines_with_gene_id = list(filter(lambda x: g in x, all_gtflines))
+    gene_line = list(filter(lambda x: x.split("\t")[2] == "gene", lines_with_gene_id))
     # if len(gene_line)==0:
     # for l in lines_with_gene_id:
     # print(l,)
-    gene_line=gene_line[0]
-    gene_name=get_gene_name(gene_line.split("\t")[8])
-    if gene_name=="":
-        gene_name=g
-    gene_id_2_gene_name[g]=gene_name
+    gene_line = gene_line[0]
+    gene_name = get_gene_name(gene_line.split("\t")[8])
+    if gene_name == "":
+        gene_name = g
+    gene_id_2_gene_name[g] = gene_name
 # for k,v in gene_id_2_gene_name.items():
 # print(k,v)
-
-#get transcript coordinates
-gene_id_2_transcript_coordinates=dict()
+
+# get transcript coordinates
+gene_id_2_transcript_coordinates = dict()
 for g in genelist:
     # print("gene=",g)
     if not g in gene_id_2_transcript_coordinates:
-        gene_id_2_transcript_coordinates[g]=list()
-    if len(gene_id_2_transcript_ids[g])==1:
+        gene_id_2_transcript_coordinates[g] = list()
+    if len(gene_id_2_transcript_ids[g]) == 1:
         gene_id_2_transcript_coordinates[g].append(gene_coords[g])
     else:
-        lines_with_gene_id=list(filter(lambda x: g in x,all_gtflines))
-        non_gene_lines=list(filter(lambda x:x.split("\t")[2]!="gene",lines_with_gene_id))
+        lines_with_gene_id = list(filter(lambda x: g in x, all_gtflines))
+        non_gene_lines = list(
+            filter(lambda x: x.split("\t")[2] != "gene", lines_with_gene_id)
+        )
         for t in gene_id_2_transcript_ids[g]:
             # print("transcript=",t)
-            transcript_lines=list(filter(lambda x:t in x,non_gene_lines))
-            coords=[]
+            transcript_lines = list(filter(lambda x: t in x, non_gene_lines))
+            coords = []
             for l in transcript_lines:
                 # print(l.strip())
-                l_split=l.split("\t")
+                l_split = l.split("\t")
                 coords.append(int(l_split[3]))
                 coords.append(int(l_split[4]))
             # print()
-            gene_id_2_transcript_coordinates[g].append((min(coords),max(coords)))
+            gene_id_2_transcript_coordinates[g].append((min(coords), max(coords)))
     # print(gene_id_2_transcript_coordinates[g])
 # for k,v in gene_id_2_transcript_coordinates.items():
-    # print(k,v)
+# print(k,v)
 # exit()
 
-#get gene biotype\
-gene_id_2_gene_biotype=dict()
+# get gene biotype\
+gene_id_2_gene_biotype = dict()
 for g in genelist:
-    lines_with_gene_id=list(filter(lambda x: g in x,all_gtflines))
-    gene_line=list(filter(lambda x:x.split("\t")[2]=="gene",lines_with_gene_id))
-    gene_line=gene_line[0]
-    gene_biotype=get_gene_biotype(gene_line.split("\t")[8])
-    gene_id_2_gene_biotype[g]=gene_biotype
+    lines_with_gene_id = list(filter(lambda x: g in x, all_gtflines))
+    gene_line = list(filter(lambda x: x.split("\t")[2] == "gene", lines_with_gene_id))
+    gene_line = gene_line[0]
+    gene_biotype = get_gene_biotype(gene_line.split("\t")[8])
+    gene_id_2_gene_biotype[g] = gene_biotype
 # for k,v in gene_id_2_gene_biotype.items():
 # print(k,v)
 
-out=open(args.outgtf,'w')
+out = open(args.outgtf, "w")
 for g in genelist:
-    lines_with_gene_id=list(filter(lambda x: g in x,all_gtflines))
-    gene_line=list(filter(lambda x:x.split("\t")[2]=="gene",lines_with_gene_id))
-    gene_line=gene_line[0]
-    gene_line=gene_line.split("\t")
-    others=gene_line.pop(-1)
-    gene_line_copy=copy.copy(gene_line)
+    lines_with_gene_id = list(filter(lambda x: g in x, all_gtflines))
+    gene_line = list(filter(lambda x: x.split("\t")[2] == "gene", lines_with_gene_id))
+    gene_line = gene_line[0]
+    gene_line = gene_line.split("\t")
+    others = gene_line.pop(-1)
+    gene_line_copy = copy.copy(gene_line)
     # other key value pairs to add in the gene_line(col9)
-    others_to_add=[]
+    others_to_add = []
     # print("others=",others)
     for o in others.strip().split("; "):
         # print("o=",o)
-        o2=o.split(" ")
+        o2 = o.split(" ")
         # print("o2=",o2)
-        key=o2[0]
-        value=o2[1:]
-        value=" ".join(value)
+        key = o2[0]
+        value = o2[1:]
+        value = " ".join(value)
         # print("key=",key)
         # print("value=",value)
-        if key in ["gene_id","gene","gene_name","gene_type","gene_biotype"]:
+        if key in ["gene_id", "gene", "gene_name", "gene_type", "gene_biotype"]:
             continue
         else:
             others_to_add.append(key)
             if not ";" in value:
-                others_to_add.append(value+";")
+                others_to_add.append(value + ";")
             else:
                 others_to_add.append(value)
-    col9=[]
+    col9 = []
     col9.append("gene_id")
     col9.append(g)
     col9.append("gene_name")
     col9.append(gene_id_2_gene_name[g])
     col9.append("gene_biotype")
     col9.append(gene_id_2_gene_biotype[g])
-    col9plus=copy.copy(col9)
+    col9plus = copy.copy(col9)
     col9plus.extend(others_to_add)
-    gene_col9=" ".join(col9plus)
+    gene_col9 = " ".join(col9plus)
     gene_line.append(gene_col9)
-    gene_line="\t".join(gene_line)
-    out.write("%s\n"%(gene_line))
+    gene_line = "\t".join(gene_line)
+    out.write("%s\n" % (gene_line))
 
-    non_gene_lines=list(filter(lambda x:x.split("\t")[2]!="gene",lines_with_gene_id))
-    for i,t in enumerate(gene_id_2_transcript_ids[g]):
-        transcript_line=copy.copy(gene_line_copy)
-        transcript_line[2]="transcript"
-        transcript_line[3]=str(gene_id_2_transcript_coordinates[g][i][0])
-        transcript_line[4]=str(gene_id_2_transcript_coordinates[g][i][1])
-        new_trascript_id=create_new_transript_id(g,i+1)
-        transcript_col9=copy.copy(col9)
+    non_gene_lines = list(
+        filter(lambda x: x.split("\t")[2] != "gene", lines_with_gene_id)
+    )
+    for i, t in enumerate(gene_id_2_transcript_ids[g]):
+        transcript_line = copy.copy(gene_line_copy)
+        transcript_line[2] = "transcript"
+        transcript_line[3] = str(gene_id_2_transcript_coordinates[g][i][0])
+        transcript_line[4] = str(gene_id_2_transcript_coordinates[g][i][1])
+        new_trascript_id = create_new_transript_id(g, i + 1)
+        transcript_col9 = copy.copy(col9)
         transcript_col9.append("transcript_id")
         transcript_col9.append(new_trascript_id)
         transcript_col9.append("transcript_name")
        transcript_col9.append(new_trascript_id)
         transcript_col9.append("transcript_type")
         transcript_col9.append(gene_id_2_gene_biotype[g])
-        transcript_col9=" ".join(transcript_col9)
+        transcript_col9 = " ".join(transcript_col9)
         transcript_line.append(transcript_col9)
-        transcript_line="\t".join(transcript_line)
-        out.write("%s\n"%(transcript_line))
+        transcript_line = "\t".join(transcript_line)
+        out.write("%s\n" % (transcript_line))
 
-        transcript_lines=list(filter(lambda x:t in x,non_gene_lines))
-        have_exons=are_exons_present(transcript_lines)
+        transcript_lines = list(filter(lambda x: t in x, non_gene_lines))
+        have_exons = are_exons_present(transcript_lines)
         for l in transcript_lines:
             # print(l)
-            l=l.strip().split("\t")
-            tofix=l.pop(-1)
-            l.append(fix_transcript_id(tofix,new_trascript_id))
-            if l[2]=="CDS" and have_exons==False:
-                l2=copy.copy(l)
-                l2[7]="."
-                l2[2]="exon"
-                l2="\t".join(l2)
-                out.write("%s\n"%(l2))
-            l="\t".join(l)
-            out.write("%s\n"%(l))
+            l = l.strip().split("\t")
+            tofix = l.pop(-1)
+            l.append(fix_transcript_id(tofix, new_trascript_id))
+            if l[2] == "CDS" and have_exons == False:
+                l2 = copy.copy(l)
+                l2[7] = "."
+                l2[2] = "exon"
+                l2 = "\t".join(l2)
+                out.write("%s\n" % (l2))
+            l = "\t".join(l)
+            out.write("%s\n" % (l))
             # print(l)
 out.close()
 
-out=open(args.ingtf+".extralines",'w')
+out = open(args.ingtf + ".extralines", "w")
 for b in blank_gene_id_lines:
     out.write(b)
 out.close()
diff --git a/workflow/scripts/gather_cluster_stats.sh b/workflow/scripts/gather_cluster_stats.sh
index 49c326e..1b9cc98 100755
--- a/workflow/scripts/gather_cluster_stats.sh
+++ b/workflow/scripts/gather_cluster_stats.sh
@@ -64,4 +64,4 @@ echo -ne "##SubmitTime\tHumanSubmitTime\tJobID:JobState:JobName\tAllocNode:Alloc
 while read jid;do
 get_jobid_stats $jid
 done < ${snakemakelogfile}.jobids.lst |sort -k1,1n
-rm -f ${snakemakelogfile}.jobids.lst
\ No newline at end of file
+rm -f ${snakemakelogfile}.jobids.lst
diff --git a/workflow/scripts/get_index_rl.py b/workflow/scripts/get_index_rl.py
index 36265c1..1a30190 100755
--- a/workflow/scripts/get_index_rl.py
+++ b/workflow/scripts/get_index_rl.py
@@ -1,12 +1,13 @@
 import sys
 import gzip
 from itertools import islice
-with gzip.open(sys.argv[1],'r') as fin:
-    for line in islice(fin,1,2) :
-        r=len(line.strip())
-offset=2
-rls=[50,75,100,125,150]
-b=list(map(lambda x:x-int(r),rls))
-c=list(filter(lambda x:x<=(0+offset),b))
+with gzip.open(sys.argv[1], "r") as fin:
+    for line in islice(fin, 1, 2):
+        r = len(line.strip())
+
+offset = 2
+rls = [50, 75, 100, 125, 150]
+b = list(map(lambda x: x - int(r), rls))
+c = list(filter(lambda x: x <= (0 + offset), b))
 print(rls[b.index(max(c))])
diff --git a/workflow/scripts/junctions2readids.py b/workflow/scripts/junctions2readids.py
index c8053c3..192c592 100755
--- a/workflow/scripts/junctions2readids.py
+++ b/workflow/scripts/junctions2readids.py
@@ -39,35 +39,43 @@
 # e. site2
 # f. list of cigars comma-separated (soft-clips are converted to hard-clips)
 
+
 def split_text(s):
     for k, g in groupby(s, str.isalpha):
-        yield ''.join(g)
+        yield "".join(g)
+
 
 def split_cigar(c):
-    cigars=[]
-    if 'p' in c:
-        x=list(split_text(c))
-        cigars.append(''.join(x[:x.index('p')-1]).replace('S','H'))
-        cigars.append(''.join(x[x.index('p')+1:]).replace('S','H'))
-    else:
-        cigars.append(c.replace('S','H'))
-    return cigars
+    cigars = []
+    if "p" in c:
+        x = list(split_text(c))
+        cigars.append("".join(x[: x.index("p") - 1]).replace("S", "H"))
+        cigars.append("".join(x[x.index("p") + 1 :]).replace("S", "H"))
+    else:
+        cigars.append(c.replace("S", "H"))
+    return cigars
+
 
 def get_cigars(l):
-    cigars=[]
-    cigars.extend(split_cigar(l.split()[11]))
-    cigars.extend(split_cigar(l.split()[13]))
-    cigars=list(filter(lambda x:x!='',cigars))
-    return cigars
-
-parser = argparse.ArgumentParser(description="""
+    cigars = []
+    cigars.extend(split_cigar(l.split()[11]))
+    cigars.extend(split_cigar(l.split()[13]))
+    cigars = list(filter(lambda x: x != "", cigars))
+    return cigars
+
+
+parser = argparse.ArgumentParser(
+    description="""
 Extract readids,strand,site,cigar etc. of reads with spliced junction from chimeric junctions file generated using STAR.
-""")
-parser.add_argument('-j',dest='junctions',required=True,help='chimeric junctions file')
+"""
+)
+parser.add_argument(
+    "-j", dest="junctions", required=True, help="chimeric junctions file"
+)
 # parser.add_argument('-r',dest='readids',required=True,help='Output txt file with a readid per line')
 args = parser.parse_args()
 # ofile=open(args.readids,'w')
-with open(args.junctions, 'r') as junc_f:
+with open(args.junctions, "r") as junc_f:
     for line in junc_f:
         if "junction_type" in line:
             continue
@@ -75,9 +83,11 @@ def get_cigars(l):
         if flag < 0:  # junction type : -1=encompassing junction (between the mates)
             continue
         chr1, site1, strand1, chr2, site2, strand2 = line.split()[:6]
-        if chr1 != chr2 or strand1 != strand2: # D & A need to be on the same chrom and same strand
+        if (
+            chr1 != chr2 or strand1 != strand2
+        ):  # D & A need to be on the same chrom and same strand
             continue
-        if strand1 == '+':
+        if strand1 == "+":
             start = int(site2)
             end = int(site1) - 1
         else:
@@ -85,5 +95,7 @@ def get_cigars(l):
             end = int(site2) - 1
         if start > end:
             continue
-        readid=line.split()[9]
-        print("\t".join([readid,chr1,strand1,site1,site2,",".join(get_cigars(line))]))
+        readid = line.split()[9]
+        print(
+            "\t".join([readid, chr1, strand1, site1, site2, ",".join(get_cigars(line))])
+        )
diff --git a/workflow/scripts/make_star_index.sh b/workflow/scripts/make_star_index.sh
index f81cc53..e3e6b9b 100755
--- a/workflow/scripts/make_star_index.sh
+++ b/workflow/scripts/make_star_index.sh
@@ -3,4 +3,4 @@ STAR \
 --runThreadN 56 \
 --runMode genomeGenerate \
 --genomeDir ./STAR_index_no_GTF \
---genomeFastaFiles ./ref.fa
\ No newline at end of file
+--genomeFastaFiles ./ref.fa
diff --git a/workflow/scripts/merge_ReadsPerGene_counts.R b/workflow/scripts/merge_ReadsPerGene_counts.R
index 4eb352b..25fae3b 100755
--- a/workflow/scripts/merge_ReadsPerGene_counts.R
+++ b/workflow/scripts/merge_ReadsPerGene_counts.R
@@ -16,15 +16,15 @@ for (i in 1:length(files)){
   sname=unlist(strsplit(basename(files[i]),"_p2"))[1]
   datasets_unstranded[[sname]]=read_counts(files[i],sname,2)
   datasets_stranded[[sname]]=read_counts(files[i],sname,3)
-  datasets_revstranded[[sname]]=read_counts(files[i],sname,4) 
+  datasets_revstranded[[sname]]=read_counts(files[i],sname,4)
 }
 
-x=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE), 
+x=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE),
          datasets_unstranded)
-y=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE), 
+y=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE),
         datasets_stranded)
-z=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE), 
+z=Reduce(function(d1, d2) merge(d1, d2, by = "Gene", all.x = TRUE, all.y = FALSE),
         datasets_revstranded)
 
 write.table(x,file="unstranded_STAR_GeneCounts.tsv",quote = FALSE,row.names = FALSE,sep="\t")
diff --git a/workflow/scripts/merge_counts_tables_2_counts_matrix.py b/workflow/scripts/merge_counts_tables_2_counts_matrix.py
index b9bc314..312e6fc 100755
--- a/workflow/scripts/merge_counts_tables_2_counts_matrix.py
+++ b/workflow/scripts/merge_counts_tables_2_counts_matrix.py
@@ -9,35 +9,60 @@
 import os
 import numpy
 
-debug=False
+debug = False
 # no truncations during if debug: print pandas data frames
-pandas.set_option('display.max_rows', None)
-pandas.set_option('display.max_columns', None)
-pandas.set_option('display.width', None)
-pandas.set_option('display.max_colwidth', None)
-
-
-parser = argparse.ArgumentParser(description='Merge per sample counts tables to a single annotated counts matrix')
-parser.add_argument('--per_sample_tables', nargs='+', dest='ctables', type=argparse.FileType('r'), required=True,
-                    help='space separated list of input per-sample count tables')
-parser.add_argument('--lookup_table', dest='lookup', type=argparse.FileType('r'), required=True,
-                    help='annotation lookup table (host-only)')
-parser.add_argument('-o',dest='outfile',required=True,type=argparse.FileType('w'),help='merged countsmatrix')
+pandas.set_option("display.max_rows", None)
+pandas.set_option("display.max_columns", None)
+pandas.set_option("display.width", None)
+pandas.set_option("display.max_colwidth", None)
+
+
+parser = argparse.ArgumentParser(
+    description="Merge per sample counts tables to a single annotated counts matrix"
+)
+parser.add_argument(
+    "--per_sample_tables",
+    nargs="+",
+    dest="ctables",
+    type=argparse.FileType("r"),
+    required=True,
+    help="space separated list of input per-sample count tables",
+)
+parser.add_argument(
+    "--lookup_table",
+    dest="lookup",
+    type=argparse.FileType("r"),
+    required=True,
+    help="annotation lookup table (host-only)",
+)
+parser.add_argument(
+    "-o",
+    dest="outfile",
+    required=True,
+    type=argparse.FileType("w"),
+    help="merged countsmatrix",
+)
 args = parser.parse_args()
 if debug:
     print(args)
 
+
 def prefix_counts(colname):
     # returns true if the col needs to be an int
-    if colname.endswith("_read_count"): return True
-    if colname.endswith("_ntools"): return True
-    if colname.endswith(".length"): return True
+    if colname.endswith("_read_count"):
+        return True
+    if colname.endswith("_ntools"):
+        return True
+    if colname.endswith(".length"):
+        return True
     return False
 
+
 def prefix_annotations(colname):
-    if colname.endswith("_annotation"): return True
+    if colname.endswith("_annotation"):
+        return True
     return False
@@ -48,134 +73,174 @@ def atof(text):
         retval = text
     return retval
 
+
 def natural_keys(text):
-    '''
+    """
     alist.sort(key=natural_keys) sorts in human order
     http://nedbatchelder.com/blog/200712/human_sorting.html
     (See Toothy's implementation in the comments)
     float regex comes from
     https://stackoverflow.com/a/12643073/190597
-    '''
-    return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', str(text)) ]
+    """
+    return [
+        atof(c) for c in re.split(r"[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)", str(text))
+    ]
+
 
 def get_count_and_annotation_columns(df):
-    count_cols=['circRNA_id2']
-    annotation_cols=['circRNA_id2']
+    count_cols = ["circRNA_id2"]
+    annotation_cols = ["circRNA_id2"]
     for col in df.columns:
         if prefix_counts(col):
             count_cols.append(col)
         if prefix_annotations(col):
             annotation_cols.append(col)
-    return count_cols,annotation_cols
+    return count_cols, annotation_cols
+
 
 def readin_counts_file(f):
-    intable=pandas.read_csv(f,sep="\t",header=0)
-    intable['circRNA_id2']=intable['circRNA_id'].astype(str)+"##"+intable['strand'].astype(str)
-    intable.drop(['circRNA_id','strand'],axis=1,inplace=True)
-    count_cols,annotation_cols=get_count_and_annotation_columns(intable)
-    count_table=intable[count_cols]
-    annotation_table=intable[annotation_cols]
-    count_table.set_index(['circRNA_id2'],inplace=True)
-    annotation_table.set_index(['circRNA_id2'],inplace=True)
-    return(count_table,annotation_table)
+    intable = pandas.read_csv(f, sep="\t", header=0)
+    intable["circRNA_id2"] = (
+        intable["circRNA_id"].astype(str) + "##" + intable["strand"].astype(str)
+    )
+    intable.drop(["circRNA_id", "strand"], axis=1, inplace=True)
+    count_cols, annotation_cols = get_count_and_annotation_columns(intable)
+    count_table = intable[count_cols]
+    annotation_table = intable[annotation_cols]
+    count_table.set_index(["circRNA_id2"], inplace=True)
+    annotation_table.set_index(["circRNA_id2"], inplace=True)
+    return (count_table, annotation_table)
 
 
 # per_sample_files=list(Path(args.folder).rglob("*.circRNA_counts.txt"))
 # per_sample_files=list(filter(lambda x: os.stat(x).st_size !=0, per_sample_files))
 # per_sample_files.sort(key=natural_keys)
-per_sample_files=args.ctables
+per_sample_files = args.ctables
 
-if debug: print(per_sample_files)
+if debug:
+    print(per_sample_files)
 
-annotation_tables=list()
-f=per_sample_files[0]
-if debug: print("Currently reading file:"+str(f))
-ctable,atable=readin_counts_file(f)
+annotation_tables = list()
+f = per_sample_files[0]
+if debug:
+    print("Currently reading file:" + str(f))
+ctable, atable = readin_counts_file(f)
 annotation_tables.append(atable)
-count_matrix=ctable.copy()
+count_matrix = ctable.copy()
 # count_matrix.set_index(['circRNA_id2'],inplace=True)
-if debug: print("Head of this file looks like this:")
-if debug: print(count_matrix.head())
-for i in range(1,len(per_sample_files)):
-    f=per_sample_files[i]
-    if debug: print("Currently reading file:"+str(f))
-    ctable,atable=readin_counts_file(f)
+if debug:
+    print("Head of this file looks like this:")
+if debug:
+    print(count_matrix.head())
+for i in range(1, len(per_sample_files)):
+    f = per_sample_files[i]
+    if debug:
+        print("Currently reading file:" + str(f))
+    ctable, atable = readin_counts_file(f)
     # ctable.set_index(['circRNA_id2'],inplace=True)
-    if debug: print("Head of this file looks like this:")
-    if debug: print(ctable.head())
-    count_matrix=pandas.concat([count_matrix,ctable],axis=1,join="outer",sort=False)
-    count_matrix.fillna(0,inplace=True)
+    if debug:
+        print("Head of this file looks like this:")
+    if debug:
+        print(ctable.head())
+    count_matrix = pandas.concat(
+        [count_matrix, ctable], axis=1, join="outer", sort=False
+    )
+    count_matrix.fillna(0, inplace=True)
     annotation_tables.append(atable)
 
-for i,a in enumerate(annotation_tables):
-    if i==0:
-        amatrix=a.copy()
+for i, a in enumerate(annotation_tables):
+    if i == 0:
+        amatrix = a.copy()
     else:
-        oldi=set(list(amatrix.index))
-        newi=set(list(a.index))
-        toadd=newi-oldi
-        suba=a.loc[list(toadd)]
-        amatrix=pandas.concat([amatrix,suba])
+        oldi = set(list(amatrix.index))
+        newi = set(list(a.index))
+        toadd = newi - oldi
+        suba = a.loc[list(toadd)]
+        amatrix = pandas.concat([amatrix, suba])
 
-if debug: print(count_matrix.head())
-if debug: print(annotation_tables[0].head())
-if debug: print(count_matrix.shape)
-if debug: print(annotation_tables[0].shape)
-if debug: print(annotation_tables[1].shape)
-if debug: print(amatrix.shape)
+if debug:
+    print(count_matrix.head())
+if debug:
+    print(annotation_tables[0].head())
+if debug:
+    print(count_matrix.shape)
+if debug:
+    print(annotation_tables[0].shape)
+if debug:
+    print(annotation_tables[1].shape)
+if debug:
+    print(amatrix.shape)
 
-annotations=pandas.read_csv(args.lookup,sep="\t",header=0)
-annotations_cols=annotations.columns
+annotations = pandas.read_csv(args.lookup, sep="\t", header=0)
+annotations_cols = annotations.columns
 # annotations.set_index([annotations_cols[0]],inplace=True)
-annotations['circRNA_id2']=annotations[annotations_cols[0]].astype(str)+"##"+annotations['strand'].astype(str)
-annotations.set_index(annotations['circRNA_id2'],inplace=True)
+annotations["circRNA_id2"] = (
+    annotations[annotations_cols[0]].astype(str)
+    + "##"
+    + annotations["strand"].astype(str)
+)
+annotations.set_index(annotations["circRNA_id2"], inplace=True)
 # annotations.drop(['strand'],axis=1,inplace=True)
-if debug: print(annotations.head())
-if debug: print(annotations.shape)
-
+if debug:
+    print(annotations.head())
+if debug:
+    print(annotations.shape)
 
 # count_matrix=pandas.concat([count_matrix,annotations],axis=1,join="outer",sort=False)
-cmatrix = pandas.merge(amatrix,annotations,left_index=True,right_index=True,sort=False,how='left')
-cmatrix['circRNA_id2']=cmatrix.index
-count_matrix = pandas.merge(count_matrix,cmatrix,left_index=True,right_index=True,sort=False,how='left')
-count_matrix.replace('.',numpy.nan,inplace=True)
-count_matrix.fillna(0,inplace=True)
+cmatrix = pandas.merge(
+    amatrix, annotations, left_index=True, right_index=True, sort=False, how="left"
+)
+cmatrix["circRNA_id2"] = cmatrix.index
+count_matrix = pandas.merge(
+    count_matrix, cmatrix, left_index=True, right_index=True, sort=False, how="left"
+)
+count_matrix.replace(".", numpy.nan, inplace=True)
+count_matrix.fillna(0, inplace=True)
 # count_matrix.replace(re.compile('\.'),'0', regex=True,inplace=True)
-if debug: print(count_matrix.head())
-if debug: print(count_matrix.shape)
+if debug:
+    print(count_matrix.head())
+if debug:
+    print(count_matrix.shape)
 
-coltypes=dict()
+coltypes = dict()
 for col in count_matrix.columns:
-    coltypes[col]=str
+    coltypes[col] = str
     if prefix_counts(col):
         # count_matrix[[col]].replace(re.compile('\.'),'0', regex=True,inplace=True)
-        coltypes[col]=int
+        coltypes[col] = int
 count_matrix = count_matrix.astype(coltypes)
 
-count_matrix[['circRNA_coord', 'circRNA_strand']] = count_matrix['circRNA_id2'].str.split('##', expand=True)
-count_matrix.drop(['circRNA_id2'],axis=1,inplace=True)
-cols=list(count_matrix.columns)
-col1index=cols.index('circRNA_coord')
-col2index=cols.index('circRNA_strand')
-other_indices=list(set(range(len(cols)))-set([col1index,col2index]))
-new_order=['circRNA_coord','circRNA_strand']
+count_matrix[["circRNA_coord", "circRNA_strand"]] = count_matrix[
+    "circRNA_id2"
+].str.split("##", expand=True)
+count_matrix.drop(["circRNA_id2"], axis=1, inplace=True)
+cols = list(count_matrix.columns)
+col1index = cols.index("circRNA_coord")
+col2index = cols.index("circRNA_strand")
+other_indices = list(set(range(len(cols))) - set([col1index, col2index]))
+new_order = ["circRNA_coord", "circRNA_strand"]
 for i in other_indices:
     new_order.append(cols[i])
-count_matrix=count_matrix[new_order]
-if debug: print(count_matrix.head())
-
-df2 = count_matrix[list(filter(lambda x:x.endswith("_read_count"),list(count_matrix.columns)))]
-df2 = df2.astype('int')
-count_matrix['sum_of_all_counts'] = df2.sum(axis=1)
-df3 = count_matrix[list(filter(lambda x:x.endswith("_ntools"),list(count_matrix.columns)))]
-df3 = df3.astype('int')
-count_matrix['sum_of_all_ntools'] = df3.sum(axis=1)
-
-count_matrix = count_matrix.sort_values(by=['sum_of_all_ntools','sum_of_all_counts'], ascending=False)
-count_matrix.drop(['sum_of_all_ntools','sum_of_all_counts'],axis=1,inplace=True)
-count_matrix.to_csv(args.outfile,sep="\t",header=True,index=False)
-
-
+count_matrix = count_matrix[new_order]
+if debug:
+    print(count_matrix.head())
+
+df2 = count_matrix[
+    list(filter(lambda x: x.endswith("_read_count"), list(count_matrix.columns)))
+]
+df2 = df2.astype("int")
+count_matrix["sum_of_all_counts"] = df2.sum(axis=1)
+df3 = count_matrix[
+    list(filter(lambda x: x.endswith("_ntools"), list(count_matrix.columns)))
+]
+df3 = df3.astype("int")
+count_matrix["sum_of_all_ntools"] = df3.sum(axis=1)
+
+count_matrix = count_matrix.sort_values(
+    by=["sum_of_all_ntools", "sum_of_all_counts"], ascending=False
+)
+count_matrix.drop(["sum_of_all_ntools", "sum_of_all_counts"], axis=1, inplace=True)
+count_matrix.to_csv(args.outfile, sep="\t", header=True, index=False)
diff --git a/workflow/scripts/reformat_hg38_2_hg19.py b/workflow/scripts/reformat_hg38_2_hg19.py
index 14b310a..67ba897 100755
--- a/workflow/scripts/reformat_hg38_2_hg19.py
+++ b/workflow/scripts/reformat_hg38_2_hg19.py
@@ -1,53 +1,53 @@
-f=open("hg19_hg38_annotated_lookup.txt")
-hg38_2_hg19=dict()
+f = open("hg19_hg38_annotated_lookup.txt")
+hg38_2_hg19 = dict()
 for l in f.readlines():
-    l=l.strip().split("\t")
-    hg19ID=l[0]
-    hg38ID=l[1]
-    strand=l[2]
-    circRNA_ID=l[3]
-    genomic_length=l[4]
-    spliced_seq_length=l[5]
-    samples=l[6].split(",")
-    repeats=l[7]
-    annotation=l[8].split(",")
-    best_transcript=l[9]
-    gene_symbol=l[10]
-    circRNA_study=l[11].split(",")
-    if not hg38ID in hg38_2_hg19:
-        hg38_2_hg19[hg38ID]=dict()
-        hg38_2_hg19[hg38ID]['hg19ID']=list()
-        hg38_2_hg19[hg38ID]['circRNA_ID']=list()
-        hg38_2_hg19[hg38ID]['samples']=list()
-        hg38_2_hg19[hg38ID]['annotation']=list()
-        hg38_2_hg19[hg38ID]['circRNA_study']=list()
-    hg38_2_hg19[hg38ID]['hg19ID'].append(hg19ID)
-    hg38_2_hg19[hg38ID]['strand']=strand
-    hg38_2_hg19[hg38ID]['circRNA_ID'].append(circRNA_ID)
-    hg38_2_hg19[hg38ID]['genomic_length']=genomic_length
-    hg38_2_hg19[hg38ID]['spliced_seq_length']=spliced_seq_length
-    hg38_2_hg19[hg38ID]['samples'].extend(samples)
-    hg38_2_hg19[hg38ID]['repeats']=repeats
-    hg38_2_hg19[hg38ID]['annotation'].extend(annotation)
-    hg38_2_hg19[hg38ID]['best_transcript']=best_transcript
-    hg38_2_hg19[hg38ID]['gene_symbol']=gene_symbol
-    hg38_2_hg19[hg38ID]['circRNA_study'].extend(circRNA_study)
-
-#print("\t".join(["hg38ID","hg19ID","strand","circRNA.ID","genomic.length","spliced.seq.length","samples","repeats","annotation","best.transcript","gene.symbol","circRNA.study"]),)
-for k,v in hg38_2_hg19.items():
-    l=list()
-    l.append(k)
-    l.append(",".join(set(v['hg19ID'])))
-    l.append(v['strand'])
-    l.append(",".join(set(v['circRNA_ID'])))
-    l.append(v['genomic_length'])
-    l.append(v['spliced_seq_length'])
-    l.append(",".join(set(v['samples'])))
-    l.append(v['repeats'])
-    l.append(",".join(set(v['annotation'])))
-    l.append(v['best_transcript'])
-    l.append(v['gene_symbol'])
-    l.append(",".join(set(v['circRNA_study'])))
-    print("\t".join(l),)
-
+    l = l.strip().split("\t")
+    hg19ID = l[0]
+    hg38ID = l[1]
+    strand = l[2]
+    circRNA_ID = l[3]
+    genomic_length = l[4]
+    spliced_seq_length = l[5]
+    samples = l[6].split(",")
+    repeats = l[7]
+    annotation = l[8].split(",")
+    best_transcript = l[9]
+    gene_symbol = l[10]
+    circRNA_study = l[11].split(",")
+    if not hg38ID in hg38_2_hg19:
+        hg38_2_hg19[hg38ID] = dict()
+        hg38_2_hg19[hg38ID]["hg19ID"] = list()
+        hg38_2_hg19[hg38ID]["circRNA_ID"] = list()
+        hg38_2_hg19[hg38ID]["samples"] = list()
+        hg38_2_hg19[hg38ID]["annotation"] = list()
+        hg38_2_hg19[hg38ID]["circRNA_study"] = list()
+    hg38_2_hg19[hg38ID]["hg19ID"].append(hg19ID)
+    hg38_2_hg19[hg38ID]["strand"] = strand
+    hg38_2_hg19[hg38ID]["circRNA_ID"].append(circRNA_ID)
+    hg38_2_hg19[hg38ID]["genomic_length"] = genomic_length
+    hg38_2_hg19[hg38ID]["spliced_seq_length"] = spliced_seq_length
+    hg38_2_hg19[hg38ID]["samples"].extend(samples)
+    hg38_2_hg19[hg38ID]["repeats"] = repeats
+    hg38_2_hg19[hg38ID]["annotation"].extend(annotation)
+    hg38_2_hg19[hg38ID]["best_transcript"] = best_transcript
+    hg38_2_hg19[hg38ID]["gene_symbol"] = gene_symbol
+    hg38_2_hg19[hg38ID]["circRNA_study"].extend(circRNA_study)
+# print("\t".join(["hg38ID","hg19ID","strand","circRNA.ID","genomic.length","spliced.seq.length","samples","repeats","annotation","best.transcript","gene.symbol","circRNA.study"]),)
+for k, v in hg38_2_hg19.items():
+    l = list()
+    l.append(k)
+    l.append(",".join(set(v["hg19ID"])))
+    l.append(v["strand"])
+    l.append(",".join(set(v["circRNA_ID"])))
+    l.append(v["genomic_length"])
+    l.append(v["spliced_seq_length"])
+    l.append(",".join(set(v["samples"])))
+    l.append(v["repeats"])
+    l.append(",".join(set(v["annotation"])))
+    l.append(v["best_transcript"])
+    l.append(v["gene_symbol"])
+    l.append(",".join(set(v["circRNA_study"])))
+    print(
+        "\t".join(l),
+    )
diff --git a/workflow/scripts/transcript2gene.py b/workflow/scripts/transcript2gene.py
index 9e5b963..95c29aa 100755
--- a/workflow/scripts/transcript2gene.py
+++ b/workflow/scripts/transcript2gene.py
@@ -1,20 +1,23 @@
 import sys
-def get_id(s,whatid):
-    s=s.split()
-    for i,j in enumerate(s):
-        if j==whatid:
-            r=s[i+1]
-            r=r.replace('"','')
-            r=r.replace(';','')
-            return r
-gtffile=sys.argv[1]
+
+
+def get_id(s, whatid):
+    s = s.split()
+    for i, j in enumerate(s):
+        if j == whatid:
+            r = s[i + 1]
+            r = r.replace('"', "")
+            r = r.replace(";", "")
+            return r
+
+
+gtffile = sys.argv[1]
 for i in open(gtffile).readlines():
-    if i.startswith("#"):
-        continue
-    i=i.strip().split("\t")
-    if i[2]!="transcript":
-        continue
-    gid=get_id(i[8],"gene_id")
-    tid=get_id(i[8],"transcript_id")
-    print("%s\t%s"%(tid,gid))
-
+    if i.startswith("#"):
+        continue
+    i = i.strip().split("\t")
+    if i[2] != "transcript":
+        continue
+    gid = get_id(i[8], "gene_id")
+    tid = get_id(i[8], "transcript_id")
+    print("%s\t%s" % (tid, gid))
diff --git a/workflow/scripts/validate_BSJ_reads_and_split_BSJ_bam_by_strand.py b/workflow/scripts/validate_BSJ_reads_and_split_BSJ_bam_by_strand.py
index 8970797..1e0a3db 100755
--- a/workflow/scripts/validate_BSJ_reads_and_split_BSJ_bam_by_strand.py
+++ 
b/workflow/scripts/validate_BSJ_reads_and_split_BSJ_bam_by_strand.py @@ -10,10 +10,10 @@ 3. BSJ bed file with score(number of reads supporting the BSJ) and strand information Logic (for PE reads): Each BSJ is represented by a 3 alignments in the output BAM file. -Alignment 1 is complete alignment of one of the reads in pair and -Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference +Alignment 1 is complete alignment of one of the reads in pair and +Alignments 2 and 3 are split alignment of the mate at two distinct loci on the same reference chromosome. -These alignments are grouped together by the "HI" tags in SAM file. For example, all 3 +These alignments are grouped together by the "HI" tags in SAM file. For example, all 3 alignments for the same BSJ will have the same "HI" value... something like "HI:i:1". BSJ alignment sam bitflag combinations can have 8 different possibilities, 4 from sense strand and 4 from anti-sense strand: @@ -29,12 +29,12 @@ # |<------------------BSJ----------------->| 3. 83,163,2209 4. 339,419,2465 -# R1 -# <------ +# R1 +# <------ # 5'--|------------------------------------------|---3' # 3'--|------------------------------------------|---5' # |------> ------>| -# | R2.2 R2.1 | +# | R2.2 R2.1 | # | | # |<-----------------BSJ-------------------->| 5. 99,147,2193 @@ -49,346 +49,378 @@ # |<------------------BSJ----------------->| 7. 99,147,2145 8. 355, 403, 2401 -# R2 -# <------ +# R2 +# <------ # 5'--|------------------------------------------|---3' # 3'--|------------------------------------------|---5' # |------> ------>| -# | R1.2 R1.1 | +# | R1.2 R1.1 | # | | # |<-----------------BSJ-------------------->| """ class BSJ: - def __init__(self): - self.chrom="" - self.start="" - self.end="" - self.score=0 - self.name="." 
- self.strand="U" - self.bitids=list() - self.rids=list() - - def plusone(self): - self.score+=1 - - def set_strand(self,strand): - self.strand=strand - - def set_chrom(self,chrom): - self.chrom=chrom - - def set_start(self,start): - self.start=start - - def set_end(self,end): - self.end=end - - def append_bitid(self,bitid): - self.bitids.append(bitid) - - def append_rid(self,rid): - self.rids.append(rid) - - def write_out_BSJ(self,outbed): - t=[] - t.append(self.chrom) - t.append(str(self.start)) - t.append(str(self.end)) - t.append(self.name) - t.append(str(self.score)) - t.append(self.strand) - t.append(",".join(self.bitids)) - t.append(",".join(self.rids)) - outbed.write("\t".join(t)+"\n") - + def __init__(self): + self.chrom = "" + self.start = "" + self.end = "" + self.score = 0 + self.name = "." + self.strand = "U" + self.bitids = list() + self.rids = list() + + def plusone(self): + self.score += 1 + + def set_strand(self, strand): + self.strand = strand + + def set_chrom(self, chrom): + self.chrom = chrom + + def set_start(self, start): + self.start = start + + def set_end(self, end): + self.end = end + + def append_bitid(self, bitid): + self.bitids.append(bitid) + + def append_rid(self, rid): + self.rids.append(rid) + + def write_out_BSJ(self, outbed): + t = [] + t.append(self.chrom) + t.append(str(self.start)) + t.append(str(self.end)) + t.append(self.name) + t.append(str(self.score)) + t.append(self.strand) + t.append(",".join(self.bitids)) + t.append(",".join(self.rids)) + outbed.write("\t".join(t) + "\n") + + class Readinfo: - def __init__(self,readid,rname): - self.readid=readid - self.refname=rname - self.alignments=list() - self.bitflags=list() - self.bitid="" - self.strand="." 
- self.start=-1 - self.end=-1 - self.refcoordinates=dict() - self.isread1=dict() - self.isreverse=dict() - self.issecondary=dict() - self.issupplementary=dict() - - def __str__(self): - s = "readid: %s"%(self.readid) - s = "%s\tbitflags: %s"%(s,self.bitflags) - s = "%s\tbitid: %s"%(s,self.bitid) - return s - - def set_refcoordinates(self,bitflag,refpos): - self.refcoordinates[bitflag]=refpos - - def set_read1_reverse_secondary_supplementary(self,bitflag,read): - if read.is_read1: - self.isread1[bitflag]="Y" - else: - self.isread1[bitflag]="N" - if read.is_reverse: - self.isreverse[bitflag]="Y" - else: - self.isreverse[bitflag]="N" - if read.is_secondary: - self.issecondary[bitflag]="Y" - else: - self.issecondary[bitflag]="N" - if read.is_supplementary: - self.issupplementary[bitflag]="Y" - else: - self.issupplementary[bitflag]="N" - - def append_alignment(self,read): - self.alignments.append(read) - - def append_bitflag(self,bf): - self.bitflags.append(bf) - - # def extend_ref_positions(self,refcoords): - # self.refcoordinates.extend(refcoords) - - def generate_bitid(self): - bitlist=sorted(self.bitflags) - self.bitid="##".join(list(map(lambda x:str(x),bitlist))) -# self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) - - def get_strand(self): - if self.bitid=="83##163##2129": - self.strand="+" - elif self.bitid=="339##419##2385": - self.strand="+" - elif self.bitid=="83##163##2209": - self.strand="+" - elif self.bitid=="339##419##2465": - self.strand="+" - elif self.bitid=="99##147##2193": - self.strand="-" - elif self.bitid=="355##403##2449": - self.strand="-" - elif self.bitid=="99##147##2145": - self.strand="-" - elif self.bitid=="355##403##2401": - self.strand="-" - elif self.bitid=="16##2064": - self.strand="+" - elif self.bitid=="272##2320": - self.strand="+" - elif self.bitid=="0##2048": - self.strand="-" - elif self.bitid=="256##2304": - self.strand="-" - elif self.bitid=="153##2201": - self.strand="-" - else: - self.strand="U" - - def 
validate_read(self): - """ - Checks if read is truly a BSJ originitor. - * Defines left, right and middle alignments - * Left and right alignments should not overlap - * Middle alignment should be between left and right alignments - """ - if len(self.bitid.split("##"))==3: - left=-1 - right=-1 - middle=-1 - if self.bitid=="83##163##2129": - left=2129 - right=83 - middle=163 - if self.bitid=="339##419##2385": - left=2385 - right=339 - middle=419 - if self.bitid=="83##163##2209": - left=163 - right=2209 - middle=83 - if self.bitid=="339##419##2465": - left=419 - right=2465 - middle=339 - if self.bitid=="99##147##2145": - left=99 - right=2145 - middle=147 - if self.bitid=="355##403##2401": - left=355 - right=2401 - middle=403 - if self.bitid=="99##147##2193": - left=2193 - right=147 - middle=99 - if self.bitid=="355##403##2449": - left=2449 - right=403 - middle=355 - print(left,right,middle) - if left == -1 or right == -1 or middle == -1: - return False - if not (self.refcoordinates[left][-1] < self.refcoordinates[right][0] and self.refcoordinates[middle][-1] <= self.refcoordinates[right][-1] and self.refcoordinates[middle][0] >= self.refcoordinates[left][0]): - print("HERE") - print(self.refcoordinates[left][-1]) - print(self.refcoordinates[right][0]) - print(self.refcoordinates[middle][-1]) - print(self.refcoordinates[right][-1]) - print(self.refcoordinates[middle][0]) - print(self.refcoordinates[left][0]) - print(self.refcoordinates[left][-1] < self.refcoordinates[right][0]) - print(self.refcoordinates[middle][-1] <= self.refcoordinates[right][-1]) - print(self.refcoordinates[middle][0] >= self.refcoordinates[left][0]) - return False - else: - return True - else: - return False - # print("NOT_THREE",self.readid,self.bitid,self.refcoordinates.keys()) - # if not (self.refcoordinates[163][-1] < self.refcoordinates[2209][0] and self.refcoordinates[83][-1] <= self.refcoordinates[2209][-1] and self.refcoordinates[83][0] >= self.refcoordinates[163][0]): - # 
print(self.readid,self.bitid) - # print(self.refcoordinates.keys()) - # print(self.refcoordinates[163][0],self.refcoordinates[163][-1],"\t",self.refcoordinates[2209][0],self.refcoordinates[2209][-1]) - # print(self.refcoordinates[83][0],self.refcoordinates[83][-1]) - - - def get_start_end(self): - refcoordinates=self.refcoordinates - isread1=self.isread1 - if len(self.isread1)!=3: - refcoords=[] - for i in refcoordinates.keys(): - refcoords.extend(refcoordinates[i]) - else: - l=[] - for i in isread1.keys(): - l.append(isread1[i]) - Ycount=l.count("Y") - Ncount=l.count("N") - if Ycount>Ncount: - useread1="Y" - else: - useread1="N" - refcoords=[] - for i in refcoordinates.keys(): - if isread1[i]==useread1: - refcoords.extend(refcoordinates[i]) - refcoords=sorted(refcoords) - self.start=str(refcoords[0]) - self.end=str(int(refcoords[-1])+1) - - def get_bsjid(self): - t=[] - t.append(self.refname) - t.append(self.start) - t.append(self.end) - t.append(self.strand) - return "##".join(t) - - def write_out_reads(self,outbam): - for r in self.alignments: - outbam.write(r) - - + def __init__(self, readid, rname): + self.readid = readid + self.refname = rname + self.alignments = list() + self.bitflags = list() + self.bitid = "" + self.strand = "." 
+ self.start = -1 + self.end = -1 + self.refcoordinates = dict() + self.isread1 = dict() + self.isreverse = dict() + self.issecondary = dict() + self.issupplementary = dict() + + def __str__(self): + s = "readid: %s" % (self.readid) + s = "%s\tbitflags: %s" % (s, self.bitflags) + s = "%s\tbitid: %s" % (s, self.bitid) + return s + + def set_refcoordinates(self, bitflag, refpos): + self.refcoordinates[bitflag] = refpos + + def set_read1_reverse_secondary_supplementary(self, bitflag, read): + if read.is_read1: + self.isread1[bitflag] = "Y" + else: + self.isread1[bitflag] = "N" + if read.is_reverse: + self.isreverse[bitflag] = "Y" + else: + self.isreverse[bitflag] = "N" + if read.is_secondary: + self.issecondary[bitflag] = "Y" + else: + self.issecondary[bitflag] = "N" + if read.is_supplementary: + self.issupplementary[bitflag] = "Y" + else: + self.issupplementary[bitflag] = "N" + + def append_alignment(self, read): + self.alignments.append(read) + + def append_bitflag(self, bf): + self.bitflags.append(bf) + + # def extend_ref_positions(self,refcoords): + # self.refcoordinates.extend(refcoords) + + def generate_bitid(self): + bitlist = sorted(self.bitflags) + self.bitid = "##".join(list(map(lambda x: str(x), bitlist))) + + # self.bitid=str(bitlist[0])+"##"+str(bitlist[1])+"##"+str(bitlist[2]) + + def get_strand(self): + if self.bitid == "83##163##2129": + self.strand = "+" + elif self.bitid == "339##419##2385": + self.strand = "+" + elif self.bitid == "83##163##2209": + self.strand = "+" + elif self.bitid == "339##419##2465": + self.strand = "+" + elif self.bitid == "99##147##2193": + self.strand = "-" + elif self.bitid == "355##403##2449": + self.strand = "-" + elif self.bitid == "99##147##2145": + self.strand = "-" + elif self.bitid == "355##403##2401": + self.strand = "-" + elif self.bitid == "16##2064": + self.strand = "+" + elif self.bitid == "272##2320": + self.strand = "+" + elif self.bitid == "0##2048": + self.strand = "-" + elif self.bitid == "256##2304": + 
self.strand = "-"
+        elif self.bitid == "153##2201":
+            self.strand = "-"
+        else:
+            self.strand = "U"
+
+    def validate_read(self):
+        """
+        Checks if read is truly a BSJ originator.
+        * Defines left, right and middle alignments
+        * Left and right alignments should not overlap
+        * Middle alignment should be between left and right alignments
+        """
+        if len(self.bitid.split("##")) == 3:
+            left = -1
+            right = -1
+            middle = -1
+            if self.bitid == "83##163##2129":
+                left = 2129
+                right = 83
+                middle = 163
+            if self.bitid == "339##419##2385":
+                left = 2385
+                right = 339
+                middle = 419
+            if self.bitid == "83##163##2209":
+                left = 163
+                right = 2209
+                middle = 83
+            if self.bitid == "339##419##2465":
+                left = 419
+                right = 2465
+                middle = 339
+            if self.bitid == "99##147##2145":
+                left = 99
+                right = 2145
+                middle = 147
+            if self.bitid == "355##403##2401":
+                left = 355
+                right = 2401
+                middle = 403
+            if self.bitid == "99##147##2193":
+                left = 2193
+                right = 147
+                middle = 99
+            if self.bitid == "355##403##2449":
+                left = 2449
+                right = 403
+                middle = 355
+            print(left, right, middle)
+            if left == -1 or right == -1 or middle == -1:
+                return False
+            if not (
+                self.refcoordinates[left][-1] < self.refcoordinates[right][0]
+                and self.refcoordinates[middle][-1] <= self.refcoordinates[right][-1]
+                and self.refcoordinates[middle][0] >= self.refcoordinates[left][0]
+            ):
+                print("HERE")
+                print(self.refcoordinates[left][-1])
+                print(self.refcoordinates[right][0])
+                print(self.refcoordinates[middle][-1])
+                print(self.refcoordinates[right][-1])
+                print(self.refcoordinates[middle][0])
+                print(self.refcoordinates[left][0])
+                print(self.refcoordinates[left][-1] < self.refcoordinates[right][0])
+                print(self.refcoordinates[middle][-1] <= self.refcoordinates[right][-1])
+                print(self.refcoordinates[middle][0] >= self.refcoordinates[left][0])
+                return False
+            else:
+                return True
+        else:
+            return False
+        # print("NOT_THREE",self.readid,self.bitid,self.refcoordinates.keys())
+        # if not 
(self.refcoordinates[163][-1] < self.refcoordinates[2209][0] and self.refcoordinates[83][-1] <= self.refcoordinates[2209][-1] and self.refcoordinates[83][0] >= self.refcoordinates[163][0]): + # print(self.readid,self.bitid) + # print(self.refcoordinates.keys()) + # print(self.refcoordinates[163][0],self.refcoordinates[163][-1],"\t",self.refcoordinates[2209][0],self.refcoordinates[2209][-1]) + # print(self.refcoordinates[83][0],self.refcoordinates[83][-1]) + + def get_start_end(self): + refcoordinates = self.refcoordinates + isread1 = self.isread1 + if len(self.isread1) != 3: + refcoords = [] + for i in refcoordinates.keys(): + refcoords.extend(refcoordinates[i]) + else: + l = [] + for i in isread1.keys(): + l.append(isread1[i]) + Ycount = l.count("Y") + Ncount = l.count("N") + if Ycount > Ncount: + useread1 = "Y" + else: + useread1 = "N" + refcoords = [] + for i in refcoordinates.keys(): + if isread1[i] == useread1: + refcoords.extend(refcoordinates[i]) + refcoords = sorted(refcoords) + self.start = str(refcoords[0]) + self.end = str(int(refcoords[-1]) + 1) + + def get_bsjid(self): + t = [] + t.append(self.refname) + t.append(self.start) + t.append(self.end) + t.append(self.strand) + return "##".join(t) + + def write_out_reads(self, outbam): + for r in self.alignments: + outbam.write(r) + + def get_uniq_readid(r): - rname=r.query_name - hi=r.get_tag("HI") - rid=rname+"##"+str(hi) - return rid + rname = r.query_name + hi = r.get_tag("HI") + rid = rname + "##" + str(hi) + return rid -def get_bitflag(r): - bitflag=str(r).split("\t")[1] - return int(bitflag) +def get_bitflag(r): + bitflag = str(r).split("\t")[1] + return int(bitflag) def main(): - debug = True - parser = argparse.ArgumentParser() - parser.add_argument("-i","--inbam",dest="inbam",required=True,type=argparse.FileType('r'), - help="Input bam file") - parser.add_argument("-p","--plusbam",dest="plusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - 
parser.add_argument("-m","--minusbam",dest="minusbam",required=True,type=argparse.FileType('w'), - help="Output plus strand bam file") - parser.add_argument("-b","--bed",dest="bed",required=True,type=argparse.FileType('w', encoding='UTF-8'), - help="Output BSJ bed file (with strand info)") - args = parser.parse_args() - samfile = pysam.AlignmentFile(args.inbam, "rb") - plusfile = pysam.AlignmentFile(args.plusbam, "wb", template=samfile) - minusfile = pysam.AlignmentFile(args.minusbam, "wb", template=samfile) -# bsjfile = open(args.bed,"w") - bigdict=dict() - for read in samfile.fetch(): - if read.reference_id != read.next_reference_id: continue - rid=get_uniq_readid(read) - if debug:print(rid) - if not rid in bigdict: - bigdict[rid]=Readinfo(rid,read.reference_name) - bigdict[rid].append_alignment(read) - bitflag=get_bitflag(read) - if debug:print(bitflag) - bigdict[rid].append_bitflag(bitflag) - # bigdict[rid].extend_ref_positions(read.get_reference_positions(full_length=False)) - refpos=list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True))) - bigdict[rid].set_refcoordinates(bitflag,refpos) - bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag,read) - # bigdict[rid].extend_ref_positions(list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True)))) - if debug:print(bigdict[rid]) - bsjdict=dict() - bitid_counts=dict() - for rid in bigdict.keys(): - bigdict[rid].generate_bitid() - if debug:print(bigdict[rid]) - bigdict[rid].get_strand() - if not bigdict[rid].validate_read(): - continue - if debug:print("HERE",bigdict[rid]) - bigdict[rid].get_start_end() - # print(bigdict[rid]) - if bigdict[rid].strand=="+": - bigdict[rid].write_out_reads(plusfile) - if bigdict[rid].strand=="-": - bigdict[rid].write_out_reads(minusfile) - bsjid=bigdict[rid].get_bsjid() - if not bsjid in bsjdict: - bsjdict[bsjid]=BSJ() - bsjdict[bsjid].set_chrom(bigdict[rid].refname) - bsjdict[bsjid].set_start(bigdict[rid].start) - 
bsjdict[bsjid].set_end(bigdict[rid].end)
-            bsjdict[bsjid].set_strand(bigdict[rid].strand)
-        bsjdict[bsjid].plusone()
-        bsjdict[bsjid].append_bitid(bigdict[rid].bitid)
-        if not bigdict[rid].bitid in bitid_counts:
-            bitid_counts[bigdict[rid].bitid]=0
-        bitid_counts[bigdict[rid].bitid]+=1
-        bsjdict[bsjid].append_rid(rid)
-
-    for b in bitid_counts.keys():
-        print(b,bitid_counts[b])
-
-    for bsjid in bsjdict.keys():
-        bsjdict[bsjid].write_out_BSJ(args.bed)
-
-    plusfile.close()
-    minusfile.close()
-    samfile.close()
-    args.bed.close()
-
-
+    debug = True
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "-i",
+        "--inbam",
+        dest="inbam",
+        required=True,
+        type=argparse.FileType("r"),
+        help="Input bam file",
+    )
+    parser.add_argument(
+        "-p",
+        "--plusbam",
+        dest="plusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output plus strand bam file",
+    )
+    parser.add_argument(
+        "-m",
+        "--minusbam",
+        dest="minusbam",
+        required=True,
+        type=argparse.FileType("w"),
+        help="Output minus strand bam file",
+    )
+    parser.add_argument(
+        "-b",
+        "--bed",
+        dest="bed",
+        required=True,
+        type=argparse.FileType("w", encoding="UTF-8"),
+        help="Output BSJ bed file (with strand info)",
+    )
+    args = parser.parse_args()
+    samfile = pysam.AlignmentFile(args.inbam, "rb")
+    plusfile = pysam.AlignmentFile(args.plusbam, "wb", template=samfile)
+    minusfile = pysam.AlignmentFile(args.minusbam, "wb", template=samfile)
+    # bsjfile = open(args.bed,"w")
+    bigdict = dict()
+    for read in samfile.fetch():
+        if read.reference_id != read.next_reference_id:
+            continue
+        rid = get_uniq_readid(read)
+        if debug:
+            print(rid)
+        if not rid in bigdict:
+            bigdict[rid] = Readinfo(rid, read.reference_name)
+        bigdict[rid].append_alignment(read)
+        bitflag = get_bitflag(read)
+        if debug:
+            print(bitflag)
+        bigdict[rid].append_bitflag(bitflag)
+        # bigdict[rid].extend_ref_positions(read.get_reference_positions(full_length=False))
+        refpos = list(
+            filter(lambda x: x != None, 
read.get_reference_positions(full_length=True)) + ) + bigdict[rid].set_refcoordinates(bitflag, refpos) + bigdict[rid].set_read1_reverse_secondary_supplementary(bitflag, read) + # bigdict[rid].extend_ref_positions(list(filter(lambda x:x!=None,read.get_reference_positions(full_length=True)))) + if debug: + print(bigdict[rid]) + bsjdict = dict() + bitid_counts = dict() + for rid in bigdict.keys(): + bigdict[rid].generate_bitid() + if debug: + print(bigdict[rid]) + bigdict[rid].get_strand() + if not bigdict[rid].validate_read(): + continue + if debug: + print("HERE", bigdict[rid]) + bigdict[rid].get_start_end() + # print(bigdict[rid]) + if bigdict[rid].strand == "+": + bigdict[rid].write_out_reads(plusfile) + if bigdict[rid].strand == "-": + bigdict[rid].write_out_reads(minusfile) + bsjid = bigdict[rid].get_bsjid() + if not bsjid in bsjdict: + bsjdict[bsjid] = BSJ() + bsjdict[bsjid].set_chrom(bigdict[rid].refname) + bsjdict[bsjid].set_start(bigdict[rid].start) + bsjdict[bsjid].set_end(bigdict[rid].end) + bsjdict[bsjid].set_strand(bigdict[rid].strand) + bsjdict[bsjid].plusone() + bsjdict[bsjid].append_bitid(bigdict[rid].bitid) + if not bigdict[rid].bitid in bitid_counts: + bitid_counts[bigdict[rid].bitid] = 0 + bitid_counts[bigdict[rid].bitid] += 1 + bsjdict[bsjid].append_rid(rid) + for b in bitid_counts.keys(): + print(b, bitid_counts[b]) + for bsjid in bsjdict.keys(): + bsjdict[bsjid].write_out_BSJ(args.bed) -if __name__ == "__main__": - main() + plusfile.close() + minusfile.close() + samfile.close() + args.bed.close() +if __name__ == "__main__": + main()
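
Reviewer note: the `get_strand`/`validate_read` logic in the patch above keys on sorted SAM FLAG triplets such as `83##163##2129`. As a reader aid, here is a minimal sketch of how those integers decompose into the standard bits defined in the SAM specification; the `SAM_BITS` table and `decode_flag` helper are illustrative names for this note, not part of the pipeline:

```python
# Decode a SAM FLAG integer into the standard bit names.
# Bit values are from the SAM specification; the helper name and the
# example flags (taken from the bitid strings above) are illustrative.

SAM_BITS = {
    1: "paired",
    2: "proper_pair",
    4: "unmapped",
    8: "mate_unmapped",
    16: "reverse",
    32: "mate_reverse",
    64: "read1",
    128: "read2",
    256: "secondary",
    512: "qc_fail",
    1024: "duplicate",
    2048: "supplementary",
}


def decode_flag(flag):
    """Return the set of SAM bit names encoded in an integer FLAG."""
    return {name for bit, name in SAM_BITS.items() if flag & bit}


# e.g. 2129 = 2048 + 64 + 16 + 1: a supplementary alignment of a reversed
# read1, i.e. one leg of a split (BSJ-spanning) alignment.
```

Note that 339, 419, and 2385 are just 83, 163, and 2129 with the secondary-alignment bit (256) added, which is why each strand combination in the docstring appears in two variants.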