From f31c776a31f4f4616b17a2f5232362ffa4a008a6 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 13 Mar 2026 03:51:28 +0000
Subject: [PATCH 1/2] =?UTF-8?q?chore:=20=F0=9F=A4=96=20sync=20copilot=20in?=
 =?UTF-8?q?structions=20-=202026-03-13?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .github/copilot-instructions.md | 164 ++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..ab79ac9
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,164 @@
+# CoPilot Instructions for CCBR Repositories
+
+## Reviewer guidance (what to look for in PRs)
+
+- Reviewers must validate enforcement rules: no secrets, container specified, and reproducibility pins.
+- If code is AI-generated, reviewers must ensure the author documents what was changed and why, and that the PR is labeled `generated-by-AI`.
+- Reviewers should verify license headers and ownership metadata (for example, `CODEOWNERS`) are present.
+- Reviewers must read the code and verify that it adheres to the project's coding standards, guidelines, and best practices in software engineering.
+
+## CI & enforcement suggestions (automatable)
+
+1. **PR template**: include optional AI-assistance disclosure fields (model used, high-level prompt intent, manual review confirmation).
+2. **Pre-merge check (GitHub Action)**: verify `.github/copilot-instructions.md` is present in the repository and that new pipeline files include a `# CRAFT:` header.
+3. **Lint jobs**: `ruff` for Python, `shellcheck` for shell, `lintr` for R, and `nf-core lint` or Snakemake lint checks where applicable.
+4. **Secrets scan**: run `TruffleHog` or `Gitleaks` on PRs to detect accidental credentials.
+5. 
**AI usage label**: if AI usage is declared, an Action should add the `generated-by-AI` label (create this label if it does not exist); the PR body should end with the italicized Markdown line: _Generated using AI_, and any associated commit messages should end with the plain footer line: `Generated using AI`.
+
+_Sample GH Action check (concept): if AI usage is declared, require an AI-assistance disclosure field in the PR body._
+
+## Security & compliance (mandatory)
+
+- Developers must not send PHI or sensitive NIH internal identifiers to unapproved external AI services; use synthetic examples.
+- Repository content must only be sent to model providers approved by NCI/NIH policy (for example, Copilot for Business or approved internal proxies).
+- For AI-assisted actions, teams must keep an auditable record including: user, repository, action, timestamp, model name, and endpoint.
+- If using a server wrapper (Option C), logs must include the minimum metadata above and follow institutional retention policy.
+- If policy forbids external model use for internal code, teams must use approved local/internal LLM workflows.
+
+## Operational notes (practical)
+
+- `copilot-instructions.md` should remain concise and prescriptive; keep only high-value rules and edge-case examples.
+- Developers should include the CRAFT block in edited files when requesting substantial generated code to improve context quality.
+- CoPilot must ask the user for permission before deleting any file unless the file was created by CoPilot for a temporary run or test.
+- CoPilot must not edit any files outside of the current open workspace.
+
+## Code authoring guidance
+
+- Code must not include hard-coded secrets, credentials, or sensitive absolute paths on disk.
+- Code should be designed for modularity, reusability, and maintainability. It should ideally be platform-agnostic, with special support for running on the Biowulf HPC.
+- Use pre-commit to enforce code style and linting during the commit process.
+
+### Pipelines
+
+- Authors must review existing CCBR pipelines first: .
+- New pipelines should follow established CCBR conventions for folder layout, rule/process naming, config structure, and test patterns.
+- Pipelines must define container images and pin tool/image versions for reproducibility.
+- Contributions should include a test dataset and a documented example command.
+
+#### Snakemake
+
+- In general, new pipelines should be created with Nextflow rather than Snakemake, unless there is a compelling reason to use Snakemake.
+- Generate new pipelines from the CCBR_SnakemakeTemplate repo:
+- For Snakemake, run `snakemake --lint` and a dry-run before PR submission.
+
+#### Nextflow
+
+- Generate new pipelines from the CCBR_NextflowTemplate repo:
+- For Nextflow pipelines, authors must follow nf-core patterns and references: .
+- Nextflow code must use DSL2 only (DSL1 is not allowed).
+- For Nextflow, run `nf-core lint` (or equivalent checks) before PR submission.
+- Where possible, reuse modules and subworkflows from CCBR/nf-modules or nf-core/modules.
+- New modules and subworkflows should be tested with `nf-test`.
+
+### Python scripts and packages
+
+- Python scripts must include module and function/class docstrings.
+- Where a standard CLI framework is adopted, Python CLIs should use `click` or `typer` for consistency with existing components.
+- Scripts must support `--help` and document required/optional arguments.
+- Python code must follow [PEP 8](https://peps.python.org/pep-0008/), use `snake_case`, and include type hints for public functions.
+- Scripts must raise descriptive error messages on failure and warnings when applicable. Prefer raising an exception over printing an error message, and over returning an error code.
+- Python code should pass `ruff`.
+- Each script must include a documented example usage in comments or README.
+- Tests should be written with `pytest`. Other testing frameworks may be used if justified.
+- Do not catch bare exceptions. The exception type must always be specified.
+- Only include one return statement at the end of a function.
+
+### R scripts and packages
+
+- R scripts must include function and class docstrings via roxygen2.
+- CLIs must be defined using the `argparse` package.
+- CLIs must support `--help` and document required/optional arguments.
+- R code should pass `lintr` and `air`.
+- Tests should be written with `testthat`.
+- Packages should pass `devtools::check()`.
+- R code should adhere to the [tidyverse style guide](https://style.tidyverse.org/).
+- Only include one return statement at the end of a function, if a return statement is used at all. Explicit returns are preferred but not required for R functions.
+
+## AI-generated commit messages (Conventional Commits)
+
+- Commit messages must follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) (as enforced in `CONTRIBUTING.md`).
+- Generate messages from staged changes only (`git diff --staged`); do not include unrelated work.
+- Commits should be atomic: one logical change per commit.
+- If mixed changes are present, split into multiple logical commits; the number of commits does not need to equal the number of files changed.
+- Subject format must be: `type(optional-scope): short imperative summary` (<=72 chars), e.g., `fix(profile): update release table parser`.
+- Add a body only when needed to explain **why** and notable impact; never include secrets, tokens, PHI, or large diffs.
+- For AI-assisted commits, add this final italicized footer line in the commit message body: _commit message is ai-generated_
+
+Suggested prompt for AI tools:
+
+```text
+Create a Conventional Commit message from this staged diff.
+Rules:
+1) Use one of: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert.
+2) Keep subject <= 72 chars, imperative mood, no trailing period.
+3) Include optional scope when clear.
+4) Add a short body only if needed (why/impact), wrapped at ~72 chars.
+5) Output only the final commit message.
+```
+
+## Pull Requests
+
+When opening a pull request, use the repository's pull request template (usually it is `.github/PULL_REQUEST_TEMPLATE.md`).
+Different repos have different PR templates depending on their needs.
+Ensure that the pull request follows the repository's PR template and includes all required information.
+Do not allow the developer to proceed with opening a PR unless all sections of the template are filled out.
+Before a PR can be moved from draft to "ready for review", all of the relevant checklist items must be checked, and any
+irrelevant checklist items should be crossed out.
+
+When new features, bug fixes, or other behavioral changes are introduced to the code,
+unit tests must be added or updated to cover the new or changed functionality.
+
+If there are any API or other user-facing changes, the documentation must be updated both inline via docstrings and in long-form docs in the `docs/` or `vignettes/` directory.
+
+When a repo contains a build workflow (i.e. a workflow file in `.github/workflows` starting with `build` or named `R-CMD-check`),
+the build workflow must pass before the PR can be approved.
+
+### Changelog
+
+The changelog for the repository should be maintained in a `CHANGELOG.md` file
+(or `NEWS.md` for R packages) at the root of the repository. Each pull request
+that introduces user-facing changes must include a concise entry with the PR
+number and author username tagged. Developer-only changes (i.e. updates to CI
+workflows, development notes, etc.) should never be included in the changelog.
+Example:
+
+```
+## development version
+
+- Fix bug in `detect_absolute_paths()` to ignore comments. (#123, @username)
+```
+
+## Onboarding checklist for new developers
+
+- [ ] Read `.github/CONTRIBUTING.md` and `.github/copilot-instructions.md`.
+- [ ] Configure VSCode workspace to open `copilot-instructions.md` by default (so Copilot Chat sees it). +- [ ] Install pre-commit and run `pre-commit install`. + +## Appendix: VSCode snippet (drop into `.vscode/snippets/craft.code-snippets`) + +```json +{ + "Insert CRAFT prompt": { + "prefix": "craft", + "body": [ + "/* C: Context: Repo=${workspaceFolderBasename}; bioinformatics pipelines; NIH HPC (Biowulf/Helix); containers: quay.io/ccbr */", + "/* R: Rules: no PHI, no secrets, containerize, pin versions, follow style */", + "/* F: Flow: inputs/ -> results/, conf/, tests/ */", + "/* T: Tests: provide a one-line TEST_CMD and expected output */", + "", + "A: $1" + ], + "description": "Insert CRAFT prompt and place cursor at Actions" + } +} +``` From f654306e7b22e419b8923f2a8f9d20d6220348b2 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 13 Mar 2026 03:53:06 +0000 Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- docs/RNA-seq/gui.md | 16 +++----- docs/RNA-seq/output.md | 37 +++++++++---------- docs/troubleshooting.md | 6 +-- src/renee/__main__.py | 4 +- src/renee/initialize.py | 4 +- src/renee/util.py | 14 +++---- tests/test_util.py | 59 +++++++++++------------------- workflow/scripts/rNA_flowcells.Rmd | 6 +-- 8 files changed, 60 insertions(+), 86 deletions(-) diff --git a/docs/RNA-seq/gui.md b/docs/RNA-seq/gui.md index 1a175e9..05e70a7 100644 --- a/docs/RNA-seq/gui.md +++ b/docs/RNA-seq/gui.md @@ -16,7 +16,7 @@ ssh -Y $USER@biowulf.nih.gov ### 2.2 Grab an interactive node -> **NOTE**: Make sure to add `--tunnel` flag to the sinteractive command for correct display settings. See details here: https://hpc.nih.gov/docs/tunneling/ +> **NOTE**: Make sure to add `--tunnel` flag to the sinteractive command for correct display settings. See details here: https://hpc.nih.gov/docs/tunneling/ ```bash # Setup Step 2.) 
Please do not run RENEE on the head node! @@ -37,7 +37,7 @@ module load ccbrpipeliner If the module was loaded correctly, the greetings message should be displayed. ```bash -[+] Loading ccbrpipeliner 6 ... +[+] Loading ccbrpipeliner 6 ... ########################################################################### CCBR Pipeliner release 6 ########################################################################### @@ -47,7 +47,7 @@ If the module was loaded correctly, the greetings message should be displayed. Tools are available on BIOWULF, HELIX and FRCE. The following pipelines/tools will be loaded in this module: - + PIPELINES: RENEE v2.5 https://ccbr.github.io/RENEE/ XAVIER v3.0 https://ccbr.github.io/XAVIER/ @@ -55,11 +55,11 @@ If the module was loaded correctly, the greetings message should be displayed. CHAMPAGNE v0.3 https://ccbr.github.io/CHAMPAGNE/ CRUISE v0.1 https://ccbr.github.io/CRUISE/ ASPEN v1.0 https://ccbr.github.io/ASPEN/ - + TOOLS: spacesavers2 v0.12 https://ccbr.github.io/spacesavers2/ permfix v0.6 https://github.com/ccbr/permfix/ - + ########################################################################### Thank you for using CCBR Pipeliner Comments/Questions/Requests: @@ -97,15 +97,12 @@ or use the **Browse** tab to choose the input and output directories ![renee_browsePath](images/gui_browse.png) - ![renee_enterPath](images/gui_path.png) - Next, from the drop down menu select the reference genome (hg38/mm10) ![renee_genome](images/gui_genome.png) - ### 3.3 Submit RENEE job After all the information is filled out, press **Submit**. @@ -122,7 +119,6 @@ Click **Yes** An email notification will be sent out when the pipeline starts and ends. - ## 4. Special instructions regarding X11 Window System RENEE GUI natively uses the X11 Window System to run RENEE pipeline and display the graphics on a personal desktop or laptop. The X11 Window System can be used to run a program on Biowulf and display the graphics on a desktop or laptop. 
However, X11 can be unreliable and fail with many graphics applications used on Biowulf. The HPC staff recommends NoMachine (NX) for users who need to run graphics applications. @@ -139,4 +135,4 @@ and start an interactive session (with `--tunnel` flag). Similar to the instructions above, load the `ccbrpipeliner` module and enter `renee gui` to launch the RENEE gui. -![gui_nx_renee](images/gui_nx_renee.png) \ No newline at end of file +![gui_nx_renee](images/gui_nx_renee.png) diff --git a/docs/RNA-seq/output.md b/docs/RNA-seq/output.md index 5e17894..a033824 100644 --- a/docs/RNA-seq/output.md +++ b/docs/RNA-seq/output.md @@ -3,7 +3,7 @@ After a successful `renee` run execution for multisample paired-end data, the fo ```bash renee_output/ ├── bams -├── config +├── config ├── config.json # Contains the configuration and parameters used for this specific RENEE run ├── DEG_ALL ├── dryrun.{datetime}.log # Output from the dry-run of the pipeline @@ -11,9 +11,9 @@ renee_output/ ├── FQscreen2 ├── fusions ├── kraken -├── logfiles -├── nciccbr -├── preseq +├── logfiles +├── nciccbr +├── preseq ├── QC ├── QualiMap ├── rawQC @@ -41,7 +41,7 @@ Contains the STAR aligned reads for each sample analyzed in the run. ```bash /bams/ ├── sample1.fwd.bw # forward strand bigwig files suitable for a genomic track viewer like IGV -├── sample1.rev.bw # reverse strand bigwig files +├── sample1.rev.bw # reverse strand bigwig files ├── sample1.p2.Aligned.toTranscriptome.out.bam # BAM alignments to transcriptome using STAR in two-pass mode ├── sample1.star_rg_added.sorted.dmark.bam # Read groups added and duplicates marked genomic BAM file (using STAR in two-pass mode) ├── sample1.star_rg_added.sorted.dmark.bam.bai @@ -54,7 +54,6 @@ Contains the STAR aligned reads for each sample analyzed in the run. Contains config files for the pipeline. - ### 3. 
`DEG_ALL` Contains the output from RSEM estimating gene and isoform expression levels for each sample and also combined data matrix with all samples. @@ -64,15 +63,15 @@ Contains the output from RSEM estimating gene and isoform expression levels for ├── combined_TIN.tsv # RSeQC logfiles containing transcript integrity number information for all samples ├── RSEM.genes.expected_count.all_samples.txt # Expected gene counts matrix for all samples (useful for downstream differential expression analysis) ├── RSEM.genes.expected_counts.all_samples.reformatted.tsv # Expected gene counts matrix for all samples with reformatted gene symbols (format: ENSEMBLID | GeneName) -├── RSEM.genes.FPKM.all_samples.txt # FPKM Normalized expected gene counts matrix for all samples +├── RSEM.genes.FPKM.all_samples.txt # FPKM Normalized expected gene counts matrix for all samples ├── RSEM.genes.TPM.all_samples.txt # TPM Normalized expected gene counts matrix for all samples ├── RSEM.isoforms.expected_count.all_samples.txt # File containing isoform level expression estimates for all samples. -├── RSEM.isoforms.FPKM.all_samples.txt # FPKM Normalized expected isoform counts matrix for all samples +├── RSEM.isoforms.FPKM.all_samples.txt # FPKM Normalized expected isoform counts matrix for all samples ├── RSEM.isoforms.TPM.all_samples.txt # TPM Normalized expected isoform counts matrix for all samples ├── sample1.RSEM.genes.results # Expected gene counts for sample 1 ├── sample1.RSEM.isoforms.results # Expected isoform counts for sample 1 ├── sample1.RSEM.stat # RSEM stats for sample 1 -│   ├── sample1.RSEM.cnt +│   ├── sample1.RSEM.cnt │   ├── sample1.RSEM.model │   └── sample1.RSEM.theta ├── sample1.RSEM.time # Run time log for sample 1 @@ -100,8 +99,8 @@ Contains gene fusions output for each sample. ```bash fusions/ ├── sample1_fusions.arriba.pdf -├── sample1_fusions.discarded.tsv # Contains all events that Arriba classified as an artifact or that are also observed in healthy tissue. 
-├── sample1_fusions.tsv # Contains fusions for sample 1 which pass all of Arriba's filters. The predictions are listed from highest to lowest confidence. +├── sample1_fusions.discarded.tsv # Contains all events that Arriba classified as an artifact or that are also observed in healthy tissue. +├── sample1_fusions.tsv # Contains fusions for sample 1 which pass all of Arriba's filters. The predictions are listed from highest to lowest confidence. ├── sample1.p2.arriba.Aligned.sortedByCoord.out.bam # Sorted BAM file for Arriba's Visualization ├── sample1.p2.arriba.Aligned.sortedByCoord.out.bam.bai ├── sample1.p2.Log.final.out # STAR final log file @@ -109,12 +108,12 @@ fusions/ ├── sample1.p2.Log.progress.out # log files ├── sample1.p2.Log.std.out # STAR runtime output log ├── sample1.p2.SJ.out.tab # Summarizes the high confidence splice junctions for sample 1 -├── sample1.p2._STARgenome # Extra files generated during STAR aligner +├── sample1.p2._STARgenome # Extra files generated during STAR aligner │   ├── exonGeTrInfo.tab │   ├── . │   ├── . -│   └── transcriptInfo.tab -├── sample1.p2._STARpass1 # Extra files generated during STAR first pass +│   └── transcriptInfo.tab +├── sample1.p2._STARpass1 # Extra files generated during STAR first pass │   ├── . │   └── . ... @@ -129,7 +128,7 @@ Contains per sample kraken output files which is a Quality-control step to asses ### 7. `logfiles` -Contains logfiles for the entire RENEE run, job error/output files for each individual job that was submitted to SLURM, and some other stats generated by different software. Important to diagnose errors if the pipeline fails. The per sample stats information is present in the mulitQC report. +Contains logfiles for the entire RENEE run, job error/output files for each individual job that was submitted to SLURM, and some other stats generated by different software. Important to diagnose errors if the pipeline fails. The per sample stats information is present in the mulitQC report. 
```bash /logfiles/ @@ -150,8 +149,8 @@ Contains logfiles for the entire RENEE run, job error/output files for each indi │   .. │   . ├── snakemake.log # The snakemake log file which documents the entire pipeline log -├── snakemake.log.jobby # Detailed summary report for each individual job. -└── snakemake.log.jobby.short # Short summary report for each individual job. +├── snakemake.log.jobby # Detailed summary report for each individual job. +└── snakemake.log.jobby.short # Short summary report for each individual job. ``` ### 8. `nciccbr` @@ -172,7 +171,7 @@ Contains per sample output for Quality-control step to assess various post-align ### 12. `Reports` -Contains the multiQC report which visually summarizes the quality control metrics and other statistics for each sample (`multiqc_report.html`). All the data tables used to generate the multiQC report is available in the `multiqc_data` folder. The `RNA_report.html` file is an interactive report the aggregates sample quality-control metrics across all samples. This interactive report to allow users to identify problematic samples prior to downstream analysis. It uses flowcell and lane information from the FastQ file. +Contains the multiQC report which visually summarizes the quality control metrics and other statistics for each sample (`multiqc_report.html`). All the data tables used to generate the multiQC report is available in the `multiqc_data` folder. The `RNA_report.html` file is an interactive report the aggregates sample quality-control metrics across all samples. This interactive report to allow users to identify problematic samples prior to downstream analysis. It uses flowcell and lane information from the FastQ file. ### 13. `resources` @@ -204,4 +203,4 @@ trim ### 17. `workflow` -Contains the RENEE pipeline workflow. \ No newline at end of file +Contains the RENEE pipeline workflow. 
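The `DEG_ALL` listing earlier in this output guide notes that `RSEM.genes.expected_counts.all_samples.reformatted.tsv` stores gene symbols in the combined format `ENSEMBLID | GeneName`. A stdlib-only sketch of splitting that column into separate fields for downstream tools; the tab separator, the bare `|` delimiter, and the sample values are assumptions for illustration, not output from a real run:

```python
import csv
import io

# Hypothetical two-sample excerpt of the reformatted RSEM counts matrix;
# the first column uses the "ENSEMBLID|GeneName" symbol format (delimiter assumed).
tsv = (
    "gene_id\tsample1\tsample2\n"
    "ENSG00000141510|TP53\t100.0\t87.0\n"
    "ENSG00000012048|BRCA1\t55.0\t61.0\n"
)

def split_gene_ids(handle):
    """Split the combined identifier column into ensembl_id and gene_name columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    out_header = ["ensembl_id", "gene_name"] + header[1:]
    out_rows = []
    for row in reader:
        # str.partition keeps everything after the first "|" as the gene name
        ensembl_id, _, gene_name = row[0].partition("|")
        out_rows.append([ensembl_id, gene_name] + row[1:])
    return out_header, out_rows

header, rows = split_gene_ids(io.StringIO(tsv))
print(header)   # ['ensembl_id', 'gene_name', 'sample1', 'sample2']
print(rows[0])  # ['ENSG00000141510', 'TP53', '100.0', '87.0']
```

With a real matrix, the same split can be applied to the file handle directly instead of the in-memory string.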
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 6dc2259..59f57a1 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -36,14 +36,13 @@ To check the status of each individual job submitted to the cluster, there are s Each job that RENEE submits to the cluster starts with the `pl:` prefix. - **Q: What if the pipeline is finished running but I received a "FAILED" status? How do I identify failed jobs?** **A.** In case there was some error during the run, the easiest way to diagnose the problem is to go to logfiles folder within the RENEE output folder and look at the `snakemake.log.jobby.short` file. It contains three columns: jobname, state, and std_err. The jobs that completed successfully would have "COMPLETED" state and jobs that failed would have the FAILED state. !!! tldr "Find Failed Jobs" - === "SLURM output files" - +=== "SLURM output files" + All the failed jobs would be listed with absolute paths to the error file (with extension `.err`). Go through the error files corresponding to the FAILED jobs (std_err) to explore why the job failed. ```bash @@ -53,7 +52,6 @@ To check the status of each individual job submitted to the cluster, there are s # List the files that failed grep "FAILED" snakemake.log.jobby.short | less ``` - Many failures are caused by filesystem or network issues on Biowulf, and in such cases, simply re-starting the Pipeline should resolve the issue. Snakemake will dynamically determine which steps have been completed, and which steps still need to be run. If you are still running into problems after re-running the pipeline, there may be another issue. If that is the case, please feel free to [contact us](https://github.com/CCBR/RENEE/issues). 
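The troubleshooting page above describes `snakemake.log.jobby.short` as a three-column file (jobname, state, std_err) and suggests grepping it for `FAILED`. The same triage can be scripted; a minimal sketch assuming tab-separated columns, with hypothetical job names and `.err` paths (only the `pl:` prefix comes from the docs):

```python
import io

# Hypothetical excerpt of logfiles/snakemake.log.jobby.short:
# jobname <TAB> state <TAB> std_err (tab separation is an assumption).
jobby_short = (
    "pl:star_align\tCOMPLETED\t/data/run/logfiles/slurmfiles/star_align.err\n"
    "pl:rsem\tFAILED\t/data/run/logfiles/slurmfiles/rsem.err\n"
    "pl:multiqc\tCOMPLETED\t/data/run/logfiles/slurmfiles/multiqc.err\n"
)

def failed_jobs(handle):
    """Return (jobname, std_err) for every job whose state column is FAILED."""
    failures = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[1] == "FAILED":
            failures.append((fields[0], fields[2]))
    return failures

print(failed_jobs(io.StringIO(jobby_short)))
# [('pl:rsem', '/data/run/logfiles/slurmfiles/rsem.err')]
```

On a real run, pass `open("logfiles/snakemake.log.jobby.short")` instead of the in-memory example, then inspect each reported `.err` file.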
diff --git a/src/renee/__main__.py b/src/renee/__main__.py index db3fab5..c87668f 100755 --- a/src/renee/__main__.py +++ b/src/renee/__main__.py @@ -315,9 +315,7 @@ def configure_build(sub_args, git_repo, output_path): # If a partition was provided, update the copied cluster.json default partition if hasattr(sub_args, "partition") and sub_args.partition: update_cluster_partition( - output_path, - sub_args.partition, - context="after build configuration" + output_path, sub_args.partition, context="after build configuration" ) _reset_write_permission(target=output_path) _configure( diff --git a/src/renee/initialize.py b/src/renee/initialize.py index c61344e..2f5d4b5 100644 --- a/src/renee/initialize.py +++ b/src/renee/initialize.py @@ -49,9 +49,7 @@ def initialize(sub_args, repo_path, output_path): # If a partition was provided, update the copied cluster.json default partition if hasattr(sub_args, "partition") and sub_args.partition: update_cluster_partition( - output_path, - sub_args.partition, - context="after initialization" + output_path, sub_args.partition, context="after initialization" ) # Create renamed symlinks to rawdata diff --git a/src/renee/util.py b/src/renee/util.py index aace863..a04d37b 100644 --- a/src/renee/util.py +++ b/src/renee/util.py @@ -42,10 +42,10 @@ def get_shared_resources_dir(shared_dir, hpc=get_hpcname()): def update_cluster_partition(output_path, partition, context=""): """Update the default partition in cluster.json. - + Reads cluster.json from the output directory, updates the __default__ partition, and writes it back with proper formatting. 
- + @param output_path : Path to the output directory containing config/cluster.json @param partition : @@ -58,12 +58,12 @@ def update_cluster_partition(output_path, partition, context=""): """ cluster_json = os.path.join(output_path, "config", "cluster.json") context_msg = f" {context}" if context else "" - + if not os.path.exists(cluster_json): raise FileNotFoundError( f"Expected cluster.json at '{cluster_json}'{context_msg}" ) - + with open(cluster_json, "r") as fh: try: cluster_cfg = json.load(fh) @@ -71,13 +71,13 @@ def update_cluster_partition(output_path, partition, context=""): raise RuntimeError( f"Malformed JSON in cluster.json at '{cluster_json}'" ) from e - + if "__default__" not in cluster_cfg: raise KeyError( f"cluster.json missing '__default__' section at '{cluster_json}'" ) - + cluster_cfg["__default__"]["partition"] = partition - + with open(cluster_json, "w") as fh: json.dump(cluster_cfg, fh, indent=4, sort_keys=True) diff --git a/tests/test_util.py b/tests/test_util.py index 709eaaf..32fe382 100644 --- a/tests/test_util.py +++ b/tests/test_util.py @@ -88,28 +88,22 @@ def test_update_cluster_partition_success(): config_dir = os.path.join(tmp_dir, "config") os.makedirs(config_dir) cluster_json_path = os.path.join(config_dir, "cluster.json") - + # Create a valid cluster.json cluster_data = { - "__default__": { - "partition": "norm", - "mem": "8g", - "threads": "1" - }, - "some_rule": { - "mem": "16g" - } + "__default__": {"partition": "norm", "mem": "8g", "threads": "1"}, + "some_rule": {"mem": "16g"}, } with open(cluster_json_path, "w") as fh: json.dump(cluster_data, fh, indent=4, sort_keys=True) - + # Update the partition update_cluster_partition(tmp_dir, "long") - + # Verify the update with open(cluster_json_path, "r") as fh: updated_data = json.load(fh) - + assert updated_data["__default__"]["partition"] == "long" assert updated_data["__default__"]["mem"] == "8g" # Other fields unchanged assert updated_data["some_rule"]["mem"] == "16g" # Other 
rules unchanged @@ -121,7 +115,7 @@ def test_update_cluster_partition_with_context(): # Don't create cluster.json - should raise FileNotFoundError with pytest.raises(FileNotFoundError) as exc_info: update_cluster_partition(tmp_dir, "long", context="after initialization") - + assert "after initialization" in str(exc_info.value) assert "cluster.json" in str(exc_info.value) @@ -131,10 +125,10 @@ def test_update_cluster_partition_file_not_found(): with tempfile.TemporaryDirectory() as tmp_dir: os.makedirs(os.path.join(tmp_dir, "config")) # Don't create cluster.json - + with pytest.raises(FileNotFoundError) as exc_info: update_cluster_partition(tmp_dir, "short") - + assert "cluster.json" in str(exc_info.value) @@ -144,14 +138,14 @@ def test_update_cluster_partition_malformed_json(): config_dir = os.path.join(tmp_dir, "config") os.makedirs(config_dir) cluster_json_path = os.path.join(config_dir, "cluster.json") - + # Write malformed JSON with open(cluster_json_path, "w") as fh: fh.write("{invalid json content") - + with pytest.raises(RuntimeError) as exc_info: update_cluster_partition(tmp_dir, "long") - + assert "Malformed JSON" in str(exc_info.value) assert "cluster.json" in str(exc_info.value) @@ -162,20 +156,15 @@ def test_update_cluster_partition_missing_default_section(): config_dir = os.path.join(tmp_dir, "config") os.makedirs(config_dir) cluster_json_path = os.path.join(config_dir, "cluster.json") - + # Create cluster.json without __default__ section - cluster_data = { - "some_rule": { - "partition": "norm", - "mem": "16g" - } - } + cluster_data = {"some_rule": {"partition": "norm", "mem": "16g"}} with open(cluster_json_path, "w") as fh: json.dump(cluster_data, fh, indent=4, sort_keys=True) - + with pytest.raises(KeyError) as exc_info: update_cluster_partition(tmp_dir, "long") - + assert "__default__" in str(exc_info.value) assert "cluster.json" in str(exc_info.value) @@ -186,27 +175,23 @@ def test_update_cluster_partition_preserves_formatting(): config_dir = 
os.path.join(tmp_dir, "config") os.makedirs(config_dir) cluster_json_path = os.path.join(config_dir, "cluster.json") - + # Create a cluster.json with multiple keys to verify sorting cluster_data = { - "__default__": { - "partition": "norm", - "mem": "8g", - "threads": "1" - }, + "__default__": {"partition": "norm", "mem": "8g", "threads": "1"}, "z_rule": {"mem": "16g"}, - "a_rule": {"mem": "32g"} + "a_rule": {"mem": "32g"}, } with open(cluster_json_path, "w") as fh: json.dump(cluster_data, fh, indent=4, sort_keys=True) - + # Update the partition update_cluster_partition(tmp_dir, "long") - + # Read the raw file content to check formatting with open(cluster_json_path, "r") as fh: content = fh.read() - + # Verify indentation (4 spaces) assert ' "__default__"' in content # Verify sorting: a_rule should come before z_rule diff --git a/workflow/scripts/rNA_flowcells.Rmd b/workflow/scripts/rNA_flowcells.Rmd index 387c65a..70bbf09 100755 --- a/workflow/scripts/rNA_flowcells.Rmd +++ b/workflow/scripts/rNA_flowcells.Rmd @@ -12,7 +12,7 @@ output: navbar: - { title: "Pipeline Documentation", href: "https://ccbr.github.io/pipeliner-docs/RNA-seq/Theory-and-practical-guide-for-RNA-seq/", align: right } source_code: "embed" -editor_options: +editor_options: chunk_output_type: console --- @@ -379,8 +379,8 @@ plot_pca_data <- function(dat, pca_max) { "
<br>% UTR: ", pct_utr_bases,
 "<br>% Intronic: ", pct_intronic_bases,
 "<br><br>Sequence Range: ", sequence_length,
-"<br>GC Content: ", gc_content, 
-"<br>Inner Distance Maxima: ", inner_distance_maxima, 
+"<br>GC Content: ", gc_content,
+"<br>Inner Distance Maxima: ", inner_distance_maxima,
 "<br>Insert Size: ", median_insert_size
 )
 ))
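The `update_cluster_partition` helper touched by the second patch follows a read-validate-update-write cycle on `config/cluster.json`. Condensed from the `src/renee/util.py` and `tests/test_util.py` hunks above into one self-contained sketch:

```python
import json
import os
import tempfile

def update_cluster_partition(output_path, partition, context=""):
    """Set the __default__ partition in config/cluster.json (pattern from src/renee/util.py)."""
    cluster_json = os.path.join(output_path, "config", "cluster.json")
    context_msg = f" {context}" if context else ""
    if not os.path.exists(cluster_json):
        raise FileNotFoundError(f"Expected cluster.json at '{cluster_json}'{context_msg}")
    with open(cluster_json, "r") as fh:
        try:
            cluster_cfg = json.load(fh)
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Malformed JSON in cluster.json at '{cluster_json}'") from e
    if "__default__" not in cluster_cfg:
        raise KeyError(f"cluster.json missing '__default__' section at '{cluster_json}'")
    cluster_cfg["__default__"]["partition"] = partition
    with open(cluster_json, "w") as fh:
        # Match the formatting used by the pipeline: 4-space indent, sorted keys
        json.dump(cluster_cfg, fh, indent=4, sort_keys=True)

# Exercise the happy path in a throwaway directory, mirroring tests/test_util.py
with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "config"))
    cluster_json_path = os.path.join(tmp_dir, "config", "cluster.json")
    with open(cluster_json_path, "w") as fh:
        json.dump({"__default__": {"partition": "norm", "mem": "8g"}}, fh)
    update_cluster_partition(tmp_dir, "long")
    with open(cluster_json_path) as fh:
        updated = json.load(fh)

print(updated["__default__"])  # {'mem': '8g', 'partition': 'long'}
```

Failure modes mirror the tests above: a missing file raises `FileNotFoundError` (with the optional `context` appended), malformed JSON raises `RuntimeError`, and a config without a `__default__` section raises `KeyError`.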