35 commits
6351c4d
most minimal changes that still work in dry run
leawlb Oct 4, 2022
18d078d
customized for one object_ID as identifier
leawlb Oct 5, 2022
ded0ee9
WIP small addition for customization later
leawlb Oct 5, 2022
028af44
WIP trying to run rule cellranger_count
leawlb Oct 5, 2022
17e1d0d
path changes to try successful run
leawlb Oct 6, 2022
31cc8e7
minor change for successful run with test dataset
leawlb Oct 6, 2022
1598700
added R script for constructing SCE objects
leawlb Oct 6, 2022
cf7af69
added R script for SCE object construction
leawlb Oct 6, 2022
2fa037f
WIP trying to change 'individual' to identifier wildcards
leawlb Oct 6, 2022
224a296
WIP going back to individual
leawlb Oct 7, 2022
135c3c1
addition of metadata to SCE objects during construction
leawlb Oct 7, 2022
e969133
minor fixes
leawlb Oct 7, 2022
26b7d6f
Conflicts:
leawlb Oct 10, 2022
9276f95
restored modified versions after merge
leawlb Oct 10, 2022
a289532
minor fixes
leawlb Oct 10, 2022
0b12e34
minor adjustments
leawlb Oct 10, 2022
9a869c7
SCE construction now functional with metadata_full
leawlb Oct 11, 2022
cf8fb83
concatenate multiple identifiers
leawlb Oct 11, 2022
f415304
minor changes for functionality
leawlb Oct 12, 2022
1489930
changed subfolder structure, added second wildcard again
leawlb Oct 21, 2022
fd312f0
add automatic resolution of required conda env for SCE construction
leawlb Nov 9, 2022
3bdeeec
removed manual addition of identifiers to samples.py
leawlb Nov 9, 2022
32a5d55
various corrections
leawlb Nov 27, 2022
7931239
altered individual column generation and renamed config
leawlb Nov 28, 2022
91401fa
Update README.md
leawlb Nov 28, 2022
28fc106
added complete env files and small improvements
leawlb Jan 26, 2023
1764867
Merge branch 'add_SCE' of https://github.com/odomlab2/snakemake-cellr…
leawlb Jan 31, 2023
36d3654
Merge branch 'add_SCE' of https://github.com/odomlab2/snakemake-cellr…
leawlb Jan 31, 2023
ed54ac1
extract method to rename BIRTH columns
Feb 17, 2023
5d9d69f
improved readability
leawlb Mar 27, 2023
7e7a3fe
Merge branch 'add_SCE' of https://github.com/odomlab2/snakemake-cellr…
leawlb Mar 27, 2023
47c9b94
readability improvements
leawlb Mar 27, 2023
c5033c6
Update workflow/scripts/samples.py
leawlb Mar 27, 2023
b2d5b8c
adjusted select_columns function plus minor corrections
leawlb Mar 28, 2023
cdcda74
minor corrections
leawlb Mar 30, 2023
21 changes: 19 additions & 2 deletions README.md
@@ -49,12 +49,29 @@ Minimal changes needed are:

## How to run

Use snakemake_cellranger.yaml to create an environment with all required packages.

```bash
micromamba create -f snakemake_cellranger.yaml
```

or

```bash
conda env create -f snakemake_cellranger.yaml
```

Set channel priority to strict.

```bash
conda config --set channel_priority strict
```

You may call the pipeline as follows in the directory where you cloned it.

```bash
snakemake --cluster "bsub -n16 -q verylong -R rusage[mem=200GB]" -p -j4 -c42 --configfile config/config-cluster.yaml --use-conda --use-envmodules
snakemake --cluster "bsub -n16 -q verylong -R rusage[mem=200GB]" -p -j4 -c42 --configfile config/config-cluster.yaml --use-conda --use-envmodules --conda-frontend conda
```

- `--cluster` may change depending on the computational footprint of your analyses
- `--configfile` should point to your personal configuration

9 changes: 4 additions & 5 deletions config/config-example.yaml
@@ -22,16 +22,15 @@ metadata:
- ""
# Define the column names used to define a sample
identifiers:
- SAMPLE_NAME
- PID
- Sample_Type
- Age
- fraction
# Define all columns from the metadata spreadsheet that
# will be included in the SingleCellExperiment / Seurat
# objects
single_cell_object_metadata_fields:
- SAMPLE_NAME
- Age
- PID
- Sample_Type
- individual
# Enable / Disable rules and specify rule-specific parameters
rules:
cellranger_count:
58 changes: 58 additions & 0 deletions config/config-interspecies-bonemarrow.yaml
@@ -0,0 +1,58 @@
paths:
# directory to store output data = cellranger files and SingleCellExperiment objects
# paths to input data = fastq files are stored in metadata sheet (OTP export)
output_dir: "/omics/odcf/analysis/OE0538_projects/DO-0008/data"
target_templates:
# paths for each type of output file
# remove "/{0[Species_ID]}" to store all objects in one directory or
# replace "Species_ID" with other colname as suitable wildcard from metadata sheet (e.g. Sample_Type, ...) to create subfolders
# remove or replace all instances of Species_ID in Snakefile and samples.py
linked_files: "cellranger/linked_files/{0[Species_ID]}/{0[individual]}"
cellranger_count: "cellranger/cellranger_count/{0[Species_ID]}/{0[individual]}/outs"
construct_sce_objects: "sce_objects/01_cellranger_output/{0[Species_ID]}/sce_{0[individual]}-01"
# define specific target files
target_files:
samples_sheet: "metadata.csv"
references:
all_masked: "/omics/groups/OE0538/internal/shared_data/CellRangerReferences/GRCm38_masked_allStrains/"
metadata:
table:
# should have same format as or be based on OTP export sheet
- "/omics/odcf/analysis/OE0538_projects/DO-0008/metadata/OE0538_DO-0008_metadata_combined.csv"
identifiers:
# define one or multiple column name(s) that uniquely define a sample/object
# multiple cols are concatenated into one sample-specific name
# in the format of {Species_ID}_{Age_ID}_{Fraction_ID}_{Sample_NR}
- Species_ID # columns I manually added to metadata sheet
- Age_ID
- Fraction_ID
- Sample_NR
#- PID # columns included in OTP metadata sheet
#- Sample_Type
single_cell_object_metadata_fields:
# Define all columns from the metadata spreadsheet that
# will be included in the SingleCellExperiment / Seurat objects
- Object_ID # columns I manually added to metadata sheet
- Mouse_ID
- Species_ID
- Age
- Age_ID
- Age_weeks
- Fraction
- Fraction_ID
- Antibody_combination
- Sample_NR
- Extrarun
- Batch_exp_day
- Batch_sequencing
- Date_collected
- Keep_sample
- individual
- PID
# Enable / Disable rules and specify rule-specific parameters
rules:
cellranger_count:
extra: "" # set additional arguments for cellranger count
allele_specific: False
wasp_filter_reads: False
realign_bam: False
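
The comments in this config describe two mechanisms: identifier columns are concatenated into one sample-specific name, and the `target_templates` entries use Python format-string fields such as `{0[Species_ID]}`. The sketch below illustrates both with plain Python. It is not the code from the Snakefile or `samples.py` (neither is shown in this diff); the metadata values are made up, and the underscore separator is an assumption taken from the `{Species_ID}_{Age_ID}_{Fraction_ID}_{Sample_NR}` comment.

```python
# Hypothetical metadata row: column names come from the config above,
# the values are invented for illustration only.
row = {
    "Species_ID": "mmus",
    "Age_ID": "old",
    "Fraction_ID": "hsc",
    "Sample_NR": "2",
}

# Identifier columns as listed under `metadata: identifiers:`.
identifiers = ["Species_ID", "Age_ID", "Fraction_ID", "Sample_NR"]

# Assumption: identifiers are joined with underscores into `individual`.
row["individual"] = "_".join(str(row[col]) for col in identifiers)

# Templates copied from `target_templates` in the config.
templates = {
    "linked_files": "cellranger/linked_files/{0[Species_ID]}/{0[individual]}",
    "cellranger_count": "cellranger/cellranger_count/{0[Species_ID]}/{0[individual]}/outs",
    "construct_sce_objects": "sce_objects/01_cellranger_output/{0[Species_ID]}/sce_{0[individual]}-01",
}

# `{0[key]}` looks up `key` in the first positional argument of str.format.
paths = {name: template.format(row) for name, template in templates.items()}

for name, path in paths.items():
    print(f"{name}: {path}")
```

Because the templates index into the row by column name, replacing `Species_ID` with another metadata column (as the config comment suggests) only requires that the column exists in every row.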
181 changes: 181 additions & 0 deletions snakemake_cellranger.yaml
@@ -0,0 +1,181 @@
# this env is required, please install and activate for running snakemake
name: snakemake-cellranger
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- libsqlite=3.39.4=h753d276_0
- toposort=1.7=pyhd8ed1ab_0
- importlib-metadata=5.0.0=pyha770c72_1
- jinja2=3.1.2=pyhd8ed1ab_1
- exceptiongroup=1.0.1=pyhd8ed1ab_0
- veracitools=0.1.3=py_0
- google-cloud-core=2.3.2=pyhd8ed1ab_0
- coincbc=2.10.8=0_metapackage
- oauth2client=4.1.3=py_0
- ubiquerg=0.6.2=pyhd8ed1ab_0
- google-auth=2.14.0=pyh1a96a4e_0
- python-dateutil=2.8.2=pyhd8ed1ab_0
- liblapacke=3.9.0=16_linux64_openblas
- micromamba=1.0.0=1
- coin-or-clp=1.17.7=hc56784d_2
- aioeasywebdav=2.4.0=pyha770c72_0
- libgomp=12.2.0=h65d4601_19
- ply=3.11=py_1
- importlib_resources=5.10.0=pyhd8ed1ab_0
- libgfortran5=12.2.0=h337968e_19
- datrie=0.8.2=py310h5764c6d_6
- pip=22.3.1=pyhd8ed1ab_0
- packaging=21.3=pyhd8ed1ab_0
- dataclasses=0.8=pyhc8e2a94_3
- pycparser=2.21=pyhd8ed1ab_0
- configargparse=1.5.3=pyhd8ed1ab_0
- urllib3=1.26.11=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- yarl=1.8.1=py310h5764c6d_0
- psutil=5.9.4=py310h5764c6d_0
- plac=1.3.5=pyhd8ed1ab_0
- certifi=2022.9.24=pyhd8ed1ab_0
- markupsafe=2.1.1=py310h5764c6d_2
- toolz=0.12.0=pyhd8ed1ab_0
- cachetools=5.2.0=pyhd8ed1ab_0
- google-crc32c=1.1.2=py310he8fe98e_4
- amply=0.1.5=pyhd8ed1ab_0
- reretry=0.11.1=pyhd8ed1ab_0
- c-ares=1.18.1=h7f98852_0
- libsodium=1.0.18=h36c2ea0_1
- peppy=0.35.2=pyhd8ed1ab_0
- snakemake=7.18.1=hdfd78af_0
- pytz=2022.6=pyhd8ed1ab_0
- pytest=7.2.0=pyhd8ed1ab_2
- libcblas=3.9.0=16_linux64_openblas
- httplib2=0.21.0=pyhd8ed1ab_0
- yte=1.5.1=py310hff52083_1
- pyrsistent=0.19.2=py310h5764c6d_0
- libgrpc=1.49.1=h30feacc_1
- pulp=2.7.0=py310hff52083_0
- attrs=22.1.0=pyh71513ae_1
- pandas=1.5.1=py310h769672d_1
- multidict=6.0.2=py310h5764c6d_2
- connection_pool=0.0.3=pyhd3deb0d_0
- _libgcc_mutex=0.1=conda_forge
- google-api-core=2.10.2=pyhd8ed1ab_0
- smmap=3.0.5=pyh44b312d_0
- pygments=2.13.0=pyhd8ed1ab_0
- wheel=0.38.3=pyhd8ed1ab_0
- docutils=0.19=py310hff52083_1
- ftputil=5.0.4=pyhd8ed1ab_0
- conda=22.9.0=py310hff52083_2
- gitdb=4.0.9=pyhd8ed1ab_0
- aiohttp=3.8.3=py310h5764c6d_1
- libgfortran-ng=12.2.0=h69a702a_19
- stopit=1.1.2=py_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- liblapack=3.9.0=16_linux64_openblas
- backports=1.0=py_2
- numpy=1.23.4=py310h53a5b5f_1
- iniconfig=1.1.1=pyh9f0ad1d_0
- snakemake-minimal=7.18.1=pyhdfd78af_0
- coin-or-cbc=2.10.8=h3786ebc_0
- tzdata=2022f=h191b570_0
- readline=8.1.2=h0f457ee_0
- frozenlist=1.3.3=py310h5764c6d_0
- filelock=3.8.0=pyhd8ed1ab_0
- ruamel_yaml=0.15.80=py310h5764c6d_1008
- pycosat=0.6.4=py310h5764c6d_1
- logmuse=0.2.6=pyh8c360ce_0
- boto3=1.26.5=pyhd8ed1ab_0
- pyparsing=3.0.9=pyhd8ed1ab_0
- backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
- pyu2f=0.1.5=pyhd8ed1ab_0
- protobuf=4.21.9=py310hd8f1fbe_0
- conda-package-handling=1.9.0=py310h5764c6d_1
- pysftp=0.2.9=py_1
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0
- python-fastjsonschema=2.16.2=pyhd8ed1ab_0
- ld_impl_linux-64=2.39=hc81fddc_0
- ncurses=6.3=h27087fc_1
- googleapis-common-protos=1.56.4=py310hff52083_1
- google-cloud-storage=2.6.0=pyh1a96a4e_0
- slacker=0.14.0=py_0
- libffi=3.4.2=h7f98852_5
- _openmp_mutex=4.5=2_gnu
- libzlib=1.2.13=h166bdaf_4
- coin-or-utils=2.11.6=h202d8b1_2
- libabseil=20220623.0=cxx17_h48a1fff_5
- prettytable=3.4.1=pyhd8ed1ab_0
- cffi=1.15.1=py310h255011f_2
- typing-extensions=4.4.0=hd8ed1ab_0
- pyyaml=6.0=py310h5764c6d_5
- dropbox=11.35.0=pyhd8ed1ab_0
- python_abi=3.10=2_cp310
- nbformat=5.7.0=pyhd8ed1ab_0
- libnsl=2.0.0=h7f98852_0
- stone=3.3.1=pyhd8ed1ab_0
- libblas=3.9.0=16_linux64_openblas
- cryptography=38.0.3=py310h600f1e7_0
- google-auth-httplib2=0.1.0=pyhd8ed1ab_1
- tk=8.6.12=h27826a3_0
- libopenblas=0.3.21=pthreads_h78a6416_3
- pyasn1-modules=0.2.7=py_0
- jsonschema=4.17.0=pyhd8ed1ab_0
- coin-or-cgl=0.60.6=h6f57e76_2
- dpath=2.0.6=py310hff52083_2
- google-api-python-client=2.65.0=pyhd8ed1ab_0
- setuptools-scm=7.0.5=pyhd8ed1ab_1
- rsa=4.9=pyhd8ed1ab_0
- pyasn1=0.4.8=py_0
- wcwidth=0.2.5=pyh9f0ad1d_2
- tqdm=4.64.1=pyhd8ed1ab_0
- traitlets=5.5.0=pyhd8ed1ab_0
- wrapt=1.14.1=py310h5764c6d_1
- zipp=3.10.0=pyhd8ed1ab_0
- botocore=1.29.5=pyhd8ed1ab_0
- idna=3.4=pyhd8ed1ab_0
- google-resumable-media=2.4.0=pyhd8ed1ab_0
- bcrypt=3.2.2=py310h5764c6d_1
- attmap=0.13.2=pyhd8ed1ab_0
- requests=2.28.1=pyhd8ed1ab_1
- xz=5.2.6=h166bdaf_0
- grpcio=1.49.1=py310hc32fa93_1
- libprotobuf=3.21.9=h6239696_0
- gitpython=3.1.29=pyhd8ed1ab_0
- s3transfer=0.6.0=pyhd8ed1ab_0
- yaml=0.2.5=h7f98852_2
- ratelimiter=1.2.0=pyhd8ed1ab_1003
- bzip2=1.0.8=h7f98852_4
- ca-certificates=2022.9.24=ha878542_0
- uritemplate=4.1.1=pyhd8ed1ab_0
- future=0.18.2=pyhd8ed1ab_6
- jupyter_core=4.11.2=py310hff52083_0
- pluggy=1.0.0=pyhd8ed1ab_5
- brotlipy=0.7.0=py310h5764c6d_1005
- jmespath=1.0.1=pyhd8ed1ab_0
- setuptools=65.5.1=pyhd8ed1ab_0
- libcrc32c=1.1.2=h9c3ff4c_0
- libstdcxx-ng=12.2.0=h46fd767_19
- commonmark=0.9.1=py_0
- zlib=1.2.13=h166bdaf_4
- tabulate=0.9.0=pyhd8ed1ab_1
- re2=2022.06.01=h27087fc_0
- appdirs=1.4.4=pyh9f0ad1d_0
- aiosignal=1.3.1=pyhd8ed1ab_0
- paramiko=2.12.0=pyhd8ed1ab_0
- filechunkio=1.8=py_2
- charset-normalizer=2.1.1=pyhd8ed1ab_0
- python-irodsclient=1.1.5=pyhd8ed1ab_0
- rich=12.6.0=pyhd8ed1ab_0
- async-timeout=4.0.2=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- coin-or-osi=0.108.7=h2720bb7_2
- pyopenssl=22.1.0=pyhd8ed1ab_0
- libuuid=2.32.1=h7f98852_1000
- pysocks=1.7.1=pyha2e5f31_6
- openssl=3.0.7=h166bdaf_0
- typing_extensions=4.4.0=pyha770c72_0
- python=3.10.6=ha86cf86_0_cpython
- pynacl=1.5.0=py310h5764c6d_2
- smart_open=6.2.0=pyha770c72_0
- tomli=2.0.1=pyhd8ed1ab_0
- libgcc-ng=12.2.0=h65d4601_19
13 changes: 13 additions & 0 deletions utils/process_otp_metadata.py
@@ -0,0 +1,13 @@
import pandas as pd


def rename_date_of_birth(row: pd.Series):
    """Unify the 'BIRTH' and 'DATE_OF_BIRTH' columns into
    a single 'DATE_OF_BIRTH' column.

    Return the one non-missing value of the two columns, or pd.NA
    if both are missing or both are filled.
    """
    values = [row["BIRTH"], row["DATE_OF_BIRTH"]]
    dates = [val for val in values if not pd.isna(val)]
    # Exactly one non-missing value is unambiguous; anything else is not.
    if len(dates) == 1:
        return dates[0]
    return pd.NA
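
A possible way to use this helper is to apply it row-wise and then keep only the unified column. The sketch below assumes a metadata table that has both a `BIRTH` and a `DATE_OF_BIRTH` column; the example values are invented, and how the pipeline itself calls the function is not shown in this diff.

```python
import pandas as pd


def rename_date_of_birth(row: pd.Series):
    """Same logic as utils/process_otp_metadata.py, repeated here
    so the example is self-contained."""
    values = [row["BIRTH"], row["DATE_OF_BIRTH"]]
    dates = [val for val in values if not pd.isna(val)]
    if len(dates) == 1:
        return dates[0]
    return pd.NA


# Hypothetical metadata: each row covers one of the four possible cases.
metadata = pd.DataFrame(
    {
        "BIRTH": ["2022-01-10", None, None, "2022-03-01"],
        "DATE_OF_BIRTH": [None, "2022-02-20", None, "2022-03-02"],
    }
)

# axis=1 passes each row to the function as a pd.Series.
unified = metadata.apply(rename_date_of_birth, axis=1)

# Replace the two source columns with the single unified column.
metadata = metadata.drop(columns=["BIRTH"]).assign(DATE_OF_BIRTH=unified)
print(metadata)
```

Note that a row with conflicting entries in both columns ends up as missing rather than picking one of the two values, so such rows may need a manual check.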