Numenos.ai Home Task

Project of our dataset

Molecular portraits of tumor mutational and micro-environmental sculpting by immune checkpoint blockade therapy

Paper

Tumor and Microenvironment Evolution during Immunotherapy with Nivolumab

The paper investigates the genomic and microenvironmental changes in melanoma tumors during anti-PD-1 immunotherapy with nivolumab. It aims to understand how immune checkpoint blockade influences tumor mutation load, clonal evolution, and immune cell dynamics. By analyzing genomic, transcriptomic, and T-cell receptor data, the study provides insights into the mechanisms of response and resistance to immunotherapy.

Data

Downloadable via SRA tools

Contains

68 pairs of RNA-Seq samples (before and after treatment with nivolumab) for PBMCs (50M reads post QC)
Cytolytic score (GSE91061_BMS038109Sample_Cytolytic_Score_20161026.txt) based on RNA expression levels

Preprocessing

Tools

Orchestration done with snakemake and job-specific dockers

ETL Flow

Download SRA from NCBI
Break SRA apart into .fastq
Run quality assessment (FastQC)
Optional: Run fastp for trimming (this step is commented out because the files are already post QC and of high quality; see SRR5088813 for an example)
Run TRUST4 over the 2 fastq. Use hg38 and TRUST4's references
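
The flow above can be sketched as a list of command lines per SRA run. This is only an illustration under stated assumptions, not the contents of ETL/download_and_preprocess.smk: the function name `etl_commands`, the directory layout, and the reference file names (`hg38_bcrtcr.fa` and `human_IMGT+C.fa`, as shipped in the TRUST4 repository) are assumptions, and the optional fastp step is left out.

```python
from pathlib import Path


def etl_commands(run_id: str, workdir: str, trust4_dir: str) -> list[list[str]]:
    """Return the commands for one SRA run, in execution order.

    Hypothetical helper: paths and naming are assumptions for illustration.
    """
    out = Path(workdir)
    ref = Path(trust4_dir)
    r1 = out / f"{run_id}_1.fastq"
    r2 = out / f"{run_id}_2.fastq"
    return [
        # 1. Download the SRA archive from NCBI into <workdir>/<run_id>/.
        ["prefetch", run_id, "-O", str(out)],
        # 2. Break the archive apart into paired FASTQ files.
        ["fasterq-dump", str(out / run_id), "--split-files", "-O", str(out)],
        # 3. Quality assessment of both mates.
        ["fastqc", str(r1), str(r2), "-o", str(out)],
        # 4. V(D)J assembly with TRUST4 and its hg38 references (assumed file names).
        ["run-trust4", "-1", str(r1), "-2", str(r2),
         "-f", str(ref / "hg38_bcrtcr.fa"),
         "--ref", str(ref / "human_IMGT+C.fa"),
         "-o", str(out / run_id)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in etl_commands("SRR5088813", "data", "TRUST4"):
        print(" ".join(cmd))
```

Returning argument lists rather than executing them keeps the sketch safe to run anywhere; each list could be passed to `subprocess.run` inside the matching container.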

Usage

snakemake -s ETL/download_and_preprocess.smk

Requirements

Docker installed
Run all the build scripts in the dockers directory

Analysis

Sample Analysis

Comparing each individual's repertoire against their own counterpart sample (Pre / On Nivolumab) and against their peers, using the CDR3aa sequences.

Outcomes

The repertoire of an individual is fairly consistent even when taking Nivolumab (at least, it is more similar to itself than to other individuals' samples)
The exception is SRR5088829, probably due to a significantly low number of V(D)J aligned reads (and maybe SRR5088830? unsure)
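
As a minimal sketch of this comparison, the block below computes pairwise Bray-Curtis distances (the metric named under Challenges and Solutions) between CDR3aa frequency vectors and rescales them for plotting. The sample labels, the toy frequencies, and the min-max rescaling are assumptions for illustration; the notebooks may normalize differently.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def repertoire_distances(freqs: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Bray-Curtis distances between samples.

    `freqs` has one row per sample and one column per CDR3aa sequence;
    a clonotype absent from a sample has frequency 0.
    """
    dist = squareform(pdist(freqs.to_numpy(dtype=float), metric="braycurtis"))
    return pd.DataFrame(dist, index=freqs.index, columns=freqs.index)


def rescale_off_diagonal(dist: pd.DataFrame) -> pd.DataFrame:
    """Min-max rescale the off-diagonal distances to [0, 1] (assumed normalization)."""
    values = dist.to_numpy(dtype=float).copy()
    off_diag = ~np.eye(len(values), dtype=bool)
    lo, hi = values[off_diag].min(), values[off_diag].max()
    if hi > lo:  # leave the matrix untouched if all distances are equal
        values[off_diag] = (values[off_diag] - lo) / (hi - lo)
    return pd.DataFrame(values, index=dist.index, columns=dist.columns)


# Made-up frequencies: two samples of individual A and one of individual B.
toy = pd.DataFrame(
    [[0.5, 0.5, 0.0, 0.0],
     [0.4, 0.6, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]],
    index=["A_pre", "A_on", "B_pre"],
    columns=["cdr3_a", "cdr3_b", "cdr3_c", "cdr3_d"],
)
dist = repertoire_distances(toy)
scaled = rescale_off_diagonal(dist)
```

In the toy data the two samples of individual A share their clonotypes, so they end up much closer to each other than either is to individual B, which mirrors the outcome described above.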

Clonotype Diversity

After aggregating the different variants of the same gene for the V/J parts:
Showing interactions between V/J genes
Showing the top visible genes for each of the families

Outcomes

Some interactions are common between most samples
Huge fluctuation in percentages between samples for the same genes
The V / J pair IGKV1-39 / IGKJ3 might have some significance for differentiating between Pre-medication and On-medication
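
The aggregation of gene variants can be sketched as follows. It assumes a TRUST4-style report table with `V`, `J` and `frequency` columns and allele suffixes such as `*01`; the function name and the toy rows are made up for illustration and are not taken from the repository's code.

```python
import pandas as pd


def vj_pair_frequencies(report: pd.DataFrame) -> pd.Series:
    """Sum clonotype frequencies per V/J gene pair, ignoring allele variants."""
    genes = report[["V", "J", "frequency"]].copy()
    for col in ("V", "J"):
        # "IGKV1-39*01" -> "IGKV1-39": drop everything from the allele marker on.
        genes[col] = genes[col].str.replace(r"\*.*$", "", regex=True)
    return (
        genes.groupby(["V", "J"])["frequency"]
        .sum()
        .sort_values(ascending=False)
    )


# Made-up rows: two alleles of the same V gene collapse into one pair.
toy_report = pd.DataFrame(
    {
        "V": ["IGKV1-39*01", "IGKV1-39*02", "IGKV3-20*01"],
        "J": ["IGKJ3*01", "IGKJ3*01", "IGKJ1*01"],
        "frequency": [0.2, 0.1, 0.7],
    }
)
pairs = vj_pair_frequencies(toy_report)
```

The resulting Series is indexed by (V, J) and can be unstacked into a matrix for a heatmap of V/J interactions.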

Requirements

numpy, pandas, scipy, matplotlib, seaborn

Architecture

analysis - Jupyter notebooks for the analysis
code - everything from plotting to data transformation and statistics
data - not committed to GitHub; where we store all the pre- and post-processed data
dockers - dockers used in the ETL
documents - papers, technical documentations etc.
ETL - snakemake code

Challenges and Solutions

Technical

Getting the Data
TCGA is kept tightly under wraps, so it was surprising to find a study with so much freely downloadable information. It took quite a while to identify one with good phenotypes accompanied by PBMC RNA-Seq data.
Solution: I used the GEO access tool suggested in the exercise instead of TCGA.
Working with Snakemake on macOS
It took longer than expected due to some unsupported flags. I had worked with Snakemake a couple of years ago on Linux, so some adjustments were necessary.
Solution: Following guidance from ChatGPT, I reconfigured a couple of settings to accommodate the differences between Linux and macOS for Snakemake and Docker.

Scientific

New Field
I had experience with the protein side of immunoassays but not with antibodies themselves. The jargon was new, and the data required some formatting (e.g., aggregating all variants of the genes).
Solution: I read the TRUST paper, mapped the output according to the README and experimented with the data before composing the analysis.
Sample Similarity
Spearman and Pearson correlations did not provide meaningful results due to a zero-inflated distribution when comparing two samples. Jaccard removed much of the information by being binary.
Solution: I ended up using Bray-Curtis distance. Since the range was very small (with similar samples around ~0.97 BC distance and the maximum at 1.0), I normalized the distances to enhance visibility in the graph.
Hard to show significance

The gene frequencies are zero-inflated (the mapping is sparse).
When not missing altogether, they are log-normal in nature (probably due to some cascading effect).
The numbers are very small.
This rules out a lot of tests (e.g. even though chi-square works for small numbers, it needs a normal distribution).
Solution: Use the Wilcoxon test. It doesn't assume a normal distribution and can operate with a small number of samples. That said, the results were lackluster with 7 samples (Pre and On), and had we corrected for multiple testing, we would have been left with nothing.
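
A minimal sketch of the paired test, assuming one frequency per individual for a single V/J pair, measured Pre and On treatment. The numbers are made up, and the Bonferroni correction with a hypothetical 40 tested pairs is only there to illustrate how quickly a result from 7 pairs disappears under multiple-testing correction.

```python
from scipy.stats import wilcoxon


def paired_wilcoxon(pre, on, n_tests=1):
    """Wilcoxon signed-rank test on paired samples.

    Returns the statistic, the raw p-value, and a Bonferroni-adjusted
    p-value for `n_tests` comparisons (n_tests is a hypothetical count).
    """
    result = wilcoxon(pre, on)
    adjusted = min(1.0, result.pvalue * n_tests)
    return result.statistic, result.pvalue, adjusted


# Made-up frequencies of one V/J pair in 7 individuals.
pre = [0.012, 0.020, 0.008, 0.015, 0.030, 0.011, 0.025]
on = [0.018, 0.031, 0.010, 0.029, 0.033, 0.012, 0.045]

stat, p_raw, p_adj = paired_wilcoxon(pre, on, n_tests=40)
print(f"statistic={stat}, p={p_raw:.4f}, Bonferroni-adjusted p={p_adj:.4f}")
```

Even when every individual moves in the same direction, as in this toy data, 7 pairs leave little room: the raw p-value falls below 0.05 but does not survive the correction.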

Reliability of the output discussion

Run FastQC to validate that the reads are of a sufficient quality.
Use the sum of the frequencies to easily detect if all types (families) are accounted for
Check the number of assembled sequences (in the final.out)
Optional: Since it's RNA-Seq of PBMCs - we can check B-cells / T-cells ratio as it should be somewhat correlative.
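
The frequency-sum check can be sketched like this. It assumes a TRUST4-style report table with `V` and `frequency` columns, that the chain can be read from the first three letters of the V gene name, and that frequencies are normalized within each chain (worth confirming against the TRUST4 README). The function name and toy rows are made up.

```python
import pandas as pd

# Chain families expected in a PBMC sample (assumed list).
EXPECTED_CHAINS = ("IGH", "IGK", "IGL", "TRA", "TRB", "TRG", "TRD")


def chain_frequency_sums(report: pd.DataFrame) -> pd.Series:
    """Sum clonotype frequencies per chain; absent chains get 0."""
    chain = report["V"].str[:3]  # "IGHV3-23*01" -> "IGH"
    sums = report.groupby(chain)["frequency"].sum()
    return sums.reindex(EXPECTED_CHAINS, fill_value=0.0)


# Made-up report in which only heavy-chain and TRB clonotypes were assembled.
toy_report = pd.DataFrame(
    {
        "V": ["IGHV3-23*01", "IGHV1-69*01", "TRBV20-1*01"],
        "frequency": [0.6, 0.4, 1.0],
    }
)
sums = chain_frequency_sums(toy_report)
missing = [chain for chain, total in sums.items() if total == 0.0]
print(sums)
print("Chains with no clonotypes:", missing)
```

A chain whose sum is 0, or far from the expected total, is a quick signal that a sample has too few V(D)J aligned reads to be trusted.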

TalShor/Numenos_ex
