
CollinJ0/grp2_paper


The Great Repertoire Project v2

This repository contains code and data used in our study of the temporal dynamics of the circulating human antibody repertoire. Briefly, we tracked the human antibody repertoire over time using high-throughput sequencing of antibody transcripts from peripheral blood. Our study uncovers a profound and previously underappreciated level of drift in the naïve B cell repertoire within individuals. Despite stable overall repertoire size and diversity, fine-level composition undergoes nearly complete turnover after four years. We observe a delicate interplay between the continuous replacement of naïve B cells and the imprint of immunological exposures, revealing a nuanced model of overall repertoire development. We also identify persistent public clonotypes, which suggest potentially convergent antibody responses. These findings deepen our understanding of immune system dynamics and offer important insights toward optimizing vaccine and immunotherapy strategies.

Code

The code used in this project is assembled into a series of Jupyter notebooks. There are two sets of notebooks: those containing code used for DATA PROCESSING and those containing code used to MAKE FIGURES. GitHub will render each of the notebooks, but the code cannot be executed from within GitHub. To run the code contained in the notebooks, you must clone the repository.

NOTE: Whenever possible, the intermediate datasets required to run the code are included in this repository. However, many intermediate datasets are too large to include; in such cases, links to the required datasets are provided in the appropriate notebook.

Datasets

We have generated several large datasets in two primary groups: antibody sequences from two healthy adult subjects collected in 2016, and antibody sequences from the same two subjects collected four years later.

Antibody sequencing data

Raw and processed datasets from each subject can be downloaded using the following links. Some of these datasets are quite large.

For each timepoint, there are a total of 18 samples: 3 technical replicates of each of 6 biological replicates. Biological replicates refer to different aliquots of peripheral blood mononuclear cells (PBMCs), from which total RNA was separately isolated and processed. Thus, sequences or clonotypes found in multiple biological replicates are assumed to have occurred independently in different cells. Technical replicates refer to independent library preparations using the same aliquot of PBMC-derived RNA. In each of the above datasets, samples 1-6 are biological replicates. Samples 7-12 and 13-18 are technical replicates of samples 1-6.
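The sample layout described above can be expressed as a small lookup. The sketch below assumes samples are numbered 1-18 within a timepoint and that samples 7 and 13 are technical replicates of sample 1, samples 8 and 14 of sample 2, and so on. The function names and the input format (a dict mapping sample number to clonotype IDs) are hypothetical and are not taken from the project's notebooks. It also illustrates why the distinction matters: each clonotype is counted once per biological replicate, so repeated detection across technical replicates of the same RNA aliquot does not count as an independent occurrence.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

N_BIOLOGICAL = 6  # biological replicates per timepoint
N_TECHNICAL = 3   # technical replicates per biological replicate


def replicate_of(sample: int) -> Tuple[int, int]:
    """Map a sample number (1-18) to (biological_replicate, technical_replicate).

    Assumption: samples 7 and 13 are technical replicates of sample 1,
    samples 8 and 14 of sample 2, and so on.
    """
    if not 1 <= sample <= N_BIOLOGICAL * N_TECHNICAL:
        raise ValueError(
            f"sample must be in 1..{N_BIOLOGICAL * N_TECHNICAL}, got {sample}"
        )
    biological = (sample - 1) % N_BIOLOGICAL + 1
    technical = (sample - 1) // N_BIOLOGICAL + 1
    return biological, technical


def biological_replicate_counts(
    clonotypes_by_sample: Dict[int, Iterable[str]]
) -> Dict[str, int]:
    """Count the distinct biological replicates each clonotype was seen in.

    Detections in several technical replicates of the same biological
    replicate collapse to a single count.
    """
    seen_in: Dict[str, Set[int]] = defaultdict(set)
    for sample, clonotypes in clonotypes_by_sample.items():
        biological, _ = replicate_of(sample)
        for clonotype in clonotypes:
            seen_in[clonotype].add(biological)
    return {clonotype: len(reps) for clonotype, reps in seen_in.items()}


if __name__ == "__main__":
    # Hypothetical clonotype IDs, for illustration only.
    observed = {
        1: ["cloneA", "cloneB"],
        7: ["cloneA"],   # technical replicate of sample 1
        13: ["cloneA"],  # technical replicate of sample 1
        2: ["cloneA"],   # a different biological replicate
        3: ["cloneC"],
    }
    print(biological_replicate_counts(observed))
    # {'cloneA': 2, 'cloneB': 1, 'cloneC': 1}
```

Here cloneA is detected in four samples but only two biological replicates (1 and 2), so under the assumption stated in the text it counts as two independent occurrences.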

Requirements

  • Python 3.3+ (although Python 2.7 may work for many or most notebooks, this has not been tested)
  • Jupyter Notebook

Each notebook may also require third-party Python packages. Any notebook-specific requirements, along with instructions for installing packages with pip, are provided in the notebook itself.
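Before opening a notebook, it can help to confirm that the environment meets these requirements. The sketch below checks the running interpreter against the Python 3.3+ minimum and reports which packages cannot be found. The package list is a hypothetical example only ("notebook" is the module name of Jupyter Notebook); substitute the requirements listed in the notebook you intend to run.

```python
import importlib.util
import sys

# Minimum interpreter version listed under Requirements.
MIN_PYTHON = (3, 3)

# Hypothetical example list: replace with the packages named in the
# notebook you intend to run.
EXAMPLE_PACKAGES = ["notebook", "numpy", "pandas"]


def python_ok(minimum=MIN_PYTHON):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_packages(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_packages(EXAMPLE_PACKAGES)
    if missing:
        print("Missing packages (install with pip):", " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the same interpreter (for example, the same Anaconda environment) that Jupyter uses, since packages installed into a different environment will not be found.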

If you're new to Python, a great way to get started is to install the Anaconda Python distribution, which includes pip as well as a ton of useful scientific Python packages.

