IEDB data for IMMREP'25

This repository contains the code used to produce the positive alpha-beta TCR-pMHC pairs visible on Kaggle.

Instructions

If you'd like to reproduce the analysis, install this repo:

git clone https://github.com/erichardson97/iedb_immrep25/
cd iedb_immrep25
pip install .
stitchrdl -s human
python -m iedb_immrep --export_file iedb_immrep/dat/export_2_21_25.parquet --output_dir iedb_immrep/dat/results

This reproduces the paired dataset. Like everyone else, we have many more unpaired beta chains than paired chains. If you'd like those instead, add "--chains beta" to the final command above.

Input and output

As input, we selected all TCRs in the IEDB from humans that show recognition of epitopes presented in the context of MHC Class I. We then subset for non-modified peptides and for assays with a fully-specified MHC Class I alpha allele (this is the parquet file passed as --export_file above, iedb_immrep/dat/export_2_21_25.parquet).

Our data has been manually curated from >300 papers, with our earliest TCR reference dating from 1994. Because the curated TCR and BCR fields are author-reported, they can contain values that no longer reflect current naming conventions and are difficult for bioinformaticians to work with. We plan to standardize these values so that you will be able to rely on the Calculated fields alone; in this release, we use the package tidytcells to standardize them. We also use Stitchr to produce full-length alpha and beta sequences from our data.

We provide two outputs:

A CSV file (immrep_IEDB.csv). This includes the fields TRAV, TRAJ, TRBV and TRBJ, plus each of the resulting CDRs and the TRA/TRB sequences output by Stitchr (with leader and constant regions removed). In addition, the field "receptor_ids" traces back to our unique receptor IDs, "references" maps back to our reference IRIs, and "just_10X" is "True" where a given TCR-pHLA pair has only been seen in 10X experiments. Excluding such pairs from the training of NetTCR2.2 was demonstrated to improve performance. The exception is iTRAP-corrected data.
An AIRR-formatted TSV file; the additional fields are "epitope" and "mhc".
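
If you want to apply the "just_10X" filter yourself, the following is a minimal sketch using only the Python standard library. It assumes the flag is serialized as the literal string "True", as described above; the helper names load_pairs and drop_10x_only are ours and are not part of this package. The sample rows are made-up placeholders standing in for immrep_IEDB.csv and are not real IEDB records.

```python
import csv
import io


def load_pairs(handle):
    """Read a CSV handle into a list of dicts, one per TCR-pMHC pair."""
    return list(csv.DictReader(handle))


def drop_10x_only(rows):
    """Drop rows flagged as seen only in 10X experiments.

    Assumption: the just_10X column holds the literal string "True"
    for flagged pairs; any other value (or a missing column) is kept.
    """
    return [row for row in rows if row.get("just_10X", "").strip() != "True"]


# Placeholder sample standing in for immrep_IEDB.csv (illustrative values only).
sample = io.StringIO(
    "TRAV,TRAJ,TRBV,TRBJ,receptor_ids,just_10X\n"
    "TRAV12-2,TRAJ24,TRBV6-5,TRBJ2-7,1,True\n"
    "TRAV21,TRAJ42,TRBV7-9,TRBJ2-1,2,False\n"
)

pairs = load_pairs(sample)
kept = drop_10x_only(pairs)
print(f"{len(kept)} of {len(pairs)} pairs kept")
```

To run this on the real output, replace the sample with an open file handle, for example open(path, newline=""), where path points at immrep_IEDB.csv in whichever directory you passed as --output_dir.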

Other datasets

VDJdb did a special release for the occasion! It is available at https://github.com/antigenomics/vdjdb-db/releases/tag/pyvdjdb-2025-02-21 and, if you'd like to apply processing similar to ours to it, you can run process_vdjdb.py.

Contact

If you'd like to learn more about our data and tools, please contact us at help@iedb.org.
