🧬 Designing Better Cancer Vaccines — UCL-CCC Hackathon 2025 🏆

This repository contains our project for the UCL Cancer Collaborative Centre Hackathon 2025, where our team designed an end-to-end computational pipeline to improve personalised cancer vaccine development. Our aim is to identify high-value neoantigen targets that the patient's immune system has not already tried, and failed, to respond to.

🏆 Result: Winner — UCL Cancer Collaborative Centre Hackathon 2025

🚀 Project Overview

Cancer vaccines provide a powerful therapeutic avenue, but high non-response rates remain a key challenge. The main bottleneck is the identification and prioritisation of effective neoantigen targets from an enormous search space (≈10¹⁷ peptides). Our solution integrates tumour genomics, patient HLA typing, and TCR repertoire data to generate a ranked list of optimal vaccine candidates.

Key Innovation

We exclude peptides already recognised by exhausted or ineffective T-cell responses, increasing the likelihood of inducing a strong and durable vaccine response.
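
As a rough illustration of the exclusion idea, the sketch below drops any candidate peptide that an exhausted clonotype is predicted to recognise. The `TcrHit` record, the `exhausted` flag, and the toy peptides are assumptions made for this example; they are not taken from the notebook, and how exhaustion is actually determined is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcrHit:
    """One predicted TCR-peptide interaction (hypothetical record layout)."""
    peptide: str
    clonotype_id: str
    exhausted: bool  # assumed flag, e.g. derived from an exhaustion phenotype


def exclude_exhausted_targets(candidates, hits):
    """Drop candidates that any exhausted clonotype is predicted to recognise."""
    already_failed = {hit.peptide for hit in hits if hit.exhausted}
    return [pep for pep in candidates if pep not in already_failed]


# Toy inputs, invented for illustration only.
candidates = ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"]
hits = [
    TcrHit("LMNPQRSTV", "clone_1", exhausted=True),
    TcrHit("ACDEFGHIK", "clone_2", exhausted=False),
]
kept = exclude_exhausted_targets(candidates, hits)
print(kept)
```

Peptides recognised only by non-exhausted clonotypes are kept in this sketch; whether they should be kept, boosted, or handled separately is a design choice the pipeline would need to make.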

🧠 High-Level Pipeline

1. Neoantigen Identification: Extract mutated peptides (9-mers) present in cancer cells but not in normal tissue.
2. MHC Binding Prediction: Predict HLA-specific binding affinities to determine surface-presentable peptides.
3. TCR Binding Prediction: Identify and remove peptides already targeted unsuccessfully by the patient's TCRs.
4. Ranking & Output: Score the remaining peptides using immunogenicity, presentation likelihood, clonality, conservation, and safety metrics.
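
The neoantigen identification step can be sketched as a sliding window over the mutated protein, keeping only 9-mers that do not occur in the matched normal sequence. This is a minimal sketch assuming a single protein pair given as plain strings; the function name `mutant_kmers` and the toy sequences are invented for illustration, and a real pipeline would start from somatic variant calls rather than hand-written sequences.

```python
def mutant_kmers(wild_type, mutant, k=9):
    """Return k-mers found in the mutant protein but absent from the wild type.

    Order of first appearance is preserved and duplicates are dropped.
    """
    normal = {wild_type[i:i + k] for i in range(len(wild_type) - k + 1)}
    seen = set()
    result = []
    for i in range(len(mutant) - k + 1):
        kmer = mutant[i:i + k]
        if kmer not in normal and kmer not in seen:
            seen.add(kmer)
            result.append(kmer)
    return result


# Toy 22-residue sequences differing by one substitution (I -> L at position 12).
wild_type = "MKTAYIAKQRQISFVKSHFSRQ"
mutant = "MKTAYIAKQRQLSFVKSHFSRQ"

peptides = mutant_kmers(wild_type, mutant)
for pep in peptides:
    print(pep)
```

A single substitution away from the sequence ends yields up to nine overlapping 9-mers, one for each position the mutated residue can occupy within the window.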

🧬 Data Requirements

Training Data

Cancer genomes with validated neoantigens
HLA–peptide binding datasets (IEDB)
TCR–peptide binding datasets

Patient-Specific Inputs

Tumour genome sequencing
Patient HLA genotype
TCR repertoire sequencing

🧩 Model Architecture

Our architecture incorporates:

Neoantigen identification module
Transformer-based MHC binding prediction
Structural TCR–peptide interaction modelling (AlphaFold-based)

📊 Ranking Metrics

MHC binding affinity
TCR engagement strength
Surface presentation probability
Peptide abundance
Conservation across cancer subclones
Phylogenetic clonality
Cross-reactivity and safety assessment
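
One simple way to combine the metrics above into a single ranking is a weighted sum. The sketch below assumes every metric has already been scaled to the range 0 to 1 with higher meaning better (so `safety` would be 1 for no predicted cross-reactivity). The metric keys and the weights are hypothetical: this README lists the metrics but does not say how they are combined.

```python
# Hypothetical weights (they sum to 1.0); not taken from the project.
WEIGHTS = {
    "mhc_binding": 0.25,
    "tcr_engagement": 0.20,
    "presentation": 0.15,
    "abundance": 0.10,
    "conservation": 0.10,
    "clonality": 0.10,
    "safety": 0.10,
}


def composite_score(metrics):
    """Weighted sum of normalised metrics; raises if any metric is missing."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def rank_candidates(scored):
    """Return peptide names ordered from highest to lowest composite score."""
    return sorted(scored, key=lambda pep: composite_score(scored[pep]), reverse=True)


# Toy candidates with invented metric values.
scored = {
    "peptide_a": {name: 0.5 for name in WEIGHTS},
    "peptide_b": {name: 0.9 for name in WEIGHTS},
}
ranking = rank_candidates(scored)
print(ranking)
```

A weighted sum is easy to inspect and tune, but it lets a high score on one metric offset a poor safety score; a hard safety cut-off applied before ranking may be a better fit.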

📁 Repository Structure

.
├── Human_AF3_inputs/                    # AlphaFold 3 JSON input files (6 TCR-pMHC jobs)
├── hack-1.ipynb                         # Main pipeline notebook
├── human_tcr_dataset.xlsx               # Raw IEDB export
├── updated_reduced_data.xlsx            # Filtered IEDB data
├── new_reduced_data_TCRonly.csv         # Cleaned TCR-only dataset
├── Processed_Human_TCR_MHC_Dataset.csv  # Enriched dataset with MHC sequences
├── requirements.txt
└── README.md

🗄️ Datasets

Data Sources

Dataset	Source	Format	Public?
TCR–peptide–MHC binding data	IEDB	Excel export	✅ Yes
MHC heavy chain sequences (HLA allotypes)	UniProt REST API	FASTA	✅ Yes
Beta-2-microglobulin sequence (P61769)	UniProt REST API	FASTA	✅ Yes
AlphaFold 3 input files	Generated by pipeline	JSON	—

Dataset Details

human_tcr_dataset.xlsx / updated_reduced_data.xlsx

Source: IEDB query export
Columns: peptide, hla, tcr_alpha, tcr_beta
~70 raw rows filtered to 6 complete entries (both TCR chains required)
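
The "both TCR chains required" filter amounts to dropping any row with a blank value in one of the four columns. The sketch below uses only the standard library so it runs anywhere; it is not the notebook's code, and the inline CSV rows are placeholders rather than real IEDB records. With pandas, `DataFrame.dropna(subset=[...])` would do a similar job for missing values.

```python
import csv
import io

REQUIRED = ("peptide", "hla", "tcr_alpha", "tcr_beta")


def complete_rows(rows):
    """Keep rows where every required column is present and non-blank."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() for col in REQUIRED)
    ]


# Placeholder data: only the first row has both TCR chains.
raw = io.StringIO(
    "peptide,hla,tcr_alpha,tcr_beta\n"
    "ACDEFGHIK,HLA-A*02:01,ALPHASEQ,BETASEQ\n"
    "LMNPQRSTV,HLA-B*27:05,,BETASEQ\n"
    "WYACDEFGH,HLA-B*27:05,ALPHASEQ,\n"
)
rows = list(csv.DictReader(raw))
kept_rows = complete_rows(rows)
print(len(rows), "raw rows ->", len(kept_rows), "complete rows")
```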

new_reduced_data_TCRonly.csv

Cleaned/filtered version of the IEDB export
6 rows with complete TCR alpha + beta sequences

Processed_Human_TCR_MHC_Dataset.csv

Enriched dataset combining IEDB data with UniProt-fetched sequences
Columns: peptide, hla, tcr_alpha, tcr_beta, mhc_heavy_chain, beta_2_microglobulin
6 complete TCR-pMHC entries ready for structure prediction

Human_AF3_inputs/

6 AlphaFold Server JSON files (af3_job_0.json – af3_job_5.json)
Each file encodes one TCR-pMHC complex with 5 protein chains: TCR-α, TCR-β, MHC heavy chain, β2-microglobulin, peptide (9-mer)
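
A five-chain job of this kind could be assembled as follows. This is a sketch, not the notebook's code: the field names (`name`, `modelSeeds`, `sequences`, `proteinChain`, `dialect`, `version`) follow the AlphaFold Server JSON job format as we understand it and should be checked against the files in this folder. The helper `build_af3_job` and the four short chain sequences are placeholders; only the peptide is taken from the dataset.

```python
import json


def build_af3_job(name, tcr_alpha, tcr_beta, mhc_heavy, b2m, peptide):
    """Build one AlphaFold Server style job with five single-copy protein chains."""
    chains = [tcr_alpha, tcr_beta, mhc_heavy, b2m, peptide]
    return {
        "name": name,
        "modelSeeds": [],  # empty list: let the server pick seeds (assumed behaviour)
        "sequences": [
            {"proteinChain": {"sequence": seq, "count": 1}} for seq in chains
        ],
        "dialect": "alphafoldserver",
        "version": 1,
    }


job = build_af3_job(
    name="af3_job_0",
    tcr_alpha="ACDE",     # placeholder, not a real TCR alpha chain
    tcr_beta="FGHI",      # placeholder, not a real TCR beta chain
    mhc_heavy="KLMN",     # placeholder, not a real HLA heavy chain
    b2m="PQRS",           # placeholder, not the real beta-2-microglobulin
    peptide="IMDQVPFSV",  # 9-mer from the current dataset
)
payload = json.dumps([job], indent=2)  # the file holds a list of jobs
print(payload)
```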

Peptides & HLA Allotypes in Current Dataset

Peptide	HLA Allotype
IMDQVPFSV	HLA-A*02:01
TRLALIAPK	HLA-B*27:05
LRVMMLAPF	HLA-B*27:05

HLA → UniProt ID Mapping (used for MHC sequence retrieval)

HLA Allotype	UniProt ID
HLA-A*02:01	P01892
HLA-A*02:05	P30512
HLA-B*27:05	P03989
HLA-B*27:09	P30480
HLA-B*08:01	P01889
HLA-E*01:03	P30511
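
Sequence retrieval from this mapping could look like the sketch below, which downloads a FASTA record from the UniProt REST endpoint and joins its sequence lines. The accessions are copied from the table above without further checking, so it is worth confirming that each one still resolves to the intended allotype. The function names are illustrative, and the network call sits in its own function so that the parsing can be tested offline.

```python
from urllib.request import urlopen

# Copied from the mapping table above.
HLA_TO_UNIPROT = {
    "HLA-A*02:01": "P01892",
    "HLA-A*02:05": "P30512",
    "HLA-B*27:05": "P03989",
    "HLA-B*27:09": "P30480",
    "HLA-B*08:01": "P01889",
    "HLA-E*01:03": "P30511",
}
B2M_ACCESSION = "P61769"  # beta-2-microglobulin


def fasta_url(accession):
    """URL of the FASTA record for one UniProtKB accession."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return the sequence of the first record in a FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header")
    sequence = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record
            break
        sequence.append(line)
    return "".join(sequence)


def fetch_sequence(accession, timeout=30):
    """Download and parse one sequence (requires network access)."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline check of the parser with a made-up record.
example = ">sp|X00000|EXAMPLE toy record\nMSRS\nVALA\n"
print(parse_fasta(example))
print(fasta_url(HLA_TO_UNIPROT["HLA-A*02:01"]))
```

UniProt canonical sequences are typically full-length precursors, so a structure-prediction workflow may want to trim the signal peptide and membrane-spanning regions before building the complex.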

🔮 Scaling This Further

To extend the pipeline beyond the current 6-entry proof of concept:

More TCR-pMHC data: Relax IEDB query filters (e.g. allow single-chain TCRs, broader HLA coverage)
Neoantigen prediction: Tools like pVACseq, NetMHCpan, or MHCflurry applied to somatic mutation data
Tumour mutation data: TCGA (GDC portal) or ICGC — somatic mutation calls in MAF/VCF format
HLA population frequencies: Allele Frequency Net Database to prioritise broadly immunogenic alleles
Structural validation: PDB templates of TCR-pMHC complexes to benchmark AF3 outputs
Immunogenicity scoring: NetTCR, ERGO, or ImRex for TCR-antigen binding prediction

🧑‍🔬 Team TC-AWARE

Matthew Cowley
Zhen Wei Yap
Mohammad Alawwami
Zarif Shafiei
Linh Hoang
Julia Sala-Bayo
Graham Bonomo-Jackson
Gleb Gmyzov
Nick Keatley
