Snakemake workflow for comparative genomic variant analysis of HEK293 cell lines

NBorthLab/HEK293_genomes

Genomic Variant Comparison Workflow for HEK293 Cell Lines

This repository contains a semi-automated Snakemake workflow for comparing genomic variants in HEK293 cell lines, starting from paired-end whole-genome sequencing FASTQ files.

Abstract

Human embryonic kidney cells (HEK293) are widely used in biopharmaceutical manufacturing, particularly for recombinant adeno-associated virus (rAAV) production. Despite their industrial relevance, a comprehensive understanding of their genomic stability remains limited. In this study, we systematically analyzed the genetic landscape of various HEK293 cell lines to evaluate their responses to different cultivation conditions and assess potential implications for rAAV production. To this end, adherent HEK293 cells were adapted to suspension growth using various serum-free media formulations. Following successful adaptation, whole-genome sequencing was performed on both adapted and parental cell lines. The sequenced reads were then aligned to the human reference genome, enabling the assessment of genome stability through evaluation of the identified structural variants. Comparative analysis, including additional publicly available HEK293 sequences, revealed a conserved genetic core across all lines, regardless of cultivation history or phenotypic divergence. The distribution of structural variants and single nucleotide polymorphisms (SNPs) indicated a gradual accumulation of mutations over time in culture rather than abrupt shifts in response to environmental changes. Notably, the adenoviral genes integrated into the HEK293 genome remained highly conserved with respect to both copy number and integration site. These findings provide insight into the genomic evolution of HEK293 cells and offer a foundation for further multi-omics studies aimed at optimizing rAAV production performance.

Workflow Overview

  1. Quality Control

    • FastQC: Assess read quality.
    • GATK and BEDTools: Evaluate alignment quality.
    • MultiQC: Summarize quality metrics.
  2. Preprocessing

    • Trimmomatic: Remove adapter sequences using TruSeq3-PE-2 templates.
  3. Alignment

    • BWA-MEM: Align reads against customized reference genomes (hg38 and human adenovirus 5).
    • SAMtools: Sort raw alignments.
    • GATK MarkDuplicates: Deduplicate alignments.
  4. Variant Calling

    • GATK HaplotypeCaller: Identify small genomic variants.
    • Manta: Detect structural rearrangements.
    • SURVIVOR: Filter structural variants (≥ 300 bp).
  5. Annotation and Analysis

    • SnpEff: Functionally annotate variants using the hg38 reference.
    • CNVkit: Evaluate and visualize copy number alterations.
  6. Comparative Analysis

    • R: Custom scripts for comparative analysis of small and structural variants.
    • vcfR, VariantAnnotation, Biostrings: Evaluate variants.
    • UpSetR, ggplot2: Visualize results.
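
As an illustration of the size filter in the variant calling step, the following is a minimal Python sketch of what keeping structural variants of at least 300 bp amounts to. It is not SURVIVOR and not part of this workflow. The function names `sv_length` and `filter_vcf_lines` are hypothetical, and the sketch assumes that each record carries its length in the standard VCF `SVLEN` or `END` INFO fields. Records without either field, such as breakends, are kept unchanged, which is an assumption rather than documented behaviour of the workflow.

```python
from typing import Iterable, Iterator, Optional

MIN_SV_LENGTH = 300  # threshold stated in the workflow overview (SVs >= 300 bp)


def sv_length(info: str, pos: int) -> Optional[int]:
    """Return the absolute SV length from a VCF INFO string, or None if unknown."""
    # Keep only key=value items; flag-style entries such as IMPRECISE are skipped.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    if "SVLEN" in fields:
        # SVLEN is negative for deletions and may hold a comma-separated list.
        return abs(int(fields["SVLEN"].split(",")[0]))
    if "END" in fields:
        return abs(int(fields["END"]) - pos)
    return None


def filter_vcf_lines(lines: Iterable[str], min_length: int = MIN_SV_LENGTH) -> Iterator[str]:
    """Yield header lines and records whose SV length is at least min_length."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header lines through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        length = sv_length(cols[7], int(cols[1]))  # column 8 is INFO, column 2 is POS
        # Assumption: records with no usable length (e.g. breakends) are kept.
        if length is None or length >= min_length:
            yield line
```

Passing the lines of an uncompressed structural variant VCF to `filter_vcf_lines` and writing the result to a new file would give a size-filtered copy under these assumptions.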

Usage

  1. Clone the repository

  2. Ensure all dependencies are installed. The required packages are provided in the workflow directory.

  3. Adapt the Snakemake workflow if needed before execution:

    • Provide resources/sampleList.csv.
    • Adjust optional settings directly in workflow/Snakefile.
    • data: directory for the raw-read .fastq files.
    • UpSetR: the visualization scripts support 2 to 13 samples; adapt the rule input accordingly.
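
Before launching the workflow, it can help to confirm that the sample list fits the 2 to 13 sample range supported by the UpSetR visualization scripts. The following Python sketch shows one way to do that. It is not part of the repository. The layout of resources/sampleList.csv is not documented here, so the header column name `sample` is an assumption, and `read_sample_names` and `check_upset_sample_count` are hypothetical helper names.

```python
import csv

MIN_SAMPLES, MAX_SAMPLES = 2, 13  # range supported by the UpSetR scripts


def read_sample_names(handle, column="sample"):
    """Return unique, non-empty sample names in file order.

    The column name "sample" is an assumption about sampleList.csv.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in sample list")
    names = []
    for row in reader:
        name = (row[column] or "").strip()
        if name and name not in names:
            names.append(name)
    return names


def check_upset_sample_count(names):
    """Return the sample count, or raise ValueError if it is outside 2..13."""
    count = len(names)
    if not MIN_SAMPLES <= count <= MAX_SAMPLES:
        raise ValueError(
            f"{count} samples found; the UpSetR scripts support "
            f"{MIN_SAMPLES} to {MAX_SAMPLES}"
        )
    return count


# Example usage (path taken from the usage notes above):
# with open("resources/sampleList.csv", newline="") as handle:
#     check_upset_sample_count(read_sample_names(handle))
```

If the count is in range, the UpSetR rule input still has to be adapted by hand to match the samples, as noted above.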
