Bioinformatics integrates biology, statistics, and computer science to develop and apply theory, methods, and tools for the collection, storage, and analysis of biological and related data. Some key application areas in bioinformatics include:
- Genomic and molecular analysis
- Drug discovery and development
- Medical diagnosis and treatment
- Agricultural biotechnology
- Environmental monitoring
The National Cancer Institute (NCI) uses bioinformatics extensively in its research efforts to combat cancer, including research on the "origin, evolution, progression, and treatment of cancer".
This course teaches the basic skills needed for bioinformatics, including working on the Unix command line. It focuses primarily on RNA-Seq analysis and covers every step of the workflow, from raw data to differential expression and gene ontology analysis. Many of these skills are foundational to most bioinformatics analyses and apply to other types of next-generation sequencing experiments.
Here are a few compelling reasons to explore the world of bioinformatics:
- Analyze Your Data: Empower yourself to delve into your own biological data and gain valuable insights.
- Enhance Scientific Skills: Broaden your scientific knowledge by mastering bioinformatics tools and techniques. Understanding the principles behind data collection and analysis will help you design robust experiments and interpret their results effectively.
- Career Opportunities: Open doors to exciting career paths in the rapidly growing field of bioinformatics.
- Understand the Data Landscape: Gain a deeper appreciation for how others analyze biological data, fostering collaboration and critical thinking.
Module 1 lessons focus on developing command line skills, getting started with and working on Biowulf (the NIH HPC cluster), and downloading data from NCBI.
- Lesson 1 - What is Biowulf?
- Lesson 2 - Navigating file systems with Unix
- Lesson 3 - Useful Unix
- Lesson 4 - Working on Biowulf
- Lesson 5 - Downloading data from the SRA
Module 2 lessons focus on RNA-Seq analysis, including experimental design and best practices, quality control, trimming, alignment-based methods, feature counts, differential expression analysis, and biological interpretation.
- Lesson 6 - Introduction to RNA-Seq
- Lesson 7 - Introduction to NGS Data and Quality Control
- Lesson 8 - Cleaning and Preparing NGS Data for Downstream Analysis
- Lesson 9 - Aligning NGS Data to Genome
- Lesson 10 - Quantifying Gene Expression from Bulk RNA Sequencing Data
- Lesson 11 - Visualizing Genomic Data: Preparing Files
- Lesson 12 - Visualizing Genomic Data with the IGV
- Lesson 13 - Differential Expression Analysis for Bulk RNA Sequencing: QC
- Lesson 14 - Differential Expression Analysis for Bulk RNA Sequencing: The Actual Analysis
Module 3 lessons focus on gene ontology and pathway analysis.
- Lesson 15 - Introduction to gene ontology and pathway analysis
- Lesson 16 - Functional enrichment with DAVID
- Lesson 17 - Pathway Analysis with Reactome
Who can take this course? There are no prerequisites to take this course. This course is open to NCI-CCR researchers interested in learning bioinformatics skills, especially those relevant to analyzing bulk RNA sequencing data.
How will we work through lesson content? For the hands-on sessions, participants will use Biowulf student accounts. To sign up for a student account, click here. Student accounts are only available to course registrants.
Class Usage:
- Class documents are available at https://bioinformatics.ccr.cancer.gov/docs/bioinformatics-for-beginners-2025/
- Lesson content and practice questions can be found in the submodule folders.
Below are the links for the class data in case participants would like to practice outside of and after this course series. There is no need to download these for this course as the instructors have made them available on Biowulf.
You can find compressed Module 1 data here. Download the data and unzip.
`unzip module_1.zip`
All Module 2 data were obtained from the Griffith lab RNA sequencing tutorial and renamed for this course series.
- Reference Genome: Download Here
- Annotation: Download Here
- RNA-Seq Data: See instructions here for downloading the HBR-UHR and hcc1395 data.
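As a minimal sketch of the Module 1 unzip step mentioned above, the shell function below lists an archive's contents before extracting it into a separate directory. The function name `extract_archive` and the directory name `module_1_data` are illustrative assumptions and are not part of the course materials.

```shell
# Sketch only: extract_archive is a hypothetical helper, not a course script.
extract_archive() {
    archive="$1"   # path to the zip file, e.g. module_1.zip
    dest="$2"      # directory to extract into (assumed name)

    # Stop early with a non-zero status if the archive is missing.
    if [ ! -f "$archive" ]; then
        echo "Archive not found: $archive" >&2
        return 1
    fi

    unzip -l "$archive"              # list contents without extracting
    unzip -q "$archive" -d "$dest"   # extract quietly into $dest
}

# Example call, assuming module_1.zip is in the current directory:
# extract_archive module_1.zip module_1_data
```

Listing the contents first shows what will be written before anything is extracted, and extracting into a named directory keeps the files separate from the rest of the working directory.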
BIOF-102
├── Module 1 - Unix and Biowulf
│   ├── Lesson 1 - What is Biowulf
│   ├── Lesson 2 - Navigating file systems with Unix
│   ├── Lesson 3 - Useful Unix
│   ├── Lesson 4 - Working on Biowulf
│   └── Lesson 5 - Downloading data from the SRA
├── Module 2 - RNA-Seq Analysis
│   ├── Lesson 6 - Introduction to RNA-Seq
│   ├── Lesson 7 - Introduction to Next Generation Sequencing (NGS) Data and Quality Control
│   ├── Lesson 8 - Cleaning and Preparing Next Generation Sequencing (NGS) Data for Downstream Analysis
│   ├── Lesson 9 - Aligning Next Generation Sequencing (NGS) Data to Genome
│   ├── Lesson 10 - Quantifying Gene Expression from Bulk RNA Sequencing Data
│   ├── Lesson 11 - Visualizing Genomic Data - Preparing Files
│   ├── Lesson 12 - Visualizing Genomic Data with the Integrative Genomics Viewer
│   ├── Lesson 13 - Differential Expression Analysis for Bulk RNA Sequencing - QC
│   └── Lesson 14 - Differential Expression Analysis for Bulk RNA Sequencing - The Actual Analysis
└── Module 3 - Pathway Analysis
    ├── Lesson 15 - Introduction to gene ontology and pathway analysis
    ├── Lesson 16 - Functional enrichment with DAVID
    └── Lesson 17 - Pathway Analysis with Reactome
Flexycode
- GitHub: @flexycode
Thanks for visiting! ❤️
