
erayfirat/PSMViewer


PSM Viewer: Peptide-Spectrum Match Visualization Tool


A comprehensive web-based application for visualizing and analyzing Peptide-Spectrum Matches (PSMs) from mass spectrometry proteomics experiments. Built with Streamlit, this interactive tool enables researchers to explore the quality and characteristics of peptide identifications through intuitive spectrum visualization.

🌟 Key Features

📊 Interactive PSM Visualization

  • Upload & Parse: Support for standard mass spectrometry data formats (MGF spectra and mzTab identifications)
  • Intelligent Mapping: Automatic matching of peptide identifications to their corresponding mass spectra using flexible reference matching
  • Quality Assessment: Visual inspection of PSM quality through annotated spectrum plots
  • Real-time Interaction: Web-based interface for immediate data exploration and analysis

🔬 Scientific Visualization

  • Fragment Ion Annotation: Automatic labeling of b-ion and y-ion fragments with theoretical masses
  • Spectrum Preprocessing: Intelligent peak filtering, intensity normalization, and precursor peak removal
  • Publication-Ready Plots: High-quality matplotlib-based visualizations suitable for research reports
  • Mass Range Selection: Plots use a fixed m/z window (100-1400) covering the typical peptide fragment region

🖥️ User-Friendly Interface

  • Drag-and-Drop Upload: Simple file upload for MGF and mzTab files
  • Interactive Tables: Sortable, filterable PSM lists with key identification metrics
  • Spectrum Selection: Click-to-view individual spectra with peptide sequence annotations
  • Browser-Based: Once launched locally, the app runs in any modern web browser

🚀 Installation

Prerequisites

  • Python 3.9 or higher (required by numpy ≥1.25.0 and matplotlib ≥3.8.0)
  • pip package manager
  • Web browser (Chrome, Firefox, Safari, or Edge)

Quick Setup

  1. Clone the repository:

    git clone https://github.com/erayfirat/PSMViewer1.git
    cd PSMViewer1
  2. Create a virtual environment:

    # On macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
    
    # On Windows
    python -m venv venv
    venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

🔧 Required Dependencies

| Package | Version | Purpose |
|---|---|---|
| streamlit | ≥1.28.0 | Web application framework |
| pyteomics | ≥4.6.5 | Mass spectrometry file parsing |
| spectrum-utils | ≥0.4.4 | Advanced spectrum processing & annotation |
| matplotlib | ≥3.8.0 | Publication-quality plotting |
| pandas | ≥2.0.0 | Data manipulation & analysis |
| numpy | ≥1.25.0 | Numerical computations |
| pytest | ≥7.4.0 | Unit testing framework |

🏗️ Project Structure

The PSM Viewer is organized into modular components for maintainability and extensibility:

Core Modules

  • data_loading.py: Handles loading and parsing of spectral data files (MGF) and identification results (mzTab) with comprehensive error handling
  • processing.py: Contains functions for extracting spectrum indices and mapping PSMs to spectra using optimized vectorized operations
  • visualization.py: Generates annotated spectrum plots with fragment ion annotations using hardcoded spectrum processing parameters
  • app.py: Main entry point that orchestrates the Streamlit web application

Configuration

The application uses the following hardcoded defaults for spectrum visualization:

  • Charge state: 2+ (precursor ion charge)
  • Fragment tolerance: 10 ppm for ion matching
  • Mass range: 100-1400 m/z for peptide fragments
  • Peak filtering: Retain top 50 most intense peaks, minimum 5% relative intensity
  • Ion annotation: b, y, and a ions with root-scale intensity normalization
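As a rough illustration of what these defaults do to a spectrum, the sketch below applies them in plain Python. It is not the project's implementation (the dependency list suggests spectrum-utils does this work in visualization.py); the function name `preprocess_peaks`, the step order, reusing the 10 ppm tolerance for precursor-peak removal, and reading "root-scale" as a square root followed by normalization to a maximum of 1 are all assumptions.

```python
import math


def preprocess_peaks(mz, intensity, precursor_mz, min_mz=100.0, max_mz=1400.0,
                     tol_ppm=10.0, max_peaks=50, min_rel_intensity=0.05):
    """Hypothetical sketch of the default preprocessing; returns (mz, intensity) sorted by m/z."""
    # 1. Keep only peaks inside the fragment m/z window.
    peaks = [(m, i) for m, i in zip(mz, intensity) if min_mz <= m <= max_mz]

    # 2. Remove peaks within tol_ppm of the precursor m/z (reusing 10 ppm is assumed).
    tol = precursor_mz * tol_ppm / 1e6
    peaks = [(m, i) for m, i in peaks if abs(m - precursor_mz) > tol]
    if not peaks:
        return [], []

    # 3. Drop peaks below 5% of the base peak, then keep the 50 most intense.
    base = max(i for _, i in peaks)
    if base <= 0:
        return [], []
    peaks = [(m, i) for m, i in peaks if i >= min_rel_intensity * base]
    peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks]

    # 4. Square-root ("root") scaling, normalized so the base peak becomes 1.0.
    top = math.sqrt(base)
    peaks.sort(key=lambda p: p[0])
    return [m for m, _ in peaks], [math.sqrt(i) / top for _, i in peaks]
```

For example, `preprocess_peaks([50.0, 147.11, 500.256, 600.3], [10.0, 400.0, 1000.0, 100.0], precursor_mz=500.256)` drops the out-of-range peak at 50.0 and the precursor peak, leaving two peaks whose root-scaled intensities are 1.0 and 0.5.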

📖 Usage Guide

Getting Started

  1. Launch the application:

    streamlit run app.py
  2. Open your browser:

    • Navigate to http://localhost:8501
    • The PSM Viewer interface will load automatically

Workflow Overview

graph TD
    A[Upload MGF File] --> B[Upload mzTab File]
    B --> C[Automatic PSM-Spectrum Mapping]
    C --> D[View PSM Mapping Table]
    D --> E[Select Individual PSM]
    E --> F[Visualize Annotated Spectrum]
    F --> G[Assess PSM Quality]

Step-by-Step Usage

1. File Upload

  • MGF File: Contains experimental tandem (MS/MS) spectra, i.e. fragmentation patterns such as those produced by collision-induced dissociation
  • mzTab File: Contains peptide identifications from database search engines (Mascot, MaxQuant, Comet, etc.) or de novo sequencing tools such as Casanovo

2. Data Processing

  • Automatic Parsing: Both files are parsed and loaded into structured data formats
  • Smart Matching: PSMs are mapped to spectra using:
    • Direct spectrum title matching
    • Numeric index extraction from spectrum references
    • Support for various reference formats (index=123, scan=456, spectrum:789)
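A minimal sketch of the title-first, index-second strategy, assuming that any mzTab-style `ms_run[N]:` prefix is stripped first and that the trailing number of a reference is a 0-based position in the MGF file. `map_psms_to_spectra` here is a hypothetical stand-in, not the function in processing.py. A scan number is generally not the same as a file position, so treating `scan=456` this way only works when scan numbers happen to equal positions in the file.

```python
import re

RUN_PREFIX = re.compile(r"^ms_run\[\d+\]:")   # mzTab-style "ms_run[1]:" prefix
TRAILING_INT = re.compile(r"(\d+)$")          # number at the end of the reference


def map_psms_to_spectra(spectra_refs, mgf_titles):
    """Hypothetical sketch: return the matched MGF title (or None) for each spectra_ref."""
    titles = set(mgf_titles)
    matched = []
    for ref in spectra_refs:
        ref = RUN_PREFIX.sub("", ref.strip())
        if ref in titles:                      # 1. direct title match takes precedence
            matched.append(ref)
            continue
        number = TRAILING_INT.search(ref)      # 2. fall back to a numeric position
        if number and int(number.group(1)) < len(mgf_titles):
            matched.append(mgf_titles[int(number.group(1))])
        else:
            matched.append(None)               # 3. no match
    return matched
```

Checking titles first means a spectrum literally titled `1` is matched by name even though position 1 also exists, which mirrors the "title precedence over index" behavior listed under the test areas.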

3. Results Visualization

  • Summary Statistics: Overview of loaded spectra, PSMs, and successful matches
  • Interactive Table: Browse PSMs with filtering and sorting capabilities
  • Spectrum Plots: Annotated mass spectra showing:
    • Experimental peaks (raw data)
    • Theoretical fragment ions (b-ions: N-terminal fragments, y-ions: C-terminal fragments)
    • Precursor ion m/z (the intact peptide ion selected for fragmentation)
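The theoretical b- and y-ion m/z values behind these annotations come from summing residue masses: a b ion carries the N-terminal residues plus a proton, a y ion the C-terminal residues plus water and a proton. The sketch below (a hypothetical `fragment_mz` helper, unmodified sequences only, standard monoisotopic masses) shows the arithmetic; the app itself presumably relies on spectrum-utils for this.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # H2O monoisotopic mass (Da)


def fragment_mz(peptide, charge=1):
    """Return b1..b(n-1) and y1..y(n-1) m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_neutral = sum(masses[:i])             # first i residues (N-terminal)
        y_neutral = sum(masses[-i:]) + WATER    # last i residues plus water (C-terminal)
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return {"b": b_ions, "y": y_ions}
```

For `PEPTIDE`, this gives b2 ≈ 227.103 and y1 ≈ 148.060 at charge 1.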

Understanding the Output

PSM Mapping Table

Column Description
psm_index Original PSM row number
sequence Identified peptide amino acid sequence
matched_title Associated spectrum identifier
spectra_ref Original spectrum reference from mzTab

Spectrum Visualization

  • X-axis: Mass-to-charge ratio (m/z)
  • Y-axis: Relative intensity (normalized)
  • Red annotations: Detected b-ion fragments
  • Blue annotations: Detected y-ion fragments
  • Mass tolerance: 10 ppm for fragment matching
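Matching at 10 ppm means a theoretical ion at m/z x counts as detected if an observed peak lies within x · 10 / 10⁶ of it (about 0.0015 at m/z 148). Here is a sketch with a hypothetical `match_peaks` helper that keeps the closest peak inside the window; defining the tolerance relative to the theoretical m/z is an assumption about how the app does it.

```python
from bisect import bisect_left


def match_peaks(observed_mz, theoretical, tol_ppm=10.0):
    """Map each theoretical ion label to the index of the closest observed peak within tol_ppm.

    observed_mz must be sorted ascending; labels with no peak in range are omitted.
    """
    hits = {}
    for label, target in theoretical.items():
        tol = target * tol_ppm / 1e6          # ppm window relative to the theoretical m/z
        pos = bisect_left(observed_mz, target - tol)
        best = None
        while pos < len(observed_mz) and observed_mz[pos] <= target + tol:
            if best is None or abs(observed_mz[pos] - target) < abs(observed_mz[best] - target):
                best = pos
            pos += 1
        if best is not None:
            hits[label] = best
    return hits
```

Binary search keeps each lookup logarithmic in the number of peaks, which matters little for 50 peaks but keeps the sketch honest for larger spectra.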

📋 Supported Data Formats

🔬 MGF (Mascot Generic Format)

BEGIN IONS
TITLE=Sample_001.123.123.2
PEPMASS=500.256 12345.6
CHARGE=2+
123.456 789.012
234.567 456.789
...
END IONS

Each block between BEGIN IONS and END IONS holds:

  • Peak list: one m/z value and intensity per line
  • Precursor information: PEPMASS (precursor m/z, optionally followed by its intensity) and CHARGE
  • Spectrum metadata (titles, scan numbers, etc.)
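The project lists pyteomics for file parsing. To show what that parsing extracts from the example above without extra dependencies, here is a hypothetical minimal reader (`parse_mgf`, `precursor_info`). It handles only the fields shown, ignores global parameters outside `BEGIN IONS` blocks, and keeps just the numeric part of `CHARGE`.

```python
def parse_mgf(text):
    """Minimal MGF reader: one dict per BEGIN IONS / END IONS block."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue                                  # blank or comment line
        if line == "BEGIN IONS":
            current = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue                                  # global parameters are ignored here
        elif "=" in line:
            key, value = line.split("=", 1)           # e.g. TITLE=..., PEPMASS=...
            current["params"][key.strip().upper()] = value.strip()
        else:
            parts = line.split()                      # "m/z intensity" peak line
            current["mz"].append(float(parts[0]))
            current["intensity"].append(float(parts[1]) if len(parts) > 1 else 0.0)
    return spectra


def precursor_info(spectrum):
    """Return (precursor m/z, charge); the first PEPMASS value is the m/z."""
    params = spectrum["params"]
    pepmass = params.get("PEPMASS", "").split()
    mz = float(pepmass[0]) if pepmass else None
    charge = params.get("CHARGE", "").rstrip("+-")    # "2+" -> "2"
    return mz, (int(charge) if charge.isdigit() else None)
```

Applied to the example block (without the `...` line), this yields one spectrum with two peaks and `precursor_info` returning `(500.256, 2)`.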

📄 mzTab Format

Tab-separated proteomics results format supporting:

  • PSM section: Peptide-spectrum matches with:
    • Peptide sequences
    • Spectrum references
    • Search engine scores
    • Protein assignments
    • Modification information
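mzTab is line-oriented: the first field of every line (`MTD`, `PSH`, `PSM`, `PRT` and so on) says what the line holds, and the `PSH` line names the columns of the `PSM` rows that follow. Below is a hypothetical stdlib-only reader (`read_mztab_psms`) for just the PSM section, including a check for the `PSM_ID` column that the test suite mentions. pyteomics also ships an mzTab reader, which a real implementation would more likely use.

```python
def read_mztab_psms(text):
    """Return the PSM rows of an mzTab document as dicts keyed by PSH column names."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "PSH":                        # column header of the PSM section
            header = fields[1:]
            if "PSM_ID" not in header:
                raise ValueError("PSH header lacks the required PSM_ID column")
        elif fields[0] == "PSM":                      # one peptide-spectrum match
            if header is None:
                raise ValueError("PSM row found before the PSH header")
            rows.append(dict(zip(header, fields[1:])))
    return rows
```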

🔍 Spectrum Reference Formats

The application handles various spectrum referencing conventions:

  • index=42 (numeric index)
  • scan=123 (scan number)
  • spectrum:789 (colon-separated)
  • 42 (plain numeric)
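One way to recognise the reference styles listed above is a small table of regular expressions tried in order. The hypothetical `parse_spectrum_ref` below returns which style matched together with the number, so the caller can decide whether that number is a file position or a scan number; the patterns are assumptions, not the ones in processing.py.

```python
import re

# Hypothetical patterns, tried in order; the first match wins.
REF_PATTERNS = {
    "index": re.compile(r"index=(\d+)"),
    "scan": re.compile(r"scan=(\d+)"),
    "spectrum": re.compile(r"spectrum:(\d+)"),
    "plain": re.compile(r"^(\d+)$"),
}


def parse_spectrum_ref(ref):
    """Return (style, number) for a spectra_ref string, or None if no pattern matches."""
    ref = ref.strip()
    for style, pattern in REF_PATTERNS.items():
        match = pattern.search(ref)
        if match:
            return style, int(match.group(1))
    return None
```

Because `search` is used rather than a full match, prefixed references such as `ms_run[1]:index=42` are also recognised.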

🧪 Testing

The PSM Viewer includes comprehensive unit and integration tests to ensure reliability and correctness of the data processing pipeline.

Test Structure

tests/
├── __init__.py                 # Test module initialization
├── conftest.py                 # Shared pytest fixtures and configuration
├── test_load_mgf.py           # Unit tests for MGF file loading
├── test_load_mztab.py         # Unit tests for mzTab file loading
├── test_extract_index_from_spectra_ref.py  # Unit tests for spectrum reference parsing
├── test_map_psms_to_spectra.py  # Unit tests for PSM-spectrum mapping
└── test_integration.py        # Integration tests for full pipeline

Test Coverage

The test suite covers:

  • Unit Tests (28 tests): Individual function testing for core components
  • Integration Tests (9 tests): End-to-end pipeline validation
  • Edge Cases: Malformed input handling, missing data, error conditions
  • Data Formats: Various spectrum reference formats and file structure variations

Running Tests

  1. Install test dependencies (included in requirements.txt):

    pip install -r requirements.txt
  2. Run all tests:

    pytest tests/
  3. Run with verbose output:

    pytest tests/ -v
  4. Run specific test file:

    pytest tests/test_load_mgf.py
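  5. Run tests with coverage (requires the pytest-cov plugin, which is not in the dependency table above: pip install pytest-cov):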

    pytest tests/ --cov=app --cov-report=html

Key Test Areas

MGF File Loading (test_load_mgf.py)

  • ✅ Valid MGF parsing with complete spectrum data
  • ✅ Handling missing spectrum titles or PEPMASS fields
  • ✅ Empty spectrum processing
  • ✅ Multiple spectra in single file
  • ✅ Error handling for malformed or invalid data

mzTab File Loading (test_load_mztab.py)

  • ✅ Standard mzTab PSM section parsing
  • ✅ Modified peptide sequences
  • ✅ Multiple PSM entries
  • ✅ Required PSM_ID column validation

Spectrum Reference Parsing (test_extract_index_from_spectra_ref.py)

  • ✅ index=123 format extraction
  • ✅ scan=456 format extraction
  • ✅ :789 suffix format
  • ✅ Plain numeric references
  • ✅ Edge cases and error conditions

PSM-Spectrum Mapping (test_map_psms_to_spectra.py)

  • ✅ Direct title-based matching
  • ✅ Index-based matching with various formats
  • ✅ Title precedence over index
  • ✅ No-match scenarios
  • ✅ Mixed matching strategies

Integration Tests (test_integration.py)

  • ✅ Full pipeline: MGF → mzTab → mapping → visualization
  • ✅ Edge cases: empty files, mismatched references
  • ✅ Error propagation and handling

Test Data

Tests use sample data files in the data/ directory:

  • sample_preprocessed_spectra.mgf: Real mass spectrometry data
  • casanovo_20251029091517.mztab: Peptide identification results

❓ Troubleshooting

Common Issues

No spectra match found

  • Cause: Spectrum references in mzTab don't match MGF titles/indices
  • Solution: Check reference format consistency between files

Empty spectrum plots

  • Cause: Missing precursor information or malformed data
  • Solution: Verify MGF file contains proper PEPMASS fields
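To find the offending spectra before uploading, a quick scan can list the blocks that lack a non-empty `PEPMASS` line. `titles_missing_pepmass` is a hypothetical helper, not part of the app; blocks without a title are reported by their 1-based position.

```python
def titles_missing_pepmass(text):
    """Return the TITLE (or 'block N') of every MGF block without a non-empty PEPMASS."""
    missing, in_block, has_pepmass, title, block_no = [], False, False, None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block, has_pepmass, title = True, False, None
            block_no += 1
        elif line == "END IONS" and in_block:
            if not has_pepmass:
                missing.append(title or f"block {block_no}")
            in_block = False
        elif in_block and line.upper().startswith("PEPMASS="):
            has_pepmass = bool(line.split("=", 1)[1].strip())   # empty value counts as missing
        elif in_block and line.upper().startswith("TITLE="):
            title = line.split("=", 1)[1]
    return missing
```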

Import errors

  • Cause: Missing dependencies or Python version incompatibility
  • Solution: Install from requirements.txt and ensure Python 3.9+

Memory issues with large files

  • Cause: Very large proteomics datasets
  • Solution: Process data in smaller batches or increase system memory
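One way to split a large MGF file into smaller batches outside the app is to stream it block by block instead of loading it whole. The generators below are a hypothetical sketch (the names and the default batch size of 1000 are assumptions); each batch could then be written to its own, smaller MGF file for upload.

```python
from itertools import islice


def iter_mgf_blocks(lines):
    """Lazily yield each BEGIN IONS ... END IONS block as a list of stripped lines."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            block = [line]
        elif block is not None:
            block.append(line)
            if line == "END IONS":
                yield block
                block = None


def iter_batches(lines, batch_size=1000):
    """Group blocks into lists of at most batch_size, holding one batch in memory at a time."""
    blocks = iter_mgf_blocks(lines)
    while True:
        batch = list(islice(blocks, batch_size))
        if not batch:
            return
        yield batch
```

Passing an open file handle as `lines` means only the current batch is kept in memory, since iterating a file reads it line by line.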

Performance Tips

  • File size: Optimized for datasets up to 10,000 spectra/PSMs
  • Browser: Use Chrome/Firefox for best performance
  • Network: Local deployment recommended for large files

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 References

Scientific Background

  • Mass Spectrometry Basics: Understanding peptide fragmentation patterns
  • Proteomics Standards: HUPO mzTab format specifications
  • PSM Validation: Best practices for visual spectrum inspection

Technical Resources


For questions, issues, or contributions, please open an issue on GitHub.
