A comprehensive web-based application for visualizing and analyzing Peptide-Spectrum Matches (PSMs) from mass spectrometry proteomics experiments. Built with Streamlit, this interactive tool enables researchers to explore the quality and characteristics of peptide identifications through intuitive spectrum visualization.
- Upload & Parse: Support for standard mass spectrometry data formats (MGF spectra and mzTab identifications)
- Intelligent Mapping: Automatic matching of peptide identifications to their corresponding mass spectra using flexible reference matching
- Quality Assessment: Visual inspection of PSM quality through annotated spectrum plots
- Real-time Interaction: Web-based interface for immediate data exploration and analysis
- Fragment Ion Annotation: Automatic labeling of b-ion and y-ion fragments with theoretical masses
- Spectrum Preprocessing: Intelligent peak filtering, intensity normalization, and precursor peak removal
- Publication-Ready Plots: High-quality matplotlib-based visualizations suitable for research reports
- Mass Range Optimization: Automatic m/z range selection focusing on peptide fragment regions
- Drag-and-Drop Upload: Simple file upload for MGF and mzTab files
- Interactive Tables: Sortable, filterable PSM lists with key identification metrics
- Spectrum Selection: Click-to-view individual spectra with peptide sequence annotations
- No Client-Side Installation: Once the app is running, it is used from any modern web browser
- Python 3.9 or higher (the pinned NumPy ≥1.25.0 and matplotlib ≥3.8.0 do not support older versions)
- pip package manager
- Web browser (Chrome, Firefox, Safari, or Edge)
- Clone the repository:

      git clone https://github.com/erayfirat/PSMViewer1.git
      cd PSMViewer1

- Create a virtual environment:

      # On macOS/Linux
      python3 -m venv venv
      source venv/bin/activate

      # On Windows
      python -m venv venv
      venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

| Package | Version | Purpose |
|---|---|---|
| streamlit | ≥1.28.0 | Web application framework |
| pyteomics | ≥4.6.5 | Mass spectrometry file parsing |
| spectrum-utils | ≥0.4.4 | Advanced spectrum processing & annotation |
| matplotlib | ≥3.8.0 | Publication-quality plotting |
| pandas | ≥2.0.0 | Data manipulation & analysis |
| numpy | ≥1.25.0 | Numerical computations |
| pytest | ≥7.4.0 | Unit testing framework |
The PSM Viewer is organized into modular components for maintainability and extensibility:
- `data_loading.py`: Handles loading and parsing of spectral data files (MGF) and identification results (mzTab) with comprehensive error handling
- `processing.py`: Contains functions for extracting spectrum indices and mapping PSMs to spectra using optimized vectorized operations
- `visualization.py`: Generates annotated spectrum plots with fragment ion annotations using hardcoded spectrum processing parameters
- `app.py`: Main entry point that orchestrates the Streamlit web application
The application uses hardcoded default parameters for optimal spectrum visualization:
- Charge state: 2+ (precursor ion charge)
- Fragment tolerance: 10 ppm for ion matching
- Mass range: 100-1400 m/z for peptide fragments
- Peak filtering: Retain top 50 most intense peaks, minimum 5% relative intensity
- Ion annotation: b, y, and a ions with root-scale intensity normalization
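
The defaults listed above could be applied roughly as in the sketch below. It is a plain-Python illustration of the steps (m/z window, relative-intensity threshold, top-N selection, root scaling), not the code in `visualization.py`. The function name `preprocess_peaks`, its parameters, the order of the steps, and the choice to scale relative to the base peak are all assumptions made for this example.

```python
import math


def preprocess_peaks(mz, intensity, min_mz=100.0, max_mz=1400.0,
                     min_rel_intensity=0.05, max_peaks=50):
    """Hypothetical helper: filter and scale peaks.

    Returns a list of (m/z, scaled intensity) tuples sorted by m/z.
    """
    # 1. Keep only peaks inside the 100-1400 m/z window.
    peaks = [(m, i) for m, i in zip(mz, intensity) if min_mz <= m <= max_mz]
    if not peaks:
        return []

    # 2. Drop peaks below 5% of the most intense remaining peak.
    base = max(i for _, i in peaks)
    if base <= 0:
        return []
    peaks = [(m, i) for m, i in peaks if i >= min_rel_intensity * base]

    # 3. Retain at most the 50 most intense peaks.
    peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks]

    # 4. Root-scale intensities relative to the base peak (base peak -> 1.0).
    scaled = [(m, math.sqrt(i / base)) for m, i in peaks]
    return sorted(scaled)


filtered = preprocess_peaks(
    [50.0, 150.0, 300.0, 800.0, 1500.0],
    [1000.0, 400.0, 10.0, 100.0, 900.0],
)
print(filtered)
```

In this toy input the peaks at 50 and 1500 m/z fall outside the window, and the peak at 300 m/z is removed because it is below 5% of the base peak.
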
- Launch the application:

      streamlit run app.py

- Open your browser:
  - Navigate to http://localhost:8501
  - The PSM Viewer interface will load automatically

graph TD
A[Upload MGF File] --> B[Upload mzTab File]
B --> C[Automatic PSM-Spectrum Mapping]
C --> D[View PSM Mapping Table]
D --> E[Select Individual PSM]
E --> F[Visualize Annotated Spectrum]
F --> G[Assess PSM Quality]
- MGF File: Contains experimental mass spectra (collision-induced dissociation fragmentation patterns)
- mzTab File: Contains peptide search results from database search engines (Mascot, MaxQuant, Comet, etc.)
- Automatic Parsing: Both files are parsed and loaded into structured data formats
- Smart Matching: PSMs are mapped to spectra using:
- Direct spectrum title matching
- Numeric index extraction from spectrum references
- Support for various reference formats (`index=123`, `scan=456`, `spectrum:789`)
- Summary Statistics: Overview of loaded spectra, PSMs, and successful matches
- Interactive Table: Browse PSMs with filtering and sorting capabilities
- Spectrum Plots: Annotated mass spectra showing:
- Experimental peaks (raw data)
- Theoretical fragment ions (b-ions: N-terminal fragments, y-ions: C-terminal fragments)
- Precursor ion m/z (parent peptide mass)
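
To make the theoretical fragment ions concrete, the sketch below computes b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It is an illustration only: the application delegates annotation to spectrum-utils, and the function name `fragment_mz` is hypothetical. Modified residues are not handled and will raise a `KeyError`.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # mass of a proton


def fragment_mz(sequence, charge=1):
    """Hypothetical helper: return (b_ions, y_ions) as lists of m/z values.

    b_ions[k-1] is b_k (first k residues); y_ions[k-1] is y_k (last k residues).
    """
    masses = [MONO[aa] for aa in sequence]
    n = len(masses)
    b_ions, y_ions = [], []

    prefix = 0.0
    for k in range(1, n):
        prefix += masses[k - 1]
        # b ion: N-terminal residues plus the charge-carrying protons.
        b_ions.append((prefix + charge * PROTON) / charge)

    suffix = 0.0
    for k in range(1, n):
        suffix += masses[n - k]
        # y ion: C-terminal residues plus water plus the protons.
        y_ions.append((suffix + WATER + charge * PROTON) / charge)

    return b_ions, y_ions


b, y = fragment_mz("PEPTIDE")
print(f"b2 = {b[1]:.4f}, y1 = {y[0]:.4f}")
```
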
| Column | Description |
|---|---|
| psm_index | Original PSM row number |
| sequence | Identified peptide amino acid sequence |
| matched_title | Associated spectrum identifier |
| spectra_ref | Original spectrum reference from mzTab |
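
A rough sketch of how rows with these columns might be produced is shown below. It follows the precedence described in this README (direct title match first, numeric index second), but it is a simplified standard-library stand-in for `processing.py`, not its actual code. The function name `map_psms_to_spectra_sketch`, the stripping of an `ms_run[n]:` prefix, and the treatment of the index as a 0-based position in the MGF file are assumptions.

```python
import re


def map_psms_to_spectra_sketch(psms, titles):
    """Hypothetical helper.

    psms:   list of dicts with 'sequence' and 'spectra_ref' keys.
    titles: MGF TITLE values in file order.
    """
    title_set = set(titles)
    rows = []
    for psm_index, psm in enumerate(psms):
        ref = psm["spectra_ref"]
        # Assumption: drop an optional "ms_run[n]:" prefix before matching.
        local = re.sub(r"^ms_run\[\d+\]:", "", ref)

        matched = None
        if local in title_set:
            # Direct title match takes precedence.
            matched = local
        else:
            # Fall back to a trailing integer, read as a 0-based position.
            m = re.search(r"(\d+)$", local)
            if m and int(m.group(1)) < len(titles):
                matched = titles[int(m.group(1))]

        rows.append({
            "psm_index": psm_index,
            "sequence": psm["sequence"],
            "matched_title": matched,
            "spectra_ref": ref,
        })
    return rows


rows = map_psms_to_spectra_sketch(
    [
        {"sequence": "PEPTIDE", "spectra_ref": "ms_run[1]:index=2"},
        {"sequence": "ACDK", "spectra_ref": "specA"},
        {"sequence": "GGR", "spectra_ref": "ms_run[1]:index=9"},
    ],
    ["specA", "specB", "specC"],
)
for row in rows:
    print(row)
```

A PSM whose reference matches neither a title nor a valid position keeps `matched_title` as `None`, which corresponds to the no-match case.
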
- X-axis: Mass-to-charge ratio (m/z)
- Y-axis: Relative intensity (normalized)
- Red annotations: Detected b-ion fragments
- Blue annotations: Detected y-ion fragments
- Mass tolerance: 10 ppm for fragment matching
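
As a worked illustration of the 10 ppm tolerance: the allowed absolute error scales with m/z, so at m/z 500 it is 500 × 10 / 1,000,000 = 0.005. The sketch below expresses that check in code; the function names are hypothetical and are not taken from the application.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed peak lies within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# A peak 0.004 above a theoretical m/z of 500 is about 8 ppm off: accepted.
print(within_tolerance(500.004, 500.0))
# A peak 0.006 above is about 12 ppm off: rejected.
print(within_tolerance(500.006, 500.0))
```
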
BEGIN IONS
TITLE=Sample_001.123.123.2
PEPMASS=500.256 12345.6
CHARGE=2+
123.456 789.012
234.567 456.789
...
END IONS
- Mass spectra containing m/z values and intensities
- Precursor information (parent ion mass and charge)
- Spectrum metadata (titles, scan numbers, etc.)
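
The structure of an MGF record can be illustrated with a minimal standard-library reader. This is a sketch for understanding the format only: per the dependency table the application relies on pyteomics for parsing, and the function name `parse_mgf` and the returned dictionary layout are assumptions.

```python
def parse_mgf(text):
    """Hypothetical helper: parse MGF text into a list of spectrum dicts."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            # Ignore anything outside a BEGIN IONS / END IONS block.
            continue
        elif "=" in line and not line[0].isdigit():
            # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
            key, value = line.split("=", 1)
            current["params"][key.upper()] = value
        else:
            # Peak line: m/z followed by intensity.
            parts = line.split()
            current["mz"].append(float(parts[0]))
            current["intensity"].append(float(parts[1]) if len(parts) > 1 else 0.0)
    return spectra


example = """BEGIN IONS
TITLE=Sample_001.123.123.2
PEPMASS=500.256 12345.6
CHARGE=2+
123.456 789.012
234.567 456.789
END IONS
"""

spectrum = parse_mgf(example)[0]
# PEPMASS holds the precursor m/z, optionally followed by its intensity.
precursor_mz = float(spectrum["params"]["PEPMASS"].split()[0])
print(spectrum["params"]["TITLE"], precursor_mz, spectrum["mz"])
```
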
Tab-separated proteomics results format supporting:
- PSM section: Peptide-spectrum matches with:
- Peptide sequences
- Spectrum references
- Search engine scores
- Protein assignments
- Modification information
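
In mzTab, each line starts with a prefix that identifies its section: `PSH` marks the PSM header row and `PSM` marks each data row. The sketch below reads only those lines with the standard library. It is an illustration of the file layout, not the loader in `data_loading.py`; the function name `read_psm_section` is hypothetical, and the example uses only a small subset of the columns a real file carries.

```python
def read_psm_section(text):
    """Hypothetical helper: return PSM rows as dicts keyed by PSH column name."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "PSH":
            # Column names for the PSM section.
            header = fields[1:]
        elif fields[0] == "PSM" and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


# Tiny illustrative file; real mzTab files contain many more columns.
example = "\n".join([
    "\t".join(["MTD", "mzTab-version", "1.0.0"]),
    "\t".join(["PSH", "sequence", "PSM_ID", "spectra_ref", "search_engine_score[1]"]),
    "\t".join(["PSM", "PEPTIDE", "1", "ms_run[1]:index=0", "0.98"]),
    "\t".join(["PSM", "ACDEFGHIK", "2", "ms_run[1]:index=1", "0.75"]),
])

for psm in read_psm_section(example):
    print(psm["PSM_ID"], psm["sequence"], psm["spectra_ref"])
```
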
The application handles various spectrum referencing conventions:
- `index=42` (numeric index)
- `scan=123` (scan number)
- `spectrum:789` (colon-separated)
- `42` (plain numeric)
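
One way to handle these conventions is an ordered list of regular expressions, tried from most to least specific. The sketch below is an assumption about how such parsing could look rather than the application's own function; the name `extract_index` and the pattern order are illustrative.

```python
import re

# Patterns are tried in order; the first match wins.
_PATTERNS = (
    re.compile(r"index=(\d+)"),   # index=42
    re.compile(r"scan=(\d+)"),    # scan=123
    re.compile(r":(\d+)$"),       # spectrum:789
    re.compile(r"^(\d+)$"),       # 42
)


def extract_index(spectra_ref):
    """Hypothetical helper: return the integer in a reference, or None."""
    if spectra_ref is None:
        return None
    ref = str(spectra_ref).strip()
    for pattern in _PATTERNS:
        match = pattern.search(ref)
        if match:
            return int(match.group(1))
    return None


for ref in ["ms_run[1]:index=42", "scan=123", "spectrum:789", "42", "unknown"]:
    print(ref, "->", extract_index(ref))
```

Returning `None` for unrecognised references lets the caller treat them as unmatched instead of raising an error.
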
The PSM Viewer includes comprehensive unit and integration tests to ensure reliability and correctness of the data processing pipeline.
tests/
├── __init__.py # Test module initialization
├── conftest.py # Shared pytest fixtures and configuration
├── test_load_mgf.py # Unit tests for MGF file loading
├── test_load_mztab.py # Unit tests for mzTab file loading
├── test_extract_index_from_spectra_ref.py # Unit tests for spectrum reference parsing
├── test_map_psms_to_spectra.py # Unit tests for PSM-spectrum mapping
└── test_integration.py # Integration tests for full pipeline
The test suite covers:
- Unit Tests (28 tests): Individual function testing for core components
- Integration Tests (9 tests): End-to-end pipeline validation
- Edge Cases: Malformed input handling, missing data, error conditions
- Data Formats: Various spectrum reference formats and file structure variations
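- Install test dependencies (included in `requirements.txt`):

      pip install -r requirements.txt

- Run all tests:

      pytest tests/

- Run with verbose output:

      pytest tests/ -v

- Run a specific test file:

      pytest tests/test_load_mgf.py

- Run tests with coverage (the `--cov` options come from the `pytest-cov` plugin, which is not among the listed dependencies and may need to be installed separately):

      pytest tests/ --cov=app --cov-report=html
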
- ✅ Valid MGF parsing with complete spectrum data
- ✅ Handling missing spectrum titles or PEPMASS fields
- ✅ Empty spectrum processing
- ✅ Multiple spectra in single file
- ✅ Error handling for malformed or invalid data
- ✅ Standard mzTab PSM section parsing
- ✅ Modified peptide sequences
- ✅ Multiple PSM entries
- ✅ Required PSM_ID column validation
- ✅ `index=123` format extraction
- ✅ `scan=456` format extraction
- ✅ `:789` suffix format
- ✅ Plain numeric references
- ✅ Edge cases and error conditions
- ✅ Direct title-based matching
- ✅ Index-based matching with various formats
- ✅ Title precedence over index
- ✅ No-match scenarios
- ✅ Mixed matching strategies
- ✅ Full pipeline: MGF → mzTab → mapping → visualization
- ✅ Edge cases: empty files, mismatched references
- ✅ Error propagation and handling
Tests use sample data files in the data/ directory:
- `sample_preprocessed_spectra.mgf`: Real mass spectrometry data
- `casanovo_20251029091517.mztab`: Peptide identification results
- Cause: Spectrum references in mzTab don't match MGF titles/indices
- Solution: Check reference format consistency between files
- Cause: Missing precursor information or malformed data
- Solution: Verify MGF file contains proper PEPMASS fields
- Cause: Missing dependencies or Python version incompatibility
- Solution: Install from `requirements.txt` and ensure your Python version meets the stated minimum
- Cause: Very large proteomics datasets
- Solution: Process data in smaller batches or increase system memory
- File size: Optimized for datasets up to 10,000 spectra/PSMs
- Browser: Use Chrome/Firefox for best performance
- Network: Local deployment recommended for large files
This project is licensed under the MIT License - see the LICENSE file for details.
- Mass Spectrometry Basics: Understanding peptide fragmentation patterns
- Proteomics Standards: HUPO mzTab format specifications
- PSM Validation: Best practices for visual spectrum inspection
For questions, issues, or contributions, please open an issue on GitHub.