The POs Database Tools project provides a modular framework for creating and managing the POs Database - an adaptable system built to ingest, normalize, and organize tertiary analysis data from diverse sources.
Its core functionality processes raw patient genomic data, extracts variant and clinical metadata, and enriches records through external APIs, ultimately building a comprehensive database to support tertiary analysis.
A lightweight web application complements the backend, offering an interactive interface for querying and exploring the data.
For more details, please refer to the Documentation.ipynb file.
- Python 3.12.3
- Docker and Docker Compose (for containerized setup)
- MySQL/MariaDB (if running without Docker)
Download Docker Desktop for Mac or Windows. Docker Compose will be automatically installed. On Linux, make sure you have the latest version of Compose.
```
git clone https://github.com/im175pinheiro/POsDBtools.git
cd POsDBtools
```

Copy the example configuration files:

```
cp .env.example .env
cp pos_database_tools/credentials/db_config.yaml.example pos_database_tools/credentials/db_config.yaml
cp pos_database_tools/credentials/ncbi_credentials.yaml.example pos_database_tools/credentials/ncbi_credentials.yaml
```

Edit with your own DB credentials if desired, and add an NCBI API key if you have one.
Note: the credentials in `.env` must match those in `db_config.yaml`.
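To illustrate the note above, the two files must carry the same values side by side. The key names below are placeholders, not the project's actual defaults; check `.env.example` and `db_config.yaml.example` for the real keys:

```
# .env (hypothetical keys and values)
DB_NAME=pos_db
DB_USER=pos_user
DB_PASSWORD=change_me

# credentials/db_config.yaml (must carry the same values)
database: pos_db
user: pos_user
password: change_me
```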
```
docker-compose up --build
```

Note: the Quick Start setup is configured to ingest only the first 20 entries from the data source, to ensure rapid deployment and demonstration.
- Visit http://localhost:8000
- Check the log files under `logs/` for schema creation and ingestion summaries.
```
docker-compose down
```

The database consists of three core tables plus one or more platform-specific tables:
- Probands - Analysis and proband metadata, as well as file provenance
- Variants - Unique variant information, identified by `vcf_shorthand`
- Main - Variant entries per analysis
- Platform - Platform-specific attributes
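The table layout above can be sketched as a toy schema. Column names other than `vcf_shorthand` are hypothetical, and the real schema is created against MySQL/MariaDB by the `create_database_schema` module, not SQLite; this is only an illustration of how the four tables relate:

```python
import sqlite3

# In-memory toy version of the four-table layout described above.
# All column names except vcf_shorthand are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Probands (
    proband_id  INTEGER PRIMARY KEY,
    analysis_id TEXT,   -- analysis metadata
    source_file TEXT    -- file provenance
);
CREATE TABLE Variants (
    vcf_shorthand TEXT PRIMARY KEY,  -- unique variant identifier
    gene          TEXT,
    annotation    TEXT
);
CREATE TABLE Main (
    proband_id     INTEGER REFERENCES Probands(proband_id),
    vcf_shorthand  TEXT REFERENCES Variants(vcf_shorthand),
    interpretation TEXT  -- one variant entry per analysis
);
CREATE TABLE Platform (
    vcf_shorthand TEXT REFERENCES Variants(vcf_shorthand),
    attribute     TEXT   -- platform-specific attributes
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['Main', 'Platform', 'Probands', 'Variants']
```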
The application provides several key features that directly support variant interpretation workflows:
- Search modes: users can query the database either by Proband or Genomic Position, selecting the preferred mode via radio buttons;
- Dynamic Filters (in Proband search mode): a case filter appears, allowing users to refine results;
- Results tables: query results are displayed in interactive tables, with each row offering a View button to access detailed variant information;
- Detailed Information Panels:
- Variant Information card provides a consistent overview of the variant, including key identifiers and annotations;
- Interpretation card displays the current assessment of the variant;
- Related Cases card (in Proband search mode) shows a table listing other probands/cases where the same variant was observed;
- Case Information card (in Genomic Position search mode) shows metadata about the proband and case associated with the variant.
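The two search modes map naturally onto parameterized queries over the `Main` table. The snippet below is a hypothetical sketch using SQLite and invented column names, not the application's actual query layer:

```python
import sqlite3

# Hypothetical sketch of the two search modes; table and column
# names are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Main (proband TEXT, chrom TEXT, pos INTEGER, vcf_shorthand TEXT);
INSERT INTO Main VALUES
    ('P001', 'chr1', 12345, '1-12345-A-G'),
    ('P002', 'chr1', 12345, '1-12345-A-G'),
    ('P001', 'chr2', 67890, '2-67890-C-T');
""")

def search_by_proband(proband):
    """Proband mode: all variant entries observed in one proband."""
    rows = conn.execute(
        "SELECT vcf_shorthand FROM Main WHERE proband = ? "
        "ORDER BY vcf_shorthand", (proband,))
    return [r[0] for r in rows]

def search_by_position(chrom, pos):
    """Genomic Position mode: all probands carrying a variant there."""
    rows = conn.execute(
        "SELECT proband FROM Main WHERE chrom = ? AND pos = ? "
        "ORDER BY proband", (chrom, pos))
    return [r[0] for r in rows]

print(search_by_proband("P001"))          # ['1-12345-A-G', '2-67890-C-T']
print(search_by_position("chr1", 12345))  # ['P001', 'P002']
```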
```
git clone https://github.com/im175pinheiro/POsDBtools.git
cd POsDBtools
cp .env.example .env
cp pos_database_tools/credentials/db_config.yaml.example pos_database_tools/credentials/db_config.yaml
cp pos_database_tools/credentials/ncbi_credentials.yaml.example pos_database_tools/credentials/ncbi_credentials.yaml
```

Copy the environment variable templates and edit them with your personal credentials.
- If you are running the database inside Docker, make sure that the connection parameters in `credentials/db_config.yaml` match the database credentials defined in your `.env` file.
- If you are using a local MariaDB installation, the `.env` file can be ignored, and only `db_config.yaml` needs to be updated.
Important Note: Set `RUN_FULL_SETUP=0` in `.env` before building, as you will be running scripts directly on your machine.
```
docker-compose up --build
pip install -r requirements.txt
```

Run the schema creation script to initialize the database structure:
```
python -m pos_database_tools.create_database_schema
```

Process the data and populate all tables:
```
python -m pos_database_tools.data_ingestion_pipeline --basedir /path/to/directory/containing/data --limit nritems
```

Deploy the interface that allows data exploration:
```
cd pos_database_tools/gui_shiny
DB_CONFIG='../credentials/db_config.yaml' shiny run --reload app.py
```

Developers aiming to adapt the tool to new data sources can refer to Platform_Integration_Guide.ipynb, which outlines the integration process step by step.
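As a rough illustration of what adapting the tool to a new source involves, a platform-specific adapter typically maps one row of the source's export into the common record shape. The class, method, and field names below are invented; the actual extension points are defined in Platform_Integration_Guide.ipynb:

```python
from dataclasses import dataclass

# Hypothetical sketch of a platform adapter; the real hooks are
# described in Platform_Integration_Guide.ipynb.
@dataclass
class VariantRecord:
    vcf_shorthand: str   # e.g. "1-12345-A-G"
    proband: str
    platform_attrs: dict

class MyPlatformAdapter:
    """Invented example: parse one row of a new source's export."""

    def parse_row(self, row: dict) -> VariantRecord:
        # lstrip("chr") removes a leading "chr" prefix character set;
        # adequate here because positions/alleles are digits/bases.
        chrom = row["chromosome"].lstrip("chr")
        shorthand = f'{chrom}-{row["position"]}-{row["ref"]}-{row["alt"]}'
        return VariantRecord(
            vcf_shorthand=shorthand,
            proband=row["sample_id"],
            platform_attrs={"quality": row.get("quality")},
        )

row = {"chromosome": "chr7", "position": 117559590,
       "ref": "G", "alt": "A", "sample_id": "P003", "quality": "PASS"}
rec = MyPlatformAdapter().parse_row(row)
print(rec.vcf_shorthand)  # 7-117559590-G-A
```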
A dedicated script, troubleshooting_pmid.py, helps diagnose and debug common issues with data ingestion in this project.
This script is meant to be a flexible troubleshooting tool. If new problems arise with the pmid platform, update the script to include additional checks and logic.
For a more detailed discussion of the troubleshooting results, refer to the Jupyter notebook Documentation.ipynb.
Run the script from the command line:
```
python pos_database_tools/tools/troubleshooting_pmid.py --log logfile_path/to/inspect --excel path/to/excel --problem problemnr
```

The script inspect_clinvar_xml.py performs a more thorough analysis of the results of the ClinVar API call for a specific variant of a proband. It writes a .txt file containing a pretty print of the entire XML result.
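The pretty-printing step that inspect_clinvar_xml.py performs can be sketched with the standard library. This is a generic illustration of indenting an XML payload and writing it to a text file, not the script's actual code:

```python
import xml.dom.minidom

# Generic sketch: pretty-print an XML payload (such as a ClinVar
# API response) and write it to a .txt file for inspection.
raw = "<result><variant id='1'><status>Pathogenic</status></variant></result>"
pretty = xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
with open("clinvar_result.txt", "w") as fh:
    fh.write(pretty)
print(pretty.splitlines()[1])  # <result>
```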
Run the script from the command line:
```
python pos_database_tools/tools/inspect_clinvar_xml.py --proband proband_name --variant variantnumber --excel path/to/excel
```

This work is part of the Master's thesis titled 'Design and Implementation of a Database-Driven Software for Genetic Variant Interpretation', completed within the Master in Clinical Bioinformatics (Genome specialization) at the University of Aveiro.
Developed by: Inês Pinheiro, during the MSc Internship at Unilabs Genetics, Portugal.
Contributors: Alberto Pessoa, Unilabs Genetics Team
Supervised by: Alberto Pessoa, Unilabs Genetics Team
This project is licensed under the terms of the MIT License.
Developer: Inês Pinheiro, MSc in Clinical Bioinformatics, University of Aveiro
Email: ines.pinheiro@ua.pt

