Modular framework for creating a DB, ingesting, enriching, and exploring clinical variant data. Built for scalable tertiary analysis and interpretation. From raw data to searchable insights.

POs Database Tools

Docker · Python · Shiny · MariaDB

The POs Database Tools project provides a modular framework for creating and managing the POs Database - an adaptable system built to ingest, normalize, and organize tertiary analysis data from diverse sources.

Its core functionality processes raw patient genomic data, extracts variant and clinical metadata, and enriches records through external APIs, ultimately building a comprehensive database to support tertiary analysis.
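The enrichment step talks to external services such as the NCBI E-utilities. As a minimal sketch of that interaction — the helper name and the query term are hypothetical, and the project's actual enrichment code is not shown here — a request URL for a ClinVar lookup can be assembled like this:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(db, term, api_key=None):
    """Build an NCBI E-utilities esearch URL.

    Hypothetical helper for illustration; the project's real
    enrichment logic may differ.
    """
    params = {"db": db, "term": term, "retmode": "json"}
    if api_key:
        # An API key raises NCBI's rate limit from 3 to 10 requests/s
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

url = build_esearch_url("clinvar", "NM_000059.4:c.68-7T>A")
```

The same pattern extends to efetch for retrieving full ClinVar records once identifiers are known.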

A lightweight web application complements the backend, offering an interactive interface for querying and exploring the data.

For more details, please refer to the Documentation.ipynb file.

System Requirements

  • Python 3.12.3
  • Docker and Docker Compose (for containerized setup)
  • MySQL/MariaDB (if running without Docker)

Quick Start (with Docker)

Install Docker Desktop for Mac or Windows, which includes Docker Compose. On Linux, make sure you have the latest version of Compose.

1. Clone the repository:

git clone https://github.com/im175pinheiro/POsDBtools.git

cd POsDBtools

2. Create the .env and credentials files from the examples:

cp .env.example .env

cp pos_database_tools/credentials/db_config.yaml.example pos_database_tools/credentials/db_config.yaml

cp pos_database_tools/credentials/ncbi_credentials.yaml.example pos_database_tools/credentials/ncbi_credentials.yaml

Edit these files with your own database credentials if desired, and add an NCBI API key if you have one.

Note: .env credentials must match db_config.yaml.
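For example — the key names below are illustrative only; use the exact keys from .env.example and db_config.yaml.example — the two files must agree on the same user, password, and database:

```yaml
# .env (hypothetical key names -- copy the real ones from .env.example)
#   MYSQL_USER=pos_user
#   MYSQL_PASSWORD=change_me
#   MYSQL_DATABASE=pos_db

# db_config.yaml must then use the matching values:
host: db            # the database service name in docker-compose
user: pos_user
password: change_me
database: pos_db
```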

3. Build and Start the containers:

docker-compose up --build

Note: The Quick Start setup is configured to ingest only the first 20 entries from the data source to ensure rapid deployment and demonstration.

4. Access the web application in your browser (the address and port are printed in the container logs).

To stop the services:

docker-compose down

Overview

Database

The database consists of three core tables plus one or more platform-specific tables.

  • Probands - Analysis and proband metadata, as well as file provenance
  • Variants - Unique variant information, identified by vcf_shorthand
  • Main - Variant entries per analysis
  • Platform - Platform-specific attributes
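The relationships between the core tables can be sketched with a minimal in-memory SQLite stand-in. This is an illustration only: the real database is MariaDB, and every column beyond those named above (plus the sample values) is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Probands: analysis/proband metadata and file provenance
    CREATE TABLE probands (proband_id INTEGER PRIMARY KEY, name TEXT, source_file TEXT);
    -- Variants: unique variants, keyed by vcf_shorthand
    CREATE TABLE variants (vcf_shorthand TEXT PRIMARY KEY, gene TEXT);
    -- Main: one variant entry per analysis, linking the two tables
    CREATE TABLE main (
        entry_id INTEGER PRIMARY KEY,
        proband_id INTEGER REFERENCES probands(proband_id),
        vcf_shorthand TEXT REFERENCES variants(vcf_shorthand)
    );
""")
con.execute("INSERT INTO probands VALUES (1, 'P001', 'run42.xlsx')")
con.execute("INSERT INTO variants VALUES ('1-12345-A-G', 'BRCA2')")
con.execute("INSERT INTO main VALUES (1, 1, '1-12345-A-G')")

# Join Main back to its proband and variant, as the UI queries do
rows = con.execute("""
    SELECT p.name, v.gene FROM main m
    JOIN probands p ON p.proband_id = m.proband_id
    JOIN variants v ON v.vcf_shorthand = m.vcf_shorthand
""").fetchall()
```

Keying Variants on vcf_shorthand lets many Main entries across different probands reference a single variant record.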

Database Schema

User Interface

The application provides several key features that directly support variant interpretation workflows:

  • Search modes: users can query the database either by Proband or Genomic Position, selecting the preferred mode via radio buttons;
  • Dynamic Filters (in Proband search mode): a case filter appears, allowing users to refine results;
  • Results tables: query results are displayed in interactive tables, with each row offering a View button to access detailed variant information;
  • Detailed Information Panels:
    • Variant Information card provides a consistent overview of the variant, including key identifiers and annotations;
    • Interpretation card displays the current assessment of the variant;
    • Related Cases card (in Proband search mode) shows a table listing other probands/cases where the same variant was observed;
    • Case Information card (in Genomic Position search mode) shows metadata about the proband and case associated with the variant.

User Interface

Usage (for devs)

Docker Setup

git clone https://github.com/im175pinheiro/POsDBtools.git

cd POsDBtools
cp .env.example .env

cp pos_database_tools/credentials/db_config.yaml.example pos_database_tools/credentials/db_config.yaml

cp pos_database_tools/credentials/ncbi_credentials.yaml.example pos_database_tools/credentials/ncbi_credentials.yaml

Edit the copied files with your personal credentials.

  • If you are running the database inside Docker, make sure that the connection parameters in credentials/db_config.yaml match the database credentials defined in your .env file.
  • If you are using a local MariaDB installation, the .env file can be ignored, and only db_config.yaml needs to be updated.

Important Note: Set RUN_FULL_SETUP=0 in .env before building, as you will be running scripts directly on your machine.

docker-compose up --build

Dependencies

pip install -r requirements.txt

1. Create the Database Schema

Run the schema creation script to initialize the database structure.

python -m pos_database_tools.create_database_schema

2. Run the Data Ingestion Pipeline

Process data and populate all tables.

python -m pos_database_tools.data_ingestion_pipeline --basedir /path/to/directory/containing/data --limit nritems
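The pipeline's command-line surface can be mirrored with a small argparse sketch. The flag names match the invocation above; the help texts and defaults are assumptions, and the actual ingestion logic is not shown.

```python
import argparse

def build_parser():
    # Mirrors the pipeline's CLI flags; only parsing is sketched here
    p = argparse.ArgumentParser(prog="data_ingestion_pipeline")
    p.add_argument("--basedir", required=True,
                   help="Directory containing the raw data files")
    p.add_argument("--limit", type=int, default=None,
                   help="Ingest at most this many entries (e.g. 20 in the Quick Start)")
    return p

args = build_parser().parse_args(["--basedir", "/data/runs", "--limit", "20"])
```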

3. Launch Web Application

Launch the interface for exploring the data.

cd pos_database_tools/gui_shiny 
DB_CONFIG='../credentials/db_config.yaml' shiny run --reload app.py

Integrating new platforms or data sources

Developers aiming to adapt the tool to new data sources can refer to the Platform_Integration_Guide.ipynb, which outlines the integration process step by step.

Troubleshooting and Validation

Diagnosing Ingestion Problems

A dedicated script, troubleshooting_pmid.py, helps diagnose and debug common issues with data ingestion in this project.

This script is meant to be a flexible troubleshooting tool. If new problems arise with the pmid platform, update the script to include additional checks and logic.

For a more detailed discussion of the troubleshooting results, refer to the Jupyter notebook Documentation.ipynb.

Usage

Run the script from the command line:

python pos_database_tools/tools/troubleshooting_pmid.py --log logfile_path/to/inspect --excel path/to/excel --problem problemnr

Detailed inspection of ClinVar XML result

The script inspect_clinvar_xml.py performs a more thorough analysis of the results of the ClinVar API call for a specific variant of a proband. It writes a text file containing a pretty-printed version of the full XML result.

Usage

Run the script from the command line:

python pos_database_tools/tools/inspect_clinvar_xml.py --proband proband_name --variant variantnumber --excel path/to/excel
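The pretty-printing step can be approximated with the standard library. This is a sketch only: the real script's parsing, arguments, and output file naming are not shown, and the sample XML is invented.

```python
from xml.dom import minidom

def pretty_xml(raw: str) -> str:
    # Re-indent an XML string, as a stand-in for the script's pretty print
    return minidom.parseString(raw).toprettyxml(indent="  ")

sample = "<ClinVarResult><Variant id='1'><Gene>BRCA2</Gene></Variant></ClinVarResult>"
text = pretty_xml(sample)
```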



Acknowledgments

This work is part of the Master's thesis titled 'Design and Implementation of a Database-Driven Software for Genetic Variant Interpretation', completed within the Master in Clinical Bioinformatics (Genome specialization) at the University of Aveiro.

Developed by: Inês Pinheiro, during the MSc Internship at Unilabs Genetics, Portugal.

Contributors: Alberto Pessoa, Unilabs Genetics Team

Supervised by: Alberto Pessoa, Unilabs Genetics Team


License

This project is licensed under the terms of the MIT License.

Contact

Developer: Inês Pinheiro, MSc in Clinical Bioinformatics, University of Aveiro
Email: ines.pinheiro@ua.pt
