I design and build end-to-end ML systems, NLP/RAG pipelines, and production-grade Python/MLOps tooling.
I focus on reproducibility, structured retrieval, robust data pipelines, and clean, testable ML code.
I specialize in:
- NLP / RAG systems (structured retrieval, SPARQL reasoning, knowledge graphs)
- Model reproducibility (MLflow, DVC, deterministic pipelines)
- Python engineering (packaging, CI/CD, testing, modular design)
- High-performance data processing (parallel pipelines, QC systems)
I care about building ML systems that are reliable, interpretable, and easy for others to run and extend.
Parallel, chromosome-aware phasing tool with confidence scoring.
Packaged as a CLI with full tests, CI, and docs.
➡️ https://github.com/SFGLab/SvPhaser
A complete clinical retrieval assistant using DrugBank/FDA/PubChem RDF, multi-stage retrieval, and structured evidence reasoning.
➡️ https://github.com/PM-0125/INFERMed
Format-aware parsing, QC metrics, deterministic workflows, and reproducible CLI tooling (pytest/mypy/ruff + CI).
➡️ https://github.com/SFGLab/lophos
Languages: Python · C++ · SQL · SPARQL
ML/AI: PyTorch · TensorFlow · XGBoost · scikit-learn · NLP · RAG · Feature Engineering
MLOps: MLflow · DVC · Docker · Conda · GitHub Actions · CI/CD · pytest · mypy · ruff · black
Data/DB: Pandas · NumPy · PostgreSQL · MySQL · RDF Knowledge Graphs · Apache Jena · QLever
Dev/Platforms: Linux · Git/GitLab · VS Code · Google Colab · GCP (Cloud Run / Vertex AI basics)
Parallel SV phasing with confidence scoring; shipped as a reproducible CLI tool with tests/CI/docs.
➡️ https://github.com/SFGLab/SvPhaser
RAG-style structured retrieval + reasoning over biomedical knowledge graphs.
➡️ https://github.com/PM-0125/INFERMed
High-performance QC pipeline with reproducible environments and automation.
➡️ https://github.com/SFGLab/lophos
Algorithmic pipeline integrating read-depth and split-read signals.
➡️ https://github.com/PM-0125/Computational-Genomics/tree/main/Structural_Variant_Detection_Algorithm
Comparative ML modelling (XGBoost/PCA/IF) on the METABRIC dataset.
➡️ https://github.com/PM-0125/AI_ML_Projects/tree/main/Advanced%20Breast%20Cancer%20Analysis
- M.Sc., Computer Science & Information Systems (AI), Warsaw University of Technology
- B.Tech., Computer Engineering (AI), Marwadi University
I teach ML in a practical, builder-first way.
📧 Email: pranjulmishra228161@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/pranjul-mishra/
💻 GitHub: https://github.com/PM-0125