Confidentiality Notice: This project was developed as a freelance-style engagement for a private client. Source code, datasets, and proprietary business logic are not publicly available.
Designed and implemented a production-grade ML backend powering intelligent features within a Healthcare SaaS platform.
The system processes both structured patient records and unstructured clinical notes to generate real-time predictions and insights, enabling faster and more informed medical decision-making.
AI/ML Engineer (End-to-End Ownership)
- Translated product requirements into scalable ML solutions
- Designed full data → model → API pipeline
- Built and optimized NLP and predictive models
- Deployed models as production-ready services
- Ensured reliability, validation, and performance under real-world constraints
| Layer | Technology |
|---|---|
| Core | Python 3.11 |
| ML | Scikit-learn |
| NLP | HuggingFace Transformers (DistilBERT) |
| API | FastAPI + Uvicorn |
| Data | Pandas, NumPy |
| Validation | Pydantic |
| Deployment | Docker |
| Infra | Cloud (client-managed) |
- Fine-tuned DistilBERT on domain-specific clinical notes
- Extracts structured categories from unstructured text
- Designed for triage automation and intelligent routing
- Built a classification pipeline using engineered clinical features
- Predicts patient risk levels (Low / Medium / High)
- Optimized using cross-validation and feature selection
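The cross-validation and feature-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the client's pipeline: the data is synthetic (`make_classification`), and the choice of `SelectKBest`, `LogisticRegression`, `k=8`, and five folds are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RISK_LABELS = ["Low", "Medium", "High"]

# Synthetic stand-in for engineered clinical features (three classes).
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=6,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Feature selection lives inside the pipeline so it is re-fit on each
# training fold and never sees the held-out fold.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=8)),  # k is an assumption
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1_weighted")

# Fit on all data and map integer classes to the risk levels named above.
pipeline.fit(X, y)
predicted = [RISK_LABELS[i] for i in pipeline.predict(X[:5])]
```

Placing the selector inside the `Pipeline` is one way to keep cross-validated scores honest, since selecting features on the full dataset first would leak information from the validation folds.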
- Robust preprocessing for noisy healthcare datasets
- Missing value handling, normalization, and encoding
- Text normalization for medical abbreviations
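As an illustration of the abbreviation handling mentioned above, a small normalizer might look like this. The abbreviation map and the function name `normalize_note` are hypothetical; a real system would presumably rely on a curated clinical lexicon.

```python
import re

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {
    "pt": "patient",
    "hx": "history",
    "sob": "shortness of breath",
    "htn": "hypertension",
    "c/o": "complains of",
}

# Longest keys first so "c/o" is tried before any shorter alternative.
_KEYS = sorted(ABBREVIATIONS, key=len, reverse=True)

# Lookarounds keep matches on whole tokens: "pt" matches, "opt" does not.
_PATTERN = re.compile(
    r"(?<![\w/])(" + "|".join(re.escape(k) for k in _KEYS) + r")(?![\w/])"
)


def normalize_note(text: str) -> str:
    """Lowercase, collapse whitespace, and expand known abbreviations."""
    cleaned = re.sub(r"\s+", " ", text).strip().lower()
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], cleaned)


print(normalize_note("Pt c/o SOB,  hx of HTN."))
# patient complains of shortness of breath, history of hypertension.
```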
- Unified FastAPI service exposing all models
- Strict input validation via Pydantic schemas
- JSON responses with prediction scores and metadata
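The validation and response contract described above might be sketched with Pydantic models like the ones below. All field names and bounds (`age`, `systolic_bp`, `version`, and so on) are assumptions for illustration; the actual schemas are not public. Models of this shape can be used directly as FastAPI request and response types.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class RiskRequest(BaseModel):
    # Hypothetical input fields with range checks.
    age: int = Field(..., ge=0, le=120)
    systolic_bp: float = Field(..., gt=0)
    note: Optional[str] = None


class RiskResponse(BaseModel):
    # Prediction plus metadata, mirroring the JSON response described above.
    risk_level: Literal["Low", "Medium", "High"]
    score: float = Field(..., ge=0.0, le=1.0)
    version: str


def validate_request(payload: dict) -> Optional[RiskRequest]:
    """Return a parsed request, or None when the payload is invalid."""
    try:
        return RiskRequest(**payload)
    except ValidationError:
        return None
```

Rejecting out-of-range or missing values at the boundary means the models behind the API only ever see inputs that match the schema.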
The system was architected as a modular ML service:
- Data Layer: ingestion, validation, preprocessing
- Model Layer: training, evaluation, versioning
- Service Layer: inference orchestration
- API Layer: REST endpoints for external integration
This approach ensured scalability, maintainability, and ease of extension.
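One way to picture the layering is the sketch below. Class names, fields, the scoring rule, and the 0.33 / 0.66 thresholds are all placeholders invented for the example, not the production design; the point is only that each layer depends on the one beneath it through a narrow interface.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Prediction:
    label: str
    score: float
    model_version: str


class DataLayer:
    """Validation and preprocessing of a raw record."""

    REQUIRED = ("age", "systolic_bp")  # hypothetical fields

    def prepare(self, record: Dict[str, float]) -> List[float]:
        missing = [k for k in self.REQUIRED if k not in record]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return [float(record[k]) for k in self.REQUIRED]


class ModelLayer:
    """Stands in for a trained, versioned model."""

    version = "demo-0"

    def predict(self, features: List[float]) -> float:
        # Placeholder scoring rule, NOT a trained model.
        age, bp = features
        return min(1.0, (age / 120 + bp / 250) / 2)


class ServiceLayer:
    """Inference orchestration: data layer, then model, then labelling."""

    def __init__(self, data: DataLayer, model: ModelLayer) -> None:
        self.data = data
        self.model = model

    def run(self, record: Dict[str, float]) -> Prediction:
        score = self.model.predict(self.data.prepare(record))
        # Hypothetical thresholds for the three risk levels.
        label = "High" if score >= 0.66 else "Medium" if score >= 0.33 else "Low"
        return Prediction(label, score, self.model.version)


def api_predict(service: ServiceLayer, payload: Dict[str, float]) -> dict:
    """API layer: turn the service result into a JSON-serializable dict."""
    return asdict(service.run(payload))


service = ServiceLayer(DataLayer(), ModelLayer())
result = api_predict(service, {"age": 60, "systolic_bp": 125})
```

Swapping in a different model or preprocessing step then only touches one class, which is the kind of extension the modular design is meant to make easy.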
Patient Data & Medical Notes from Healthcare Providers enter the MediFlow Platform.
⬇️
The core intelligence layer processes all structured and unstructured inputs:
🔍 NLP Text Classifier: Reads and categorizes clinical notes.
📊 Risk Scoring Model: Predicts patient risk levels.
⬇️
The platform outputs actionable insights, enabling faster, data-driven decisions.
| Component | Metric | Value |
|---|---|---|
| NLP Classifier | Accuracy | 88% |
| NLP Classifier | Weighted F1 | 0.86 |
| Risk Model | F1 Score | 0.87 |
| API Latency | P95 | < 50 ms |
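For readers unfamiliar with the metrics in the table, the sketch below shows how a support-weighted F1 and a nearest-rank P95 can be computed. The function names and sample data are illustrative only; the reported figures were not produced by this code, and the nearest-rank method is just one of several percentile definitions.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


def p95(latencies_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Weighted F1 can sit below accuracy, as it does in the table, when the classifier performs worse on some classes than others, because each class's precision and recall are scored separately before averaging.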
Impact:
- Enabled real-time clinical decision support
- Reduced manual triage effort
- Established scalable ML foundation for future features
- DistilBERT over BERT: reduced latency while maintaining strong accuracy.
- Scikit-learn for tabular ML: simpler deployment and better interpretability (critical in healthcare).
- Single API service: easier deployment and integration.
- Strict validation layer: prevents invalid data from entering sensitive workflows.