This repository contains a 4-bit quantized ClinicalBERT model for disease classification based on clinical text. Inspired by CheXNet, this model can predict diseases from patient symptom descriptions, particularly focusing on chest-related conditions.
The model uses a quantized version of ClinicalBERT to classify disease conditions from clinical text descriptions. Quantization reduces the model size and memory requirements while largely preserving accuracy, which makes the model cheaper to deploy.
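To make the size reduction concrete, here is a rough back-of-the-envelope estimate. It assumes a BERT-base sized model of about 110 million parameters (an assumption, not a figure taken from this repository) and counts weight storage only.

```python
def weight_memory_mb(num_params: int, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


# Assumption: roughly BERT-base sized; the real parameter count may differ.
NUM_PARAMS = 110_000_000

for name, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{name:>5}: {weight_memory_mb(NUM_PARAMS, bits):.0f} MB")
```

In practice the saving is smaller than this ideal figure, because bitsandbytes quantizes the linear layers while other parts such as embeddings stay in higher precision, and the quantization constants take some space of their own.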
This project uses a synthetic dataset of clinical text descriptions for training. The synthetic data mimics real-world clinical notes while avoiding privacy concerns associated with actual patient data. The dataset includes various chest-related conditions with their corresponding symptom descriptions.
- 4-bit quantized ClinicalBERT for efficient inference.
- Disease classification from symptom descriptions.
- Flask (Python) web interface for easy interaction.
- Compatible with Google Colab for accessible deployment.

# Clone the repository
git clone https://github.com/john-osborne-j/quantized-clinicalbert.git
cd quantized-clinicalbert

# Install dependencies
pip install -r requirements.txt

The model was trained by first fine-tuning ClinicalBERT on a dataset of chest-related disease descriptions, then quantizing the trained model to 4-bit precision using the BitsAndBytes library.
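
The quantization step could look roughly like the sketch below, which uses the `BitsAndBytesConfig` class from Hugging Face Transformers. The function name `load_4bit_classifier`, the checkpoint directory `clinicalbert-finetuned`, and the NF4 and float16 settings are assumptions for illustration; the repository's own training script may be organised differently.

```python
def load_4bit_classifier(checkpoint_dir: str, num_labels: int):
    """Load a fine-tuned sequence classifier with 4-bit weights.

    Hypothetical helper: imports are kept inside the function so that this
    file can be imported on machines without torch or bitsandbytes.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

    # Assumed settings: NF4 quantization with float16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    # Linear layers are quantized while the checkpoint is being loaded.
    return AutoModelForSequenceClassification.from_pretrained(
        checkpoint_dir,
        num_labels=num_labels,
        quantization_config=quant_config,
    )


# Example usage (4-bit loading with bitsandbytes typically expects a CUDA GPU):
# model = load_4bit_classifier("clinicalbert-finetuned", num_labels=5)
# model.save_pretrained("clinicalbert-4bit-quantized")
```

Saving a 4-bit model with `save_pretrained` needs reasonably recent versions of both `transformers` and `bitsandbytes`, so it is worth pinning them in `requirements.txt`.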
1. Train the full-precision model on the disease classification dataset.
2. Quantize the trained model to 4-bit precision for efficient inference.
3. Save both the class mapping and the quantized model.
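
The class mapping from step 3 is what turns a predicted class index back into a disease name. A minimal sketch of saving and reloading such a mapping as JSON follows; the file name `class_mapping.json`, the helper names, and every label other than COPD are hypothetical and not taken from the repository.

```python
import json
import tempfile
from pathlib import Path


def save_class_mapping(labels, path):
    """Write an index -> label mapping as JSON (JSON keys must be strings)."""
    mapping = {str(i): label for i, label in enumerate(labels)}
    Path(path).write_text(json.dumps(mapping, indent=2), encoding="utf-8")


def load_class_mapping(path):
    """Read the mapping back and restore integer class indices."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {int(i): label for i, label in raw.items()}


def decode_prediction(logits, mapping):
    """Return the label whose logit is largest."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return mapping[best_index]


# Hypothetical label set and logits, for illustration only.
labels = ["Asthma", "COPD", "Pneumonia"]

with tempfile.TemporaryDirectory() as tmp:
    mapping_path = Path(tmp) / "class_mapping.json"
    save_class_mapping(labels, mapping_path)
    restored = load_class_mapping(mapping_path)

predicted = decode_prediction([0.1, 2.3, -0.5], restored)
print(f"Predicted disease: {predicted}")
```

Storing the mapping next to the quantized model keeps the label order used in training and the one used at inference from drifting apart.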

symptoms_text = "Progressive shortness of breath over several months, now worse with minimal exertion. Chronic productive cough especially in the mornings with clear to white sputum. Reports chest tightness but no sharp pain. Long history of smoking 1 pack per day for 30 years."
predicted_disease = predict_disease(symptoms_text, model_path="clinicalbert-4bit-quantized")
print(f"Predicted disease: {predicted_disease}")

Output:

`low_cpu_mem_usage` was None, now default to True since model is quantized.
Predicted disease: COPD

- This model should not replace professional medical diagnosis.
- Performance depends on how closely the input matches the format and vocabulary of the training data.
- The 4-bit quantization may result in slight accuracy degradation compared to the full-precision model.
- This project was inspired by CheXNet for chest X-ray diagnosis.
- Uses the ClinicalBERT model from Hugging Face Transformers (https://github.com/huggingface/transformers).
- Quantization implemented using the bitsandbytes library (https://github.com/TimDettmers/bitsandbytes).


