HIMART is a predictive modeling framework designed to simulate and forecast the selection of initial HIV treatment regimens. By integrating patient clinical data with historical prescription trends and clinical guidelines, this tool predicts both the broader regimen class and specific backbone drug combinations.
- Era-Specific Modeling: Uses separate models for the 2010–2018 and 2019–2022 periods to account for major shifts in HIV treatment guidelines.
- Non-linear Feature Engineering: Implements Restricted Cubic Splines (RCS) for clinical variables such as BMI, age, and eGFR to capture complex biological relationships.
- Clinical Constraint Integration: Automatically enforces medical thresholds, such as eGFR-based contraindications for specific drugs.
- Probabilistic Prediction: Uses softmax-based sampling from model logits to generate realistic distributions of drug usage rather than simple deterministic classification.
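As a rough illustration of how era selection, clinical constraints, and probabilistic prediction can fit together, the sketch below maps a year to an era, masks a contraindicated drug, and samples from the softmax of the remaining logits. It is not the HIMART implementation: the function names, the eGFR cutoff of 60, the choice of TDF as the restricted drug, and the logit values are all assumptions made for the example, and it uses only the standard library to stay self-contained.

```python
import math
import random

# Hypothetical values for illustration only; not taken from HIMART.
EGFR_CUTOFF = 60.0        # assumed contraindication cutoff (mL/min/1.73 m^2)
RESTRICTED_DRUG = "TDF"   # assumed to be the eGFR-restricted drug


def select_era(year):
    """Map an ART initiation year to one of the two modeled periods."""
    if 2010 <= year <= 2018:
        return "2010-2018"
    if 2019 <= year <= 2022:
        return "2019-2022"
    raise ValueError(f"year {year} is outside the modeled periods")


def softmax(logits):
    """Numerically stable softmax; a logit of -inf maps to probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_drug(logits_by_drug, egfr, rng):
    """Mask the restricted drug when eGFR is low, then sample one drug."""
    drugs = list(logits_by_drug)
    logits = [
        float("-inf") if (d == RESTRICTED_DRUG and egfr < EGFR_CUTOFF)
        else logits_by_drug[d]
        for d in drugs
    ]
    probs = softmax(logits)
    return rng.choices(drugs, weights=probs, k=1)[0]


# Seeded generator so repeated runs give the same draws.
rng = random.Random(123)
logits = {"TAF": 1.2, "TDF": 0.8, "ABC": -0.5}  # made-up logits
draws = [sample_drug(logits, egfr=45.0, rng=rng) for _ in range(1000)]
print(select_era(2020), {d: draws.count(d) for d in logits})
```

Sampling instead of taking the highest-probability class means less common drugs still appear in the simulated cohort in proportion to their predicted probability, while the mask guarantees a contraindicated drug is never drawn.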
The pipeline requires Python 3.9+ and the following libraries:
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn joblib patsy scipy
To generate predictions for a patient cohort, use the predict_art_init_drugs function:
import pandas as pd
from HIMART_Model import predict_art_init_drugs
# Load your patient clinical data
df = pd.read_csv('patient_data.csv')
# Run the prediction pipeline
# The 'seed' parameter ensures reproducibility for probabilistic sampling
final_predicted_df = predict_art_init_drugs(df, seed=123)
# View the predicted regimen and backbone drugs
print(final_predicted_df[['REGIMEN_pred', 'TAF_pred', 'TDF_pred', 'ABC_pred']].head())

The repository contains:
- HIMART Model.ipynb: The core research notebook containing predictor processing logic and model definitions.
- model/: Directory for pre-trained models and scalers (e.g., regimen2019-2022_model.pkl).