A modular framework for training and evaluating different neural network architectures on blood glucose prediction tasks. This project is based on the work described in Brandon Harris's article and provides a clean, extensible architecture for comparing different models.
```
TFT_glucose/
├── models/                       # Model implementations
│   ├── base/                     # Base classes and utilities
│   │   ├── __init__.py
│   │   ├── base_evaluator.py     # Abstract base evaluator
│   │   ├── base_trainer.py       # Abstract base trainer
│   │   ├── data_handler.py       # Data loading and preprocessing
│   │   └── metrics_calculator.py # Performance metrics
│   ├── tft_models/               # TFT-specific implementations
│   │   ├── __init__.py
│   │   ├── tft_evaluator.py      # TFT evaluator
│   │   ├── tft_trainer.py        # TFT trainer
│   │   └── [legacy scripts]      # Original TFT scripts
│   └── chronos_models/           # Chronos-specific implementations
│       ├── __init__.py
│       ├── chronos_evaluator.py  # Chronos evaluator
│       └── chronos_trainer.py    # Chronos trainer
├── data/                         # Data files
│   ├── t1d_glucose_data.csv      # Main dataset
│   └── data_prep/                # Data preparation notebooks
├── results/                      # Output files
│   ├── *.png                     # Prediction plots
│   └── *.json                    # Comparison results
├── evaluate_tft.py               # TFT evaluation script
├── train_tft.py                  # TFT training script
├── evaluate_chronos.py           # Chronos evaluation script
├── compare_models.py             # Model comparison framework
└── README.md                     # This file
```
```shell
uv sync
```

**TFT Model:**

```shell
# Quick evaluation with training (5 epochs)
uv run python evaluate_tft.py --quick_train --epochs 5

# Use existing trained model
uv run python evaluate_tft.py --model_path models/TFT_Glucose

# Evaluate on test data instead of holdout
uv run python evaluate_tft.py --quick_train --use_test
```

**Chronos Model:**

```shell
# Evaluate with default model (chronos-t5-small)
uv run python evaluate_chronos.py

# Use different Chronos model
uv run python evaluate_chronos.py --model_name amazon/chronos-t5-base

# List available Chronos models
uv run python evaluate_chronos.py --list_models

# Evaluate on test data
uv run python evaluate_chronos.py --use_test
```

**Training:**

```shell
# Train with default parameters (100 epochs)
uv run python train_tft.py

# Train with custom parameters
uv run python train_tft.py --epochs 50 --hidden_size 8 --lstm_layers 4
```

**Model Comparison:**

```shell
# Compare TFT and Chronos models
uv run python compare_models.py --models tft chronos --quick_train

# Compare only TFT model
uv run python compare_models.py --models tft --quick_train

# Compare only Chronos model
uv run python compare_models.py --models chronos

# List available models
uv run python compare_models.py --list_models

# Use different Chronos model in comparison
uv run python compare_models.py --models tft chronos --chronos_model_name amazon/chronos-t5-base
```

The framework evaluates models using multiple metrics:
**RMSE (Root Mean Squared Error)**
- Range: 0 to ∞ (lower is better)
- Interpretation: Average magnitude of prediction errors
- Typical values: 10-50 mg/dL for glucose prediction

**MAE (Mean Absolute Error)**
- Range: 0 to ∞ (lower is better)
- Interpretation: Average absolute difference between predicted and actual values
- Less sensitive to outliers than RMSE

**MAPE (Mean Absolute Percentage Error)**
- Range: 0% to ∞% (lower is better)
- Interpretation: Average percentage error relative to actual values

**SMAPE (Symmetric Mean Absolute Percentage Error)**
- Range: 0% to 200% (lower is better)
- Interpretation: Symmetric version of MAPE, less biased toward low values
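For reference, the four metrics can be computed in plain Python. These are hypothetical helper functions to illustrate the formulas, not the framework's `MetricsCalculator` API:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error: penalizes large errors more heavily
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean absolute error: average error magnitude, robust to outliers
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean absolute percentage error, relative to the actual values
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    # Symmetric MAPE: bounded at 200%, less biased toward low actual values
    return 100 * sum(2 * abs(p - a) / (abs(a) + abs(p))
                     for a, p in zip(actual, predicted)) / len(actual)

actual = [120.0, 140.0, 160.0]      # glucose readings in mg/dL
predicted = [110.0, 150.0, 160.0]
print(round(mae(actual, predicted), 2))  # 6.67
```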
The models predict multiple quantiles, each representing different confidence levels:
- Q01 (1%): Very conservative prediction (low glucose values)
- Q10 (10%): Conservative prediction
- Q20 (20%): Lower confidence bound
- Q50 (50%): Median prediction (most likely value)
- Q80 (80%): Upper confidence bound
- Q90 (90%): Optimistic prediction
- Q99 (99%): Very optimistic prediction (high glucose values)
Different quantiles may perform better in different physiological states:
- Sleep periods: Higher quantiles (Q80-Q90) often perform better
- Active periods: Lower quantiles (Q20-Q50) often perform better
- Meal times: May require switching between quantiles
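That switching idea can be sketched as a simple rule-based selector. The thresholds, hours, and quantile choices below are purely illustrative, not tuned values from this project:

```python
from datetime import datetime

def select_quantile(ts, minutes_since_meal=None):
    # Hypothetical quantile-switching rule based on physiological state.
    if minutes_since_meal is not None and minutes_since_meal < 120:
        return "q50"  # meal absorption: fall back to the median prediction
    if 0 <= ts.hour < 6:
        return "q80"  # sleep: higher quantiles often track better
    return "q20"      # active daytime: lower quantiles often track better

print(select_quantile(datetime(2024, 1, 1, 3, 0)))   # q80
print(select_quantile(datetime(2024, 1, 1, 14, 0)))  # q20
```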
The framework uses abstract base classes to ensure consistency across different model implementations:
**BaseGlucoseEvaluator**
- Abstract base class for model evaluation
- Provides common evaluation functionality
- Must be extended by specific model implementations

**BaseGlucoseTrainer**
- Abstract base class for model training
- Provides common training functionality
- Must be extended by specific model implementations

**DataHandler**
- Handles data loading, preprocessing, and splitting
- Provides consistent data interface across models
- Supports train/test/holdout splits

**MetricsCalculator**
- Calculates performance metrics
- Provides metric interpretation guidelines
- Supports quantile-based evaluation
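As an illustration of the splitting behavior (a sketch, not `DataHandler`'s actual implementation), a chronological train/test/holdout split might look like:

```python
def chronological_split(series, train_frac=0.7, test_frac=0.15):
    # Time-series data must be split chronologically, never shuffled,
    # so the model is always evaluated on data from after its training window.
    n = len(series)
    train_end = int(n * train_frac)
    test_end = int(n * (train_frac + test_frac))
    return series[:train_end], series[train_end:test_end], series[test_end:]

train, test, holdout = chronological_split(list(range(100)))
print(len(train), len(test), len(holdout))  # 70 15 15
```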
The TFT (Temporal Fusion Transformer) implementation includes:
- TFTGlucoseEvaluator: TFT-specific evaluation functionality
- TFTGlucoseTrainer: TFT-specific training functionality
- Hyperparameters: Optimized based on Brandon Harris's work
The Chronos implementation includes:
- ChronosGlucoseEvaluator: Chronos-specific evaluation functionality
- ChronosGlucoseTrainer: Chronos-specific training functionality
- Pretrained Models: Uses foundation models from Amazon's Chronos repository
- Zero-shot Forecasting: No training required, uses pretrained weights
To add a new model architecture:
1. **Create the model directory:**

   ```shell
   mkdir models/your_model
   ```

2. **Implement the evaluator:**

   ```python
   # models/your_model/your_evaluator.py
   from ..base.base_evaluator import BaseGlucoseEvaluator

   class YourModelEvaluator(BaseGlucoseEvaluator):
       def create_model(self, **kwargs):
           # Implement model creation
           pass

       def train_model(self, ts_train_scaled, ts_test_scaled, ts_features_scaled, **kwargs):
           # Implement training
           pass

       def generate_predictions(self, ts_input, ts_features, n_steps, **kwargs):
           # Implement prediction generation
           pass
   ```

3. **Implement the trainer:**

   ```python
   # models/your_model/your_trainer.py
   from ..base.base_trainer import BaseGlucoseTrainer

   class YourModelTrainer(BaseGlucoseTrainer):
       def create_model(self, **kwargs):
           # Implement model creation
           pass

       def train_model(self, ts_train_scaled, ts_test_scaled, ts_features_scaled, **kwargs):
           # Implement training
           pass
   ```

4. **Add it to the comparison framework:**

   ```python
   # In compare_models.py
   from models.your_model.your_evaluator import YourModelEvaluator

   self.available_models = {
       'tft': TFTGlucoseEvaluator,
       'chronos': ChronosGlucoseEvaluator,
       'your_model': YourModelEvaluator,  # Add your model here
   }
   ```
Based on model evaluation, here's what you might expect:

**TFT Results:**

```
+------------+--------+-------+------------+-------------+
| Quantile   | RMSE   | MAE   | MAPE (%)   | SMAPE (%)   |
+------------+--------+-------+------------+-------------+
| Q01        | 104.14 | 92.68 | 63.35      | 95.57       |
| Q10        | 89.16  | 75.38 | 48.95      | 68.84       |
| Q20        | 78.76  | 63.02 | 38.79      | 52.65       |
| Q50        | 48.71  | 44.11 | 37.02      | 33.37       |
| Q80        | 57.78  | 49.21 | 49.99      | 35.91       |
| Q90        | 69.20  | 54.95 | 58.44      | 38.58       |
| Q99        | 87.29  | 73.06 | 75.30      | 46.81       |
+------------+--------+-------+------------+-------------+
```

Best performing quantile: Q50 (50th percentile) across all metrics
**Chronos Results:**

```
+------------+--------+--------+------------+-------------+
| Quantile   | RMSE   | MAE    | MAPE (%)   | SMAPE (%)   |
+------------+--------+--------+------------+-------------+
| Q01        | 144.40 | 136.38 | 99.87      | 199.48      |
| Q10        | 144.40 | 136.38 | 99.87      | 199.48      |
| Q20        | 144.37 | 136.34 | 99.84      | 199.37      |
| Q50        | 144.28 | 136.26 | 99.78      | 199.12      |
| Q80        | 144.17 | 136.16 | 99.70      | 198.81      |
| Q90        | 144.11 | 136.10 | 99.66      | 198.66      |
| Q99        | 144.11 | 136.10 | 99.66      | 198.66      |
+------------+--------+--------+------------+-------------+
```

Best performing quantile: Q90 (90th percentile) across all metrics
```
+---------+-------------+------------+-------------+--------------+
| Model   | Best RMSE   | Best MAE   | Best MAPE   | Best SMAPE   |
+---------+-------------+------------+-------------+--------------+
| TFT     | 48.71       | 44.11      | 37.02       | 33.37        |
| CHRONOS | 144.11      | 136.09     | 99.65       | 198.59       |
+---------+-------------+------------+-------------+--------------+
```
Winner: TFT model significantly outperforms Chronos on this glucose prediction task.
```shell
# TFT with custom parameters
uv run python evaluate_tft.py --quick_train \
    --hidden_size 8 \
    --lstm_layers 4 \
    --attention_heads 4 \
    --batch_size 64 \
    --learning_rate 0.001

# Compare TFT and Chronos models
uv run python compare_models.py --models tft chronos --quick_train

# Compare with different Chronos model
uv run python compare_models.py --models tft chronos --chronos_model_name amazon/chronos-t5-base

# Use custom data path
uv run python evaluate_tft.py --data_path your_data.csv --quick_train
```

The framework expects a CSV file with the following columns:

- `date_time`: Timestamp (datetime format)
- `glucose_value`: Target variable (float)
- `carbs`: Carbohydrate intake (float)
- `bolus`: Insulin bolus (float)
- `insulin_on_board`: Calculated insulin remaining (float)
- `glucose_trend_20`: 20-minute glucose trend (float)
- `last_delta`: Immediate glucose change (float)
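Given that schema, a quick standard-library sanity check for a custom CSV might look like this (`validate_glucose_csv` is a hypothetical helper, not part of the framework):

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = {
    "date_time", "glucose_value", "carbs", "bolus",
    "insulin_on_board", "glucose_trend_20", "last_delta",
}

def validate_glucose_csv(fileobj):
    # Verify the header contains every required column and that the
    # first row parses: date_time as a timestamp, the rest as floats.
    reader = csv.DictReader(fileobj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    row = next(reader)
    datetime.fromisoformat(row["date_time"])
    for col in REQUIRED_COLUMNS - {"date_time"}:
        float(row[col])
    return True

sample = io.StringIO(
    "date_time,glucose_value,carbs,bolus,insulin_on_board,glucose_trend_20,last_delta\n"
    "2024-01-01 08:00:00,120.5,45.0,3.2,1.1,0.8,-2.0\n"
)
print(validate_glucose_csv(sample))  # True
```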
1. **"Invalid past_covariates" error:**
   - The framework automatically handles this by using the full feature dataset
   - This is normal for autoregressive models

2. **Memory issues with large datasets:**
   - Reduce batch size in model parameters
   - Use fewer epochs for quick evaluation

3. **Model loading errors:**
   - Ensure the model path exists and contains valid checkpoints
   - Use `--quick_train` to train a new model instead
- For quick evaluation: Use `--epochs 1` or `--epochs 5`
- For production training: Use `--epochs 100` or more
- For different datasets: Adjust the split ratio in `DataHandler`
- **Run evaluation**: Start with `uv run python evaluate_tft.py --quick_train`
- **Analyze results**: Look at which quantiles perform best for different time periods
- **Implement quantile switching**: Based on physiological states (sleep, meals, etc.)
- **Add new models**: Extend the framework with LSTM, GRU, Transformer, etc.
- **Fine-tune hyperparameters**: Use the hyperparameter tuning notebook for optimization
This framework is designed to be extensible. To add new models:
- Follow the base class interfaces
- Implement the required abstract methods
- Add your model to the comparison framework
- Test with the existing data
- Document your model's specific parameters
The modular architecture makes it easy to add new neural network architectures while maintaining consistency in evaluation and comparison.