This directory contains naive baseline methods for the CTF for Science framework. These baselines serve as simple benchmarks for comparison with more sophisticated models.
- Average: Predicts the average of the training data for each feature, repeated across all test time steps.
- Constant: Predicts a specified constant value for all features and time steps. To use the zero baseline, set `constant_value: 0.0`.
- Random: Generates random predictions based on a specified distribution (e.g., uniform or normal).
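The three baselines described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code in this directory: the function names, the `(time_steps, features)` array layout, and the default distribution parameters (uniform on [0, 1), standard normal) are all assumptions made for the example.

```python
import numpy as np


def average_baseline(train, n_steps):
    """Repeat the per-feature training mean for every test time step.

    Assumes `train` has shape (n_train_steps, n_features).
    """
    mean = train.mean(axis=0)
    return np.tile(mean, (n_steps, 1))


def constant_baseline(n_steps, n_features, constant_value=0.0):
    """Fill every feature and time step with one constant value."""
    return np.full((n_steps, n_features), constant_value, dtype=float)


def random_baseline(n_steps, n_features, distribution="uniform", seed=None):
    """Draw predictions from a named distribution.

    The ranges used here (uniform on [0, 1), standard normal) are
    illustrative assumptions, not the framework's defaults.
    """
    rng = np.random.default_rng(seed)
    shape = (n_steps, n_features)
    if distribution == "uniform":
        return rng.uniform(0.0, 1.0, size=shape)
    if distribution == "normal":
        return rng.normal(0.0, 1.0, size=shape)
    raise ValueError(f"Unsupported distribution: {distribution!r}")
```

Passing a fixed `seed` makes the random baseline reproducible, which matters when its scores are compared across runs.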
To run a baseline, use the run.py script from the project root followed by the path to a configuration file. For example:

    python models/CTF_NaiveBaselines/run.py models/CTF_NaiveBaselines/config/config_Lorenz_average_batch_1-6.yaml

Important: Always run the script from the project root to ensure relative paths (e.g., to datasets and results directories) work correctly.
This command executes the average baseline on the specified dataset (e.g., ODE_Lorenz) for the sub-datasets defined in the config file (e.g., sub-datasets 1 through 6). Results, including predictions, evaluation metrics, and visualizations, are saved in results/<dataset_name>/CTF_NaiveBaselines_<method>/<batch_id>/.
The run.py script supports batch runs across multiple sub-datasets as specified in the configuration file. It:
- Parses the `pair_id` from the config to determine which sub-datasets to process.
- Generates a unique `batch_id` for the run (e.g., `batch__20250404_164642`).
- Processes each sub-dataset, saving results and visualizations in a structured directory.
- Aggregates metrics in a `batch_results.yaml` file.
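The batch flow listed above can be sketched as follows. This is a hedged illustration, not the actual `run.py`: `make_batch_id`, `run_batch`, the `evaluate` callback, and the layout of the returned dictionary are hypothetical, and the timestamp format is only inferred from the example ID shown above. The real script writes the aggregated metrics to `batch_results.yaml`; the sketch returns them as a plain dictionary to stay dependency-free.

```python
from datetime import datetime


def make_batch_id(now=None):
    """Build a timestamped ID shaped like the example 'batch__20250404_164642'."""
    now = now or datetime.now()
    return "batch__" + now.strftime("%Y%m%d_%H%M%S")


def run_batch(pair_ids, evaluate, now=None):
    """Evaluate each sub-dataset and collect its metrics under one batch ID.

    `evaluate` is a hypothetical callback that takes a pair_id and returns
    a dict of metrics for that sub-dataset.
    """
    batch_results = {"batch_id": make_batch_id(now), "pairs": []}
    for pair_id in pair_ids:
        metrics = evaluate(pair_id)
        batch_results["pairs"].append({"pair_id": pair_id, "metrics": metrics})
    return batch_results
```

Deriving the ID from the start time is what keeps repeated runs from overwriting each other's results.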
Configuration files are located in the models/CTF_NaiveBaselines/config/ directory and specify the dataset, sub-datasets, and baseline method, along with method-specific parameters.
- `config_Lorenz_average_batch_1-6.yaml`: Runs the average baseline on `ODE_Lorenz` for sub-datasets 1 through 6.
- `config_Lorenz_constant_batch_123456.yaml`: Runs the constant baseline on `ODE_Lorenz` for sub-datasets 1, 2, 3, 4, 5, and 6.
- `config_Lorenz_random_batch_1-6.yaml`: Runs the random baseline on `ODE_Lorenz` for sub-datasets 1 through 6.
- `config_KS_average_batch_all.yaml`: Runs the average baseline on `PDE_KS` for all sub-datasets.
- `config_KS_constant_batch_all.yaml`: Runs the constant baseline on `PDE_KS` for all sub-datasets.
- `config_KS_random_batch_all.yaml`: Runs the random baseline on `PDE_KS` for all sub-datasets.
Each configuration file must include:
- `dataset` (required):
  - `name`: The dataset name (e.g., `ODE_Lorenz`, `PDE_KS`).
  - `pair_id`: Specifies sub-datasets to run on. Formats:
    - Single integer: `pair_id: 3`
    - List: `pair_id: [1, 2, 3, 4, 5, 6]`
    - Range string: `pair_id: '1-6'`
    - Omitted or `'all'`: Runs on all sub-datasets.
- `model`:
  - `name`: Typically `CTF_NaiveBaselines`.
  - `method`: The baseline method (`average`, `constant`, `random`).
  - Method-specific parameters:
    - For `constant`: `constant_value` (e.g., `0.0` for zero baseline).
    - For `random`: `random_distribution` (e.g., `uniform`, `normal`), `random_seed`, etc.
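The accepted `pair_id` formats can be normalized to a list of integers with a small helper. The sketch below is one possible way to do it, not the framework's own parser: `parse_pair_id` and its `all_ids` argument (the full set of sub-dataset IDs for the chosen dataset) are hypothetical names introduced for illustration.

```python
def parse_pair_id(pair_id, all_ids):
    """Normalize a config `pair_id` value to a list of sub-dataset IDs.

    `all_ids` is assumed to hold every sub-dataset ID of the dataset;
    it is used when `pair_id` is omitted (None) or set to 'all'.
    """
    if pair_id is None or pair_id == "all":
        return list(all_ids)
    if isinstance(pair_id, int):
        return [pair_id]
    if isinstance(pair_id, (list, tuple)):
        return [int(p) for p in pair_id]
    if isinstance(pair_id, str) and "-" in pair_id:
        # Range string such as '1-6', inclusive on both ends.
        start, end = (int(part) for part in pair_id.split("-", 1))
        return list(range(start, end + 1))
    raise ValueError(f"Unsupported pair_id: {pair_id!r}")
```

For example, `parse_pair_id('1-6', range(1, 10))` and `parse_pair_id([1, 2, 3, 4, 5, 6], range(1, 10))` yield the same list, so both spellings select the same sub-datasets.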
Example (models/CTF_NaiveBaselines/config/config_Lorenz_average_batch_1-6.yaml):

    dataset:
      name: ODE_Lorenz
      pair_id: '1-6'  # Runs on sub-datasets 1 through 6
    model:
      name: CTF_NaiveBaselines
      method: average

| Method | Description | Config Example |
|---|---|---|
| `average` | Predicts the mean of the training data | `method: average` |
| `constant` | Predicts a specified constant value | `method: constant`, `constant_value: 0.0` |
| `random` | Generates random predictions from a distribution | `method: random`, `random_distribution: uniform` |
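A config's `model` section can be checked against the methods in the table before a run starts. The following is a rough sketch under stated assumptions: `check_model_config` and `KNOWN_METHODS` are hypothetical, and treating `constant_value` and `random_distribution` as mandatory is an assumption (the actual script may fall back to defaults).

```python
# Parameters each method is assumed to need, based on the table above.
KNOWN_METHODS = {
    "average": (),
    "constant": ("constant_value",),
    "random": ("random_distribution",),
}


def check_model_config(model_cfg):
    """Return a list of problems found in the `model` section (empty if none)."""
    method = model_cfg.get("method")
    if method not in KNOWN_METHODS:
        return [f"unknown method: {method!r}"]
    return [
        f"method {method!r} expects {key!r}"
        for key in KNOWN_METHODS[method]
        if key not in model_cfg
    ]
```

Returning a list of messages instead of raising on the first issue lets a caller report every problem in a config at once.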
- Average Baseline on Lorenz sub-datasets 1-6:

      python models/CTF_NaiveBaselines/run.py models/CTF_NaiveBaselines/config/config_Lorenz_average_batch_1-6.yaml

- Constant Baseline (zero baseline) on all KS sub-datasets:

      python models/CTF_NaiveBaselines/run.py models/CTF_NaiveBaselines/config/config_KS_constant_batch_all.yaml

- Random Baseline on Lorenz sub-datasets 1-6:

      python models/CTF_NaiveBaselines/run.py models/CTF_NaiveBaselines/config/config_Lorenz_random_batch_1-6.yaml

The baselines rely on packages already in the main requirements.txt:
- numpy
- scipy
- pyyaml
- matplotlib (for visualizations)
No additional dependencies are required.
- The `random` baseline draws predictions from a specified distribution (e.g., uniform or normal). This differs from earlier implementations, which performed a random search over constants.
- Ensure configuration files match the desired dataset and sub-datasets.
- Results are saved with a unique `batch_id` to prevent overwriting and organize runs.
- Visualizations are automatically generated and saved in `results/<dataset>/<model>/<batch_id>/<pair_id>/visualizations/`.