We should consider adding a CLI command to benchmark models on specified datasets. The key utility is being able to compare model predictions against ground truth values and compute statistics easily.
This might look something like the following:
openadmet bench --dataset-file my_real_dataset.csv --smiles-col SMILES --y-true-col CYP3A4_pIC50 --model-dir my_model --taskname OADMET_LOGAC50
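
To make the proposal more concrete, here is a rough Python sketch of two pieces such a command might need: an argument parser mirroring the flags above, and a function that computes basic regression statistics against the ground truth column. The function names `build_parser` and `regression_stats` are hypothetical and not part of openadmet, the choice of MAE, RMSE, and R² is an assumption about which statistics would be useful, and model loading and prediction are deliberately left out.

```python
import argparse
import math


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the proposed command."""
    parser = argparse.ArgumentParser(prog="openadmet bench")
    parser.add_argument("--dataset-file", required=True, help="CSV file with the benchmark data")
    parser.add_argument("--smiles-col", required=True, help="column holding SMILES strings")
    parser.add_argument("--y-true-col", required=True, help="column holding ground truth values")
    parser.add_argument("--model-dir", required=True, help="directory of the trained model")
    parser.add_argument("--taskname", required=True, help="model task to benchmark")
    return parser


def regression_stats(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    # R^2 is undefined when the ground truth has no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

In this sketch the command would read the dataset, obtain predictions for the SMILES column from the model in `--model-dir`, and pass the two value lists to `regression_stats` before printing or saving the result.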