I'm currently reproducing the paper results for the UCI dataset, using the GRU architecture.
My results are almost identical to those reported in the paper:
RMSE: 0.745 ± 0.001
MAE: 0.529 ± 0.002
However, in the recurrent.yaml file used, output_sequence_length is set to only 24 steps ahead:
train: False
dataset: 'uci'
exogenous: False
epochs: 1
batch_size: 1024
input_sequence_length: 96
output_sequence_length: 24
dropout: 0.0
layers: 1
units: 50
learning_rate: 0.001
cell: 'gru'
l2: 0.0005
MIMO: True
detrend: True
Since the UCI dataset has a 15 min sampling frequency, 24 output steps correspond to 24 × 15 min = 6 h. This means the model is forecasting only 6 h into the future, instead of the 24 h reported.
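For reference, here is a minimal sketch of the arithmetic behind this. The helper names are my own and not part of the repository, and it assumes the 15 min sampling interval mentioned above:

```python
from datetime import timedelta


def forecast_horizon(steps: int, sampling_minutes: int) -> timedelta:
    """Wall-clock horizon covered by `steps` output steps."""
    return timedelta(minutes=steps * sampling_minutes)


def steps_for_horizon(horizon: timedelta, sampling_minutes: int) -> int:
    """Number of output steps needed to cover `horizon`."""
    total_minutes = horizon.total_seconds() / 60
    if total_minutes % sampling_minutes:
        raise ValueError("horizon is not a multiple of the sampling interval")
    return int(total_minutes // sampling_minutes)


# Value from recurrent.yaml: output_sequence_length = 24
print(forecast_horizon(24, 15))                    # 6:00:00
# Steps that a 24 h horizon would need at 15 min sampling
print(steps_for_horizon(timedelta(hours=24), 15))  # 96
```

If that assumption holds, a 24 h horizon would need output_sequence_length: 96 rather than 24.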