For each method, the average time per replication should be added to `result_df`. Reporting scores averaged over all replications should be sufficient.
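A minimal sketch of one way to do this, assuming `result_df` is a pandas DataFrame with a `method` column and that each replication of a method can be run as a callable returning a single score. The helpers `run_replications` and `summarize`, the column names `score`, `time_s`, `mean_score` and `mean_time_s`, and the toy methods are all hypothetical and for illustration only. Each replication is timed with `time.perf_counter`, and both score and time are then averaged per method before being merged into `result_df`.

```python
import time
from typing import Callable, Dict

import pandas as pd


def run_replications(methods: Dict[str, Callable[[int], float]], n_reps: int) -> pd.DataFrame:
    """Run each method n_reps times, recording its score and wall-clock time per replication."""
    rows = []
    for name, fit_and_score in methods.items():
        for rep in range(n_reps):
            start = time.perf_counter()
            score = fit_and_score(rep)  # hypothetical: one replication returns one score
            elapsed = time.perf_counter() - start
            rows.append({"method": name, "replication": rep,
                         "score": score, "time_s": elapsed})
    return pd.DataFrame(rows)


def summarize(runs: pd.DataFrame) -> pd.DataFrame:
    """Average score and time over all replications, one row per method."""
    return runs.groupby("method", as_index=False).agg(
        mean_score=("score", "mean"),
        mean_time_s=("time_s", "mean"),
    )


# Toy stand-ins for the real methods (assumption, for illustration only).
methods = {"A": lambda rep: float(rep), "B": lambda rep: 2.0}
runs = run_replications(methods, n_reps=3)

# Hypothetical existing result_df keyed by a "method" column.
result_df = pd.DataFrame({"method": ["A", "B"]})
result_df = result_df.merge(summarize(runs), on="method", how="left")
print(result_df)
```

Keeping the per-replication rows in `runs` and only merging the averages into `result_df` matches the suggestion that averaged scores are enough, while still leaving the raw timings available if they are needed later.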