Construct working benchmark

Benchmarking procedure should consist of:
- Multiple datasets loaded into the data lakehouse
- Assigned to either meta-train or meta-test sets
- Assigned to either support or query sets

We must ensure the following conditions are not broken:
- Data from a single dataset remains in either the training or testing set