Support for lm_eval v0.4 and higher

Hi,

I noticed that you're using lm_eval v0.2.0 as the evaluation flow for additional tasks. However, lm_eval v0.4.0 and later added more datasets and made major changes to the library. Could you add support for lm_eval v0.4.0 and higher to your future plans?

Thanks.