Thank you for your work; it has been very helpful. However, I have run into an issue.
My code:
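from datasets import load_dataset

# Load the "sample" config from a local copy of the loading script,
# requesting only English documents from the head_middle partition.
ds = load_dataset(
    "/data/public/models/RedPajama-Data-V2/RedPajama-Data-V2/RedPajama-Data-V2.py",
    name="sample",
    partition="head_middle",
    languages=["en"],
)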
However, ds still contains documents in languages other than English, even though languages=["en"] was passed.
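
For reference, a minimal sketch of how the language mix in the loaded records could be counted. It assumes each record carries a `meta` field (either a dict or a JSON string) with a `language` key; both names are assumptions about the schema and may need adjusting. The toy records below are placeholders, not real rows from ds.

```python
import json
from collections import Counter


def count_languages(records, meta_field="meta", lang_key="language"):
    """Count records per language tag.

    meta_field and lang_key are assumed names, not confirmed schema.
    """
    counts = Counter()
    for record in records:
        meta = record.get(meta_field, {})
        # The metadata may be stored as a JSON string rather than a dict.
        if isinstance(meta, str):
            meta = json.loads(meta)
        counts[meta.get(lang_key, "unknown")] += 1
    return counts


# Toy records standing in for rows of ds (hypothetical, not real data).
toy_records = [
    {"meta": json.dumps({"language": "en"})},
    {"meta": json.dumps({"language": "de"})},
    {"meta": {"language": "en"}},
    {"meta": "{}"},
]

print(count_languages(toy_records))
```

If the schema matches, passing the rows of the loaded split to count_languages instead of toy_records would show how many non-English records came back.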

Thank you for your reply!