Hello,
I'm trying to reproduce your experiments and the work of https://github.com/hkust-nlp/PreSelect using the DCLM-refined data pool:
aws s3 ls s3://commoncrawl/contrib/datacomp/DCLM-refinedweb/
I created an IAM user for myself, with an access key and a secret key,
but the data does not appear to be available or public at s3://commoncrawl/contrib/datacomp/DCLM-refinedweb/. Could you please share the refined DCLM data?