forked from EleutherAI/pilev2
reddit data sources #7
Open
Description
Hey @taisazero, I saw you were working on the Reddit part of this script. I'm interested in including more natural dialogue data in future open-source models, so I thought I'd share some links to Reddit data that might help you.
- This dataset has a list of good subreddits: https://huggingface.co/datasets/stanfordnlp/SHP
- There are other sources of Reddit data besides the pushshift.io API. Some of them might be faster to process:
  - Up to 2023 (2 TB) on Academic Torrents
  - Up to 2020 on BigQuery
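In case it helps, here is a rough sketch of how a subreddit list could be used to filter a bulk dump. It assumes the dump has already been decompressed into newline-delimited JSON with one record per line and a `subreddit` field, which is how Pushshift-style dumps are usually laid out. The function name `filter_by_subreddit`, the sample records, and the subreddit names are made up for illustration and are not taken from this repo's script.

```python
import json
from typing import Iterable, Iterator, Set


def filter_by_subreddit(lines: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Yield JSON records whose 'subreddit' field is in `wanted` (case-insensitive).

    Hypothetical helper: assumes one JSON object per line.
    """
    wanted_lower = {name.lower() for name in wanted}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting a long run
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object
        if str(record.get("subreddit", "")).lower() in wanted_lower:
            yield record


# Illustrative in-memory sample; a real run would stream lines from a file.
sample = [
    '{"subreddit": "AskCulinary", "body": "Rest the dough."}',
    '{"subreddit": "pics", "body": "Nice photo."}',
    "not json",
    "",
    '{"subreddit": "legaladvice", "body": "Talk to a lawyer."}',
]

kept = list(filter_by_subreddit(sample, {"askculinary", "legaladvice"}))
for record in kept:
    print(record["subreddit"], "|", record["body"])
```

Because it works on any iterable of lines, the same function could be fed an open file handle or a streaming decompressor, so the full dump never has to fit in memory.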
Thanks for your work here.