Open
Description
Since we're going to read / write millions of files, let's make sure we have a format that doesn't limit us.
One first step would be to write unindented .json.gz files instead of indented .json files,
but we can also think about common practices like JSONL files, sharded .jsonl.gz files (part-0001.jsonl.gz, part-0002.jsonl.gz), or even Parquet. Not sure yet, but let's keep these in mind.
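
To make the sharded option concrete, here is a rough sketch of what writing and reading part-NNNN.jsonl.gz files could look like with just the standard library. The function names (`write_shards`, `read_shards`) and the `records_per_shard` default are made up for illustration and are not anything that exists in the codebase yet.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def write_shards(records: Iterable[dict], out_dir: Path,
                 records_per_shard: int = 100_000) -> List[Path]:
    """Write records as compact JSON lines, split across gzip-compressed shards."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    try:
        for i, record in enumerate(records):
            # Start a new shard every `records_per_shard` records.
            if i % records_per_shard == 0:
                if handle is not None:
                    handle.close()
                path = out_dir / f"part-{len(paths) + 1:04d}.jsonl.gz"
                handle = gzip.open(path, "wt", encoding="utf-8")
                paths.append(path)
            # Compact separators: no indentation, no extra spaces.
            handle.write(json.dumps(record, separators=(",", ":")) + "\n")
    finally:
        if handle is not None:
            handle.close()
    return paths


def read_shards(out_dir: Path) -> Iterator[dict]:
    """Stream records back one at a time, shard by shard."""
    # Lexicographic sort matches numeric order only while the
    # zero-padded width (4 digits here) is not exceeded.
    for path in sorted(out_dir.glob("part-*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                yield json.loads(line)
```

Both functions stream, so neither needs the whole dataset in memory. The shard size is a knob we would have to tune: it trades the number of files against how much has to be decompressed to reach a given record.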