Open
Labels: documentation, enhancement, good first issue
Description
Instead of the simple bullet list, it would be more useful to have a table summarizing a few key properties of each dataset.
As an example, it could look like the following, but we can iterate on the design and add or modify anything:
| Dataset | Source | Size | Approx # Tokens | Modalities | Remarks |
|---|---|---|---|---|---|
| Conceptual Captions | https://ai.google.com/research/ConceptualCaptions/ | X GB/TB | XYZ | | |
This would help us identify the right datasets to include when training our Neko model.
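If the list grows to many datasets, it may be easier to keep the metadata in one place and generate the Markdown table from it. Below is a minimal Python sketch of that idea. It assumes a hypothetical `DatasetInfo` record whose fields mirror the proposed columns and a hypothetical `render_table` helper; neither exists in the repository, and the row values are the same placeholders used in the example above.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetInfo:
    """One table row. Hypothetical schema mirroring the proposed columns."""
    dataset: str
    source: str
    size: str            # e.g. "X GB/TB" (placeholder)
    approx_tokens: str   # e.g. "XYZ" (placeholder)
    modalities: str = ""
    remarks: str = ""


# Column headers, in the same order as the DatasetInfo fields.
HEADERS = ["Dataset", "Source", "Size", "Approx # Tokens", "Modalities", "Remarks"]


def _escape(cell: str) -> str:
    """Escape pipe characters so a cell value cannot break the table layout."""
    return cell.replace("|", "\\|")


def render_table(rows: list[DatasetInfo]) -> str:
    """Render dataset records as a Markdown table, one row per dataset."""
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "---|" * len(HEADERS),
    ]
    for row in rows:
        cells = [_escape(getattr(row, f.name)) for f in fields(row)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table([
        DatasetInfo(
            dataset="Conceptual Captions",
            source="https://ai.google.com/research/ConceptualCaptions/",
            size="X GB/TB",
            approx_tokens="XYZ",
        ),
    ]))
```

Keeping one record per dataset means a new column only has to be added once (to the record and the header list) rather than edited by hand in every row.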