Description
I've been looking around at different CSV parsers to use as an alternative to JSON because:
- often just 10% of the size to start with (and compresses even smaller)
- easier to read and reason about
- easier for LLMs
- easier for local caching (and more importantly: cache busting)
I found this via Reddit and I've also looked at csvutil (my current front-runner) and gocarina/gocsv, but nothing is a perfect fit.
Before I go forking and creating my own thing, I wanted to lay out my use case and ideal CSV parser and get your feedback on your experience using what you've created and what you learned coding it. I've been using Go since its release, but I never got into reflection or did anything like this.
Anyway here's my wishlist:
- focused on being a structured data format - much like JSON
- okay to have more columns - they get ignored (or captured in a catchall*)
- okay to have fewer columns - they get zero values
- use the `json` tag by default (for compatibility across the ecosystem: `sqlc` and other tools only work with that tag 😢)
  - ability to change tag to `csv`
- serialize the same as JSON (time.Time, []byte, etc)
- maybe store all rows with the same number of columns as the header
  - ignore any extra columns
  - expand any rows that are short (allowing for saving bytes on rarely used fields)
  - (this might be a bad idea; it may be better to just treat anything that fails to parse as a failure)
- for unknown structs try MarshalCSV, MarshalJSON, MarshalText
- easy to allow nested lists (e.g. comma or space separated ids/slugs) and JSON
- allow for a catchall field (e.g. to capture runtime fields, such as for templating text with arbitrary values)
  - if the type is `csv.Fields []T` or `csv.Records map[string]T`, then it collects whatever isn't part of the struct
  - (I'm mostly thinking `[]string` or `map[string]string`, but having the option for `[]int` and `[]json.RawMessage` would be very helpful)
  - have `func (f Fields[T]) Get(name string) T` to efficiently get by header index for narrow rows where the cost of hashing is more than `slices.Index(header, name)`
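To make the wishlist a bit more concrete, here is a rough sketch of header-based decoding with a catchall, built only on the standard library. It is not how any existing package works: `Fields`, `Person`, `decodeRow`, and `decodePeople` are made-up names, `Fields` is a plain struct rather than the generic `[]T` described above, and only `string` and `int` fields are handled.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"reflect"
	"slices"
	"strconv"
	"strings"
)

// Fields is a hypothetical catchall for columns that match no struct field.
type Fields struct {
	Header []string
	Values []string
}

// Get scans the header linearly; for narrow rows this can beat hashing.
func (f Fields) Get(name string) string {
	if i := slices.Index(f.Header, name); i >= 0 {
		return f.Values[i]
	}
	return ""
}

// Person is an example target type that reuses ordinary `json` tags.
type Person struct {
	ID    string `json:"id"`
	Name  string `json:"name"`
	Age   int    `json:"age"`
	Extra Fields `json:"-"`
}

// decodeRow fills dst (a pointer to a struct) from one row, matching columns
// to fields by `json` tag. Tag options such as omitempty are not parsed here.
func decodeRow(header, row []string, dst any) error {
	v := reflect.ValueOf(dst).Elem()
	t := v.Type()
	byTag := map[string]int{}
	catchall := -1
	for i := 0; i < t.NumField(); i++ {
		if t.Field(i).Type == reflect.TypeOf(Fields{}) {
			catchall = i
			continue
		}
		byTag[t.Field(i).Tag.Get("json")] = i
	}
	var extra Fields
	for col, name := range header {
		if col >= len(row) {
			break // short row: the remaining fields keep their zero values
		}
		idx, ok := byTag[name]
		if !ok {
			extra.Header = append(extra.Header, name)
			extra.Values = append(extra.Values, row[col])
			continue
		}
		switch f := v.Field(idx); f.Kind() {
		case reflect.String:
			f.SetString(row[col])
		case reflect.Int:
			n, err := strconv.Atoi(row[col])
			if err != nil {
				return fmt.Errorf("column %q: %w", name, err)
			}
			f.SetInt(int64(n))
		}
	}
	if catchall >= 0 {
		v.Field(catchall).Set(reflect.ValueOf(extra))
	}
	return nil
}

// decodePeople reads a header row followed by data rows of any width.
func decodePeople(data string) ([]Person, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // tolerate rows wider or narrower than the header
	rows, err := r.ReadAll()
	if err != nil || len(rows) == 0 {
		return nil, err
	}
	var out []Person
	for _, row := range rows[1:] {
		var p Person
		if err := decodeRow(rows[0], row, &p); err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	people, err := decodePeople("id,name,age,nickname\n1,John,21,Johnny,ignored\n2,Jane\n")
	fmt.Println(people, err)
}
```

In the sample input, the first data row has one column more than the header (it is dropped) and a `nickname` column that `Person` does not declare (it lands in `Extra`). The second row stops after `name`, so `Age` stays at its zero value.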
I want something that's maximally Go-ish in the sense of leaving most implementation details up to the struct, but slightly more convenient than json.RawMessage for generic data, just as convenient as JSON for structured data, and not concerned with unstructured data.
Something that pairs well with databases.
Dreaming Big
If I could go a giant step further, I'd play off the "TOON" / "Values Separated by Comma" AI meme that came in response to the "Progressive JSON" announcement, but be a little more serious about it and create a format with multiple files separated by both Name and Type headers, something like:
TYPE,Person,id,name,age,favorite_book
TYPE,Person,string,string,int,Book
Person,1,John,21,abc123
TYPE,Book,id,title,author
TYPE,Book,string,string,string
Book,abc123,The Art of Computer Programming,Donald E. Knuth
I'm not exactly sure what the ergonomics of that would be, but it would be able to handle cyclic structures and keep all of the caching, type, and performance benefits.
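As a way to poke at those ergonomics, here is a minimal sketch of a reader for the layout above. It rests on my own reading of the example: the first `TYPE` row for a name lists the columns, the second lists their types, and every other row starts with the name of the table it belongs to. `Table`, `parseTyped`, and `sample` are hypothetical names, and nothing here validates values against the declared types or resolves references such as `favorite_book`.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Table holds one named section of the combined file (hypothetical type).
type Table struct {
	Columns []string
	Types   []string
	Rows    [][]string
}

const sample = `TYPE,Person,id,name,age,favorite_book
TYPE,Person,string,string,int,Book
Person,1,John,21,abc123
TYPE,Book,id,title,author
TYPE,Book,string,string,string
Book,abc123,The Art of Computer Programming,Donald E. Knuth
`

// parseTyped groups rows by their leading table name. The first TYPE row for
// a name is taken as its column names, the second as its column types.
func parseTyped(data string) (map[string]*Table, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // tables may have different widths
	tables := map[string]*Table{}
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return tables, nil
		}
		if err != nil {
			return nil, err
		}
		if rec[0] == "TYPE" {
			if len(rec) < 2 {
				return nil, fmt.Errorf("TYPE row without a name")
			}
			t, seen := tables[rec[1]]
			switch {
			case !seen:
				tables[rec[1]] = &Table{Columns: rec[2:]}
			case t.Types == nil:
				t.Types = rec[2:]
			default:
				return nil, fmt.Errorf("third TYPE row for %q", rec[1])
			}
			continue
		}
		t, ok := tables[rec[0]]
		if !ok {
			return nil, fmt.Errorf("row for undeclared type %q", rec[0])
		}
		t.Rows = append(t.Rows, rec[1:])
	}
}

func main() {
	tables, err := parseTyped(sample)
	if err != nil {
		panic(err)
	}
	for name, t := range tables {
		fmt.Println(name, t.Columns, t.Types, len(t.Rows))
	}
}
```

Because a column's declared type can be the name of another table, a second pass could follow `favorite_book` from `Person` into `Book` by id, which is where cyclic structures would come in.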