Description
The specification mentions the extension field types copy, back-ref and forward-ref (marked as not finished).
This could turn out to be an outstanding feature, but the specification gives no indication of how it is supposed to work.
Consider the following use case, which is quite common:
In a table (for example the result of a SQL query), the value of a column is very often identical to the value of the same column in the previous row.
With the existing solution, the value has to be repeated nevertheless.
This wastes bandwidth, and it wastes memory when reading the object. For example in Java, with N rows where column X has only M<=N distinct values (often M<<N), the reader would create N string objects, many of them with the same value, where M would be enough.
On the application side, this could be optimized, for example by using a magic value such as the UTF-8 string "" (two double quotation marks) or a special int64 value (meaning: see previous row), but then the creator and the consumer of the table would have to agree on this magic value and handle it in their code.
I propose that inside a table, the extended type back-ref with a null value officially means: take the value from the same column in the previous row.
It is also clear that with back-references of any kind, the consumer has to keep previous objects in memory to a certain extent.
In case of my proposal for using a back-ref to the previous row in a table, it suffices to keep the previous row (and the objects it created) in memory.
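The consumer side of this proposal could look roughly like the following sketch. It is only an illustration: `BACK_REF` is a hypothetical in-memory stand-in for a decoded back-ref field with a null value (the specification does not define its encoding), `resolve_back_refs` is an invented name, and all rows are assumed to have the same number of columns.

```python
from typing import Any, Iterable, Iterator, List

# Hypothetical stand-in for a decoded "back-ref with a null value" field.
BACK_REF = object()


def resolve_back_refs(rows: Iterable[List[Any]]) -> Iterator[List[Any]]:
    """Yield rows with every BACK_REF replaced by the previous row's value."""
    previous = None  # the only state the consumer has to keep
    for row in rows:
        resolved = []
        for column, value in enumerate(row):
            if value is BACK_REF:
                if previous is None:
                    raise ValueError("back-ref in the first row has no target")
                # Reuse the object created for the previous row; no copy is made.
                value = previous[column]
            resolved.append(value)
        yield resolved
        previous = resolved


encoded = [
    ["Berlin", "Alice", 1],
    [BACK_REF, "Bob", 2],
    [BACK_REF, BACK_REF, 3],
]
decoded = list(resolve_back_refs(encoded))
```

Because a resolved value is the very object that was created for the earlier row, repeated values share one object instead of being allocated once per row.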
The same holds for the writer: it can automatically create back-references to the previous row as long as it still knows the content of the previous row (or its data source).
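A writer could apply this automatically along the lines of the sketch below. Again, `BACK_REF` and `insert_back_refs` are hypothetical names used only for illustration, and rows are assumed to have equal length. Comparing the type as well as the value is a design choice of this sketch, so that for example `1` and `1.0` are not merged.

```python
from typing import Any, Iterable, Iterator, List

# Hypothetical stand-in for a "back-ref with a null value" field to be written.
BACK_REF = object()


def insert_back_refs(rows: Iterable[List[Any]]) -> Iterator[List[Any]]:
    """Replace values that repeat the previous row's column with BACK_REF."""
    previous = None  # the writer only remembers the previous row
    for row in rows:
        if previous is None:
            encoded = list(row)  # the first row is always written in full
        else:
            encoded = [
                BACK_REF if type(value) is type(old) and value == old else value
                for value, old in zip(row, previous)
            ]
        yield encoded
        previous = row


rows = [
    ["Berlin", "Alice", 1],
    ["Berlin", "Bob", 2],
    ["Berlin", "Bob", 3],
]
encoded_rows = list(insert_back_refs(rows))
```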
The idea could perhaps be extended to arrays of objects, but I think it is especially useful for avoiding data duplication in tables, saving bandwidth and memory.
For generic back-references, things would be more complicated: one would have to invent something XPath-like to specify what is referenced, or one would need a tagging concept as in CBOR.