-
Notifications
You must be signed in to change notification settings - Fork 480
GH-502: Clarify Int96 status and add recommended ordering #504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb how about wording this more strongly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually prefer the original wording as it a) leaves the ordering undefined (and thus isn't really a change to the specification) and b) spells out exactly how to perform the in-the-wild ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @etseidl's suggestion. BTW, I think we need to avoid saying
only logical type stored in INT96since literally it is not a Parquet logical type or (deprecated) converted type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etseidl I agree that leaving it undefined is an advantage in not changing the spec. The disadvantage is that readers will have less confidence on the meaning of this field when it exists.
@wgtmac I see INT96 as a weird physical type. Had it been a real physical type it would have a logical type TIMESTAMP that would signify that top 64 bits are nanos and bottom 32 bits are julian days. It could also have a logical type
IntType{96, true/false}which could be used to store numbers greater than int64max. In the former case the ordering would be what we propose today and in the latter case the ordering would be signed comparison of the 96 bit signed or unsigned integer. The reality is that there is no logical type ever defined for INT96 and its logical type is defacto the timestamp representation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @etseidl is referring to my original proposed change (not the wording prior to this PR)
In terms of changing the wording from
I would personally read this as a change in the spec, which obligated writers to use the new ordering. I was under the impression we didn't have consensus on making that change
I prefer leaving the wording as undefined personally.
This might be a good thing given that we have instances where old writers wrote it incorrectly may not be correct.