Open
Description
Hi, I'm using columnify with Avro input records and found that values of logical types (the date/time ones: date, timemillis, timemicros, timestampmillis, timestampmicros) come out broken.
For example, the sample data produces the results below.
# jsonl input(OK)
$ ./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType jsonl columnifier/testdata/record/logicals.jsonl > jsonl.parquet
$ parquet-tools cat -json jsonl.parquet
{"date":1,"timemillis":1000,"timemicros":1000000,"timestampmillis":1000,"timestampmicros":1000000}
{"date":2,"timemillis":2000,"timemicros":2000000,"timestampmillis":2000,"timestampmicros":2000000}
{"date":3,"timemillis":3000,"timemicros":3000000,"timestampmillis":3000,"timestampmicros":3000000}
{"date":4,"timemillis":4000,"timemicros":4000000,"timestampmillis":4000,"timestampmicros":4000000}
{"date":5,"timemillis":5000,"timemicros":5000000,"timestampmillis":5000,"timestampmicros":5000000}
{"date":6,"timemillis":6000,"timemicros":6000000,"timestampmillis":6000,"timestampmicros":6000000}
{"date":7,"timemillis":7000,"timemicros":7000000,"timestampmillis":7000,"timestampmicros":7000000}
{"date":8,"timemillis":8000,"timemicros":8000000,"timestampmillis":8000,"timestampmicros":8000000}
{"date":9,"timemillis":9000,"timemicros":9000000,"timestampmillis":9000,"timestampmicros":9000000}
{"date":10,"timemillis":10000,"timemicros":10000000,"timestampmillis":10000,"timestampmicros":10000000}
# avro input(NG)
$ ./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType avro columnifier/testdata/record/logicals.avro > avro.parquet
$ parquet-tools cat -json avro.parquet
{"date":1970,"timemillis":1000000000,"timemicros":1000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":2000000000,"timemicros":2000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":3000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":4000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":5000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":6000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":7000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":8000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":9000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":10000000000,"timestampmillis":1970,"timestampmicros":1970}
This behavior seems to come from goavro, which converts logical types to Go native types (using the time package).
However, I don't have a good idea for how to convert those Go native types back to Parquet primitive types before writing :(