@ZJONSSON

Here is a very basic example of how we can use a dockerized parquet-tools (from parquet-mr) to test on Travis whether files created by parquetjs can be read by parquet-mr (and therefore by Spark etc.).

The basic test succeeds but more advanced tests fail. I will add a failing branch that we can use as a guide for fixing any errors.
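To illustrate the idea, here is a minimal sketch of how a test could shell out to a dockerized parquet-tools. It is not the code in this PR: the image name `example/parquet-tools` is a hypothetical placeholder, and the sketch assumes that the image's entrypoint is the `parquet-tools` CLI and that `cat --json` prints one JSON record per line.

```javascript
const { execFileSync } = require('child_process');
const path = require('path');

// Hypothetical image name; substitute whichever parquet-tools image is used.
const IMAGE = 'example/parquet-tools';

// Build the `docker run` arguments that mount the file's directory read-only
// and ask parquet-tools to print the file's records as JSON.
function parquetToolsArgs(file, image = IMAGE) {
  const abs = path.resolve(file);
  return [
    'run', '--rm',
    '-v', `${path.dirname(abs)}:/data:ro`,
    image,
    'cat', '--json', `/data/${path.basename(abs)}`
  ];
}

// Run parquet-tools and parse its output (assumed: one JSON object per line).
// Throws if docker or parquet-tools exits with a non-zero status.
function readWithParquetMr(file, image = IMAGE) {
  const stdout = execFileSync('docker', parquetToolsArgs(file, image), { encoding: 'utf8' });
  return stdout.split('\n').filter(Boolean).map(line => JSON.parse(line));
}
```

A test could then write a file with parquetjs, call `readWithParquetMr` on it, and compare the returned records with the rows that were written.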


@ZJONSSON ZJONSSON requested review from asmuth and kessler February 28, 2018 01:53
tags: true
npm_api_key:
secure: HK/tFvgj/TtYTJ3s2Bszc1/yJWvbSkLcfY3ki3GEuudMpfzcq134/2fbdZLb+B7Ukg31rdRVFCrSg8k6a1KhztkRr9SnMts5WO2ZGulmzNQ+XsBwdd0Bf7KYamAtqft5qBnSvh+ypBloQJQqq5qazb31971Fwvg5pdkYTQgCQxyIfZlH8nUbOxcYyl4w6Mvz5zsQp2c4OKOdq0FgeU3OqJ05i5lWL/CZWRO9L7+f0Uih5Jr9CuRzBUcVVxIopn1uOX1czug+OudIuUMLxbJwJt69ZpWdTbywLg6wVvA58ozbyialuEx8S1UaehsqHFj29JJWcOw+6TCi5+512DrBZMguiyTkjq5I5kmRcPNPY8dcqJUZUD6eDpKYQemFeg+6vKIvT3spK53VXNoEOIqAAiNTpmfY6JQ17S31gy1TqZldMtWr1HXf95LGlLC+czgMHPi1m6YiUgdDx5N7MFXumdOxiyHNdoitQFyyyS57RS7BG8/5ZMeKIXEfhQ9KU/D5L3KpgNCBmwVR72vF3nb89aVETrvNIbZEgc/cTdYWquezfPibGoGjWVJ4c38nd30s6rmoMBwoDwznaDg87ameoHUKSCSMx3uVXRZ5uR2C4SmTqVbWNKLXszL4iIW54EaLf3M+AYjoAb+EupaPMuEonJukdzkalp03RekYVeIY23U=

@ZJONSSON ZJONSSON force-pushed the parquet-mr branch 2 times, most recently from 0094133 to 8764ac4 Compare February 28, 2018 02:26
@ZJONSSON

Here is a failing branch: https://github.com/ZJONSSON/parquetjs/tree/parquet-mr-fail
Problems with the RLE encoding


* Bit-packing should work for any length of data, not just multiples of 8 (the last packed group is padded if it holds fewer than 8 values)

* Improve run estimation: only start a new run when the current position mod 8 === 0, otherwise use bit-packing
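
The two points above can be sketched as follows. This is an illustration of the Parquet RLE/bit-packing hybrid layout, not the parquetjs implementation: the function names are made up for this example, and the threshold of 8 repeated values before switching to an RLE run is an assumption.

```javascript
// Encode a non-negative integer as a ULEB128 varint (array of bytes).
function varint(n) {
  const bytes = [];
  while (n > 0x7f) {
    bytes.push((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  bytes.push(n);
  return bytes;
}

// Bit-pack `values` (each < 2^bitWidth), LSB first. The last group is padded
// with zeros so the run always covers a whole number of 8-value groups.
function encodeBitPackedRun(values, bitWidth) {
  const groups = Math.ceil(values.length / 8);
  const padded = values.concat(new Array(groups * 8 - values.length).fill(0));
  const body = new Array(groups * bitWidth).fill(0); // 8 values of bitWidth bits = bitWidth bytes per group
  padded.forEach((value, i) => {
    for (let b = 0; b < bitWidth; b++) {
      if (value & (1 << b)) {
        const bit = i * bitWidth + b;
        body[bit >> 3] |= 1 << (bit & 7);
      }
    }
  });
  // Header: (number of groups << 1) | 1 marks a bit-packed run.
  return Buffer.from(varint((groups << 1) | 1).concat(body));
}

// RLE run: header is (count << 1), followed by the value in ceil(bitWidth / 8) bytes.
function encodeRleRun(value, count, bitWidth) {
  const out = varint(count << 1);
  for (let i = 0; i < Math.ceil(bitWidth / 8); i++) {
    out.push((value >>> (8 * i)) & 0xff);
  }
  return Buffer.from(out);
}

// Only switch to an RLE run when the pending bit-packed values fill whole
// groups of 8, so padding can only ever occur in the final bit-packed run.
function encodeHybrid(values, bitWidth) {
  const chunks = [];
  let pending = [];
  let i = 0;
  while (i < values.length) {
    let runLength = 1;
    while (i + runLength < values.length && values[i + runLength] === values[i]) runLength++;
    if (pending.length % 8 === 0 && runLength >= 8) { // assumed threshold of 8
      if (pending.length) chunks.push(encodeBitPackedRun(pending, bitWidth));
      chunks.push(encodeRleRun(values[i], runLength, bitWidth));
      pending = [];
      i += runLength;
    } else {
      pending.push(values[i]);
      i++;
    }
  }
  if (pending.length) chunks.push(encodeBitPackedRun(pending, bitWidth));
  return Buffer.concat(chunks);
}
```

For instance, `encodeBitPackedRun([0, 1, 2, 3, 4, 5, 6, 7], 3)` yields the header byte `0x03` followed by `0x88 0xC6 0xFA`, which matches the worked example in the Parquet encoding specification.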
@ZJONSSON

ZJONSSON commented Mar 1, 2018

This PR has been rebased on #57 to include the RLE fixes for dlevels and rlevels, plus more tests added to verify that the results are correct as seen from parquet-mr.

@justinsoliz

I seem to be running into this issue as well. Are there any outstanding items on this PR that I might be able to help with to get it merged in?

@ZJONSSON

Do your problems go away when you use this branch? The only outstanding thing here is a code review afaik.

@justinsoliz

NPM install per this comment does the trick for me:
#29 (comment)
