feat(function): Implement Google Polyline encoding and decoding functions#16516
feat(function): Implement Google Polyline encoding and decoding functions#16516deepthibose01 wants to merge 6 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
Hi @deepthibose01! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
ba845ae to
5c3ab3d
Compare
|
gtest runs |
|
@jkhaliqi Thanks for the review. I have resolved the review comments and has added the non-point testing scenarios. Please have a look. |
|
Thanks @jkhaliqi. I have ensured the changes are made. Please check. Thanks |
| .. function:: google_polyline_encode(points: array(Geometry)) -> encoded: varchar | ||
|
|
||
| Encodes an array of Point geometries into a Google Polyline encoded string. | ||
| Uses the default precision exponent of 5. The precision used for decoding is 10^precision_exponent. |
There was a problem hiding this comment.
Typo: decoding -> encoding?
Could we not say directly that it encodes with precision 10^5 as there is no way to override the (default) precision for this function?
Same for the decode function?
There was a problem hiding this comment.
Sure. The statements are updated to add only the relevant information wrt specific function signature.
| #include "velox/functions/prestosql/types/BingTileType.h" | ||
| #include "velox/functions/prestosql/types/GeometryRegistration.h" | ||
| #include "velox/functions/prestosql/types/SphericalGeographyRegistration.h" | ||
| #include "velox/functions/prestosql/GooglePolylineFunctions.h" |
There was a problem hiding this comment.
Move to the line below GeometryFunctions.h. (the includes are sorted alphabetically).
There was a problem hiding this comment.
Updated the file inclusions to follow alphabetical order
| /// Encode a single delta value using Google Polyline encoding. | ||
| /// https://developers.google.com/maps/documentation/utilities/polylinealgorithm | ||
| /// Algorithm: | ||
| /// 1. Convert signed to unsigned. |
There was a problem hiding this comment.
Maybe mention the ZigZag encoding used here.
There was a problem hiding this comment.
Added detailed ZigZag encoding documentation in the algorithm description.
| } | ||
|
|
||
| while (unsignedDelta >= 0x20) { | ||
| int64_t nextChunk = (0x20 | (unsignedDelta & 0x1f)) + 63; |
There was a problem hiding this comment.
Please use a const for magic numbers like 63.
There was a problem hiding this comment.
All magic numbers have been replaced with named constants. The constants are properly scoped within the anonymous namespace since they're only used by the internal delta encoding/decoding functions.
| out_type<Varchar>& result, | ||
| const arg_type<Array<Geometry>>& points, | ||
| int64_t precisionExponent) { | ||
| VELOX_USER_CHECK_GE( |
There was a problem hiding this comment.
Is there also a maximum for the exponent? I image this is not arbitrary large either?
We need to prevent overflows.
There was a problem hiding this comment.
Java only validates minimum precision (>= 1). Should I add maximum validation (e.g., <= 18) as a safety improvement even though Java doesn't have it?
| out_type<Array<Geometry>>& result, | ||
| const arg_type<Varchar>& encoded, | ||
| int64_t precisionExponent) { | ||
| VELOX_USER_CHECK_GE( |
| FOLLY_ALWAYS_INLINE void call( | ||
| out_type<Varchar>& result, | ||
| const arg_type<Array<Geometry>>& points) { | ||
| callImpl(result, points, kDefaultPrecisionExponent); |
There was a problem hiding this comment.
We can save the calculation by hard coding the actual value for 10^5 and use that with a little variation. That way we don't have to re-calculate the same value over and over.
There was a problem hiding this comment.
Added precomputed constant kDefaultPrecision = 100000.0 . Both encode and decode functions now check if precisionExponent == kDefaultPrecisionExponent and use the hard-coded value directly, avoiding the std::pow() calculation for the common case (default precision of 10^5).
| kMinimumPrecisionExponent); | ||
|
|
||
| double precision = std::pow(10.0, static_cast<double>(precisionExponent)); | ||
| std::string encodedStr(encoded.data(), encoded.size()); |
There was a problem hiding this comment.
Do we need to make a copy or could we not operate on StringView directly?
There was a problem hiding this comment.
You're right.the copy is unnecessary; I'll remove it and use the StringView directly since decodeNextDelta only reads from it.
| int64_t b; | ||
|
|
||
| do { | ||
| VELOX_USER_CHECK_LT( |
There was a problem hiding this comment.
We don't want to throw an exception for bad user input. Instead, we want to use the Status return. (unwinding the exception is expensive).
Can we refactor this so the caller can use the Status as a return of the call function in case a user error occurs - in this case a bad input with missing data.
Check the other geometry function implementations.
There was a problem hiding this comment.
Thank you very much @czentgr for the review!
I agree and also like to implement the approach with status return.Request your guidance regarding the error-handling approach.
Java reference (GeoFunctions.java:1394) also throws exceptions for invalid input. Should I convert to Status returns (better C++ performance, matches other Velox geometry functions) or keep exceptions to match Java?
czentgr
left a comment
There was a problem hiding this comment.
Please also run the pre-commit workflow.
ba00006 to
9aaa667
Compare
|
Hi @czentgr, |
| auto arrayVec = makeNullableArrayVector<std::string>({{points}}); | ||
| auto input = makeRowVector({arrayVec}); | ||
| std::optional<std::string> result = evaluateOnce<std::string>( | ||
| "google_polyline_encode(transform(c0, x -> ST_GEOMETRYFROMTEXT(x)))", |
There was a problem hiding this comment.
Why not use google_polyline_encode(ST_GeometryFromText(c0)) directly? No need to use the transform function. Let's use ST_GeometryFromText also to align with the other usage (even though it is not case sensitive). Same with the other occurrences.
There was a problem hiding this comment.
ST_GeometryFromText do not seem to have an array overload and only returns single Geometry and encode functions requires Array<Geometry> which is why transform is used here and this was done in accordance with the existing pattern in the tests in the same file like in GEOMETRY_UNION, ST_LineString, ST_MultiPoint. The other alternative to possibly avoid transform fully that I could try on is to create the geometry array but it would need multiple evaluateOnce calls which seemed inefficient.
ST_GeometryFromText
FOLLY_ALWAYS_INLINE Status
call(out_type<Geometry>& result, const arg_type<Varchar>& wkt)
google_polyline_encode
call(out_type<Varchar>& result, const arg_type<Array<Geometry>>& points)
|
|
||
| const auto testCustomEncode = [&](const std::optional<std::vector< | ||
| std::optional<std::string>>>& points, | ||
| const int32_t precision, |
|
|
||
| const auto testCustomDecode = | ||
| [&](const std::optional<std::string>& encoded, | ||
| const int32_t precision, |
| expectedPoints) { | ||
| auto encodeVec = makeFlatVector<std::string>({encoded.value()}); | ||
| auto input = makeRowVector({encodeVec}); | ||
| auto output = evaluate( |
There was a problem hiding this comment.
Can we use evaluateOnce instead?
There was a problem hiding this comment.
The decode function returns an array of strings/points, but evaluateOnce<std::vectorstd::string> do not seem to work. When I tried to use it, it gave the compilation error
error: no type named 'NativeType' in 'facebook::velox::CppToType<std::vector<std::string>>'
28 | using Type = typename CppToType<T>::NativeType;
Also when I checked the existing test in such scenarios like ST_Points, ST_Geometries all seem to use the evaluate().
| auto arrayVec = makeNullableArrayVector<std::string>({{points}}); | ||
| auto precisionVec = makeFlatVector<int32_t>({precision}); | ||
| auto input = makeRowVector({arrayVec, precisionVec}); | ||
| std::optional<std::string> result = evaluateOnce<std::string>( |
There was a problem hiding this comment.
Loos like a lot of copy&paste for encode and decode versions. The only difference is the additional c1 columns.
Can precision be a nullopt as well from a function point of view (aka is NULL allowed as column value for precision?). If it is not allowed to be NULL we can use a single lambda with a std::optional for precision and if it is not set use the standard version, otherwise pass the precision.
There was a problem hiding this comment.
Java implementation and C++ implementation are both using primitive types that cannot be null and there seems to have No @Nullable annotation in Java and have combined the encode versions and decode versions together as advised.
| "POLYGON((-135 85, -45 85, 45 85, 135 85, -135 85))", 619.00E9); | ||
| } | ||
|
|
||
| TEST_F(GeometryFunctionsTest, testGooglePolylineFunctions) { |
There was a problem hiding this comment.
One more thing. Can we add a test that does "testDecode(testEncode(data))". It is a bit tricky with the current way the lambdas work because each function also validates. And also vice versa testEncode(testDecode).
There was a problem hiding this comment.
Hi Christian, Thanks for the review. Added bidirectional round-trip tests for testEncode(testDecode(encoded)) and testDecode(testEncode(points)) .Both test functions now support an optional compose parameter with default value as false. When compose=true, the function returns its result without validation, enabling composition. The outer function validates the complete round-trip, confirming encode and decode are inverse operations. Tested with precision values: default, 1, 6, and 16.
Making common impl private Test file for the new function Using similar points check for encoding as in presto side brace mismatch More test cases for validation wrt Java test scenarios Custom encode/decode tests Added additonal test scenarios from Java Added additonal test scenarios from Java
4b210fa to
b7359c4
Compare
b7359c4 to
64a88ae
Compare
64a88ae to
14daa74
Compare
Abstract
This PR adds the scalar function and unit testing for the function that implements google's polyline encoding and decoding logic based on the already available function implementation in presto as part of #16041