Skip to content

Conversation

@antongrbin
Copy link

@antongrbin antongrbin commented Jun 20, 2022

Motivation

Tech doc:
https://docs.google.com/document/d/1p5LUSTWrVBcT9F2wXBgHBDT3OJyOm5BpXAyTpwwDVts/edit#

Changes

In this PR we actually fix the Python protobuf library not to break when unknown enum value is encountered if ignoreUnknownFields is true.

The design principles for the change were:

  • make the change as small as possible, so it's easy to keep up to date with the upstream
  • if one element is unknown in an array of elements, other elements should still parse (see unit tests)

Changes:

  • Change _ConvertScalarFieldValue so it throws a special exception when unknown enum string value is encountered.
  • Wrap every invocation of _ConvertScalarFieldValue to optionally ignore the new exception thrown.
  • Add unit tests

To write this PR, I reached out for input on #python-club on slack

Tested

Executed the test plan from here: #1 to test this within the protobuf repo.

Also tested as part of noom-contracts repo: https://github.com/noom/noom-contracts/pull/293


def testParseUnknownEnumStringValueRepeatedProto3(self):
message = json_format_proto3_pb2.TestMessage()
text = '{"repeatedEnumValue": ["UNKNOWN_STRING_VALUE", "FOO", "BAR"]}'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't an array of enums banned?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question :)
We banned the array of enums in noom-contracts to avoid people getting confused with this behavior. The array length is 2 although there were 3 fields (one of them unknown).

In this repository (protobuf) I am still testing that the behavior that we have with repeated enums is consistent with the behavior in the binary serialization (regardless of the fact that the behavior is confusing).

I'll add the comment in the code.


def testParseUnknownEnumStringValueProto2(self):
message = json_format_pb2.TestNumbers()
text = '{"a": "UNKNOWN_STRING_VALUE"}'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why the key is a here instead of enumValue like in other tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I verified that this is OK and added comment.

    # Note that in src/google/protobuf/util/json_format.proto::TestNumbers
    # has a field 'optional MyType a = 1', which is different
    # from the test below that are using
    # src/google/protobuf/util/json_format_proto3.proto::TestMessage.

"""Thrown in case of parsing error."""


class _UnknownEnumStringValueParseError(ParseError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this class inherit from ParseError (it looks like it does)? So we don't break code that tries to catch ParseError

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Added a comment about that.

Although I think the implementation will never leak _UnknownEnumStringValueParseError, it still gets renamed to ParseError in _ConvertFieldValuePair.. but I wouldn't rely on that, the code is very hard to follow :) It's safer if I just inherit and then I know I didn't change any public interface

'for enum type protobuf_unittest.TestAllTypes.NestedEnum.',
json_format.Parse, '{"optionalNestedEnum": 12345}', message)

def testParseUnknownEnumStringValueProto3(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests only test deserialisation. Since we are modifying code that can also set values, should we add tests for serialisation as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unaware that I am changing the code that sets values.

The code changes are in the following functions:

_ConvertScalarFieldValue
_ConvertStructMessage
_ConvertWrapperMessage
_ConvertMapFieldValue

These functions are called from the following functions transitively:
_ConvertFieldValuePair
Parser::ConvertMessage
_ConvertAnyMessage
ParseDict

This is only parsing of JSON, I never change serializing to JSON or setting the values.
Did I miss something?

Copy link
Author

@antongrbin antongrbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!


def testParseUnknownEnumStringValueProto2(self):
message = json_format_pb2.TestNumbers()
text = '{"a": "UNKNOWN_STRING_VALUE"}'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I verified that this is OK and added comment.

    # Note that in src/google/protobuf/util/json_format.proto::TestNumbers
    # has a field 'optional MyType a = 1', which is different
    # from the test below that are using
    # src/google/protobuf/util/json_format_proto3.proto::TestMessage.


def testParseUnknownEnumStringValueRepeatedProto3(self):
message = json_format_proto3_pb2.TestMessage()
text = '{"repeatedEnumValue": ["UNKNOWN_STRING_VALUE", "FOO", "BAR"]}'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question :)
We banned the array of enums in noom-contracts to avoid people getting confused with this behavior. The array length is 2 although there were 3 fields (one of them unknown).

In this repository (protobuf) I am still testing that the behavior that we have with repeated enums is consistent with the behavior in the binary serialization (regardless of the fact that the behavior is confusing).

I'll add the comment in the code.

'for enum type protobuf_unittest.TestAllTypes.NestedEnum.',
json_format.Parse, '{"optionalNestedEnum": 12345}', message)

def testParseUnknownEnumStringValueProto3(self):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unaware that I am changing the code that sets values.

The code changes are in the following functions:

_ConvertScalarFieldValue
_ConvertStructMessage
_ConvertWrapperMessage
_ConvertMapFieldValue

These functions are called from the following functions transitively:
_ConvertFieldValuePair
Parser::ConvertMessage
_ConvertAnyMessage
ParseDict

This is only parsing of JSON, I never change serializing to JSON or setting the values.
Did I miss something?

"""Thrown in case of parsing error."""


class _UnknownEnumStringValueParseError(ParseError):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Added a comment about that.

Although I think the implementation will never leak _UnknownEnumStringValueParseError, it still gets renamed to ParseError in _ConvertFieldValuePair.. but I wouldn't rely on that, the code is very hard to follow :) It's safer if I just inherit and then I know I didn't change any public interface

@antongrbin antongrbin merged commit 12610cd into json_unknown_enum_fork Jun 21, 2022
antongrbin pushed a commit that referenced this pull request Apr 25, 2023
We would like for upb_Map_Delete() to optionally return the deleted value.
Unfortunately this will require several steps since we are crossing repos.
Step #4: advance the upb version used by protobuf and update PHP/Ruby accordingly.

PiperOrigin-RevId: 498426185
antongrbin pushed a commit that referenced this pull request Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants