-
Notifications
You must be signed in to change notification settings - Fork 205
[Coral-Schema] default value lowercasing #560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Coral-Schema] default value lowercasing #560
Conversation
aastha25
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for picking this up! seems like an interesting bug!
| * @return The default value with all field names lowercased | ||
| */ | ||
| @SuppressWarnings("unchecked") | ||
| private Object lowercaseDefaultValue(Object defaultValue, Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename the arguments to (fieldDefaultValue, fieldSchema) to capture that this is a per field level method.
| // Handle union types - get the actual schema based on the default value type | ||
| if (schema.getType() == Schema.Type.UNION) { | ||
| // For unions, the default value corresponds to the first type in the union | ||
| actualSchema = schema.getTypes().get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assumption is not true. we 'expect' it to be the first type, but it can either.
There's actually some utility method in coral-schema which is specifically designed to fetch the 'underlying schema' from the union type - that can be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually according to Avro specifications: https://avro.apache.org/docs/1.11.2/specification/
under "Unions":
(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.)
I'll update it to at least handle nullable defaults though
|
|
||
| case MAP: | ||
| // For maps, lowercase the keys and recursively process values | ||
| if (defaultValue instanceof Map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this if condition if its already inside the CASE? can this be simplified here and other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the case statement is checking the case according to the schema, but this is just to make sure that the defaultValue is actually in accordance to the schema
| Object lowercasedValue = lowercaseDefaultValue(entry.getValue(), valueSchema); | ||
| lowercasedMap.put(lowercasedKey, lowercasedValue); | ||
| } | ||
| return lowercasedMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the default value was [null, map], are we returning map by any chance? where's the logic to construct the union schema back? let's check this for other types too and ensure the spec of this method is returning the same schema as the input, just lower-cased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the default value was [null, map]
do you mean "if the field type was [null, map]"?
According to avro specs, in this case the default value SHOULD be null, but I'm going to add something under case UNION so that it can still parse properly if the default value is the non-null type. However, I'm not going to implement the complex union version of it
Not sure what you mean by "where's the logic to construct the union schema back", we aren't constructing any schemas, this is just using the schema to figure out how to parse the default values properly so the relevant parts of the default values can be lowercased
| } | ||
|
|
||
| switch (actualSchema.getType()) { | ||
| case RECORD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the handling for complex union also? I understand we're stripping away the nulls for the union, but we could still have the a nullable- complex union?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for complex unions, we rely on the fact that the schema supplied is valid according to specs, see #560 (comment)
b2fe0f7 to
db4a5ee
Compare
What changes are proposed in this pull request, and why are they necessary?
This PR fixes a bug in
ToLowercaseSchemaVisitorwhere field names within complex default values were not being lowercased, causing schema validation errors when creating fields with lowercased schemas.Given an input schema with a record field that has a default value:
Before the fix:
Result:
AvroTypeException: Invalid default for field struct_fieldAfter the fix:
The Solution:
Added
lowercaseDefaultValue()method that recursively transforms default values based on the schema type:The implementation handles both
GenericData.RecordandMap-based default value representationsHow was this patch tested?
Added a new test
testLowercaseSchemaWithComplexDefaultValues()inSchemaUtilitiesTeststhat verifies the fix handles:The test uses input/expected schema files:
testLowercaseSchemaWithDefaultValues-input.avsc- Contains fields with various default values using mixed casetestLowercaseSchemaWithDefaultValues-expected.avsc- Contains the fully lowercased expected outputAll existing tests continue to pass, confirming backward compatibility.