Description
Originally reported in #3224, split into a separate issue following the discussion in #3235.
When the Glue SerDe deserializes Avro to JSON, it includes the record namespaces and type names for union values. This is the first time I've encountered this behavior: neither the Python deserializer nor the one used in Kafka Connect does this.
Example:
Original msg:
{"name": {"first": "ron", "last": "serruya", "full": "ron serruya"}, "ids1": [5,6], "ids2": ["abc", 123]}
Schema used:
{
  "type": "record",
  "name": "generation",
  "namespace": "top_level",
  "fields": [
    {
      "name": "name",
      "type": [
        {
          "type": "record",
          "name": "name",
          "namespace": "top_level.generation",
          "fields": [
            {
              "name": "raw",
              "type": ["string", "null"]
            },
            {
              "name": "first",
              "type": "string"
            },
            {
              "name": "full",
              "type": "string"
            },
            {
              "name": "last",
              "type": ["string"]
            }
          ]
        },
        "null"
      ]
    },
    {
      "name": "ids1",
      "type": {"type": "array", "items": "int"}
    },
    {
      "name": "ids2",
      "type": {"type": "array", "items": ["string", "int"]}
    }
  ]
}
Base64-encoded Avro message (the message body only, without the Glue-specific header bytes at the start):
AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA=
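To double-check what the payload actually contains, here is a minimal, hand-written Python decoder for only the subset of the Avro binary encoding this schema uses (record, union, array, string, int, null). It is an illustrative sketch, not the Glue SerDe or the Apache Avro library, and the names Reader, decode, SCHEMA and MSG are hypothetical. It returns union values as the bare branch value, with no wrapper object.

```python
import base64
import json

# Schema from this issue, written as a Python dict.
SCHEMA = {
    "type": "record", "name": "generation", "namespace": "top_level",
    "fields": [
        {"name": "name", "type": [
            {"type": "record", "name": "name", "namespace": "top_level.generation",
             "fields": [
                 {"name": "raw", "type": ["string", "null"]},
                 {"name": "first", "type": "string"},
                 {"name": "full", "type": "string"},
                 {"name": "last", "type": ["string"]},
             ]},
            "null",
        ]},
        {"name": "ids1", "type": {"type": "array", "items": "int"}},
        {"name": "ids2", "type": {"type": "array", "items": ["string", "int"]}},
    ],
}

# Base64 payload from this issue (Avro body only, no Glue header bytes).
MSG = "AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA="


class Reader:
    """Cursor over an Avro binary payload."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_long(self) -> int:
        # Avro int/long: little-endian base-128 varint, then zig-zag decoding.
        shift = result = 0
        while True:
            byte = self.data[self.pos]
            self.pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return (result >> 1) ^ -(result & 1)

    def read_string(self) -> str:
        # Avro string: length as a long, followed by that many UTF-8 bytes.
        length = self.read_long()
        raw = self.data[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8")


def decode(schema, reader: Reader):
    """Decode one value; unions yield the bare branch value (no wrapper)."""
    if isinstance(schema, list):
        # Union: a long selecting the branch, then that branch's value.
        return decode(schema[reader.read_long()], reader)
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            # Fields are written in schema order with no separators.
            return {f["name"]: decode(f["type"], reader) for f in schema["fields"]}
        if kind == "array":
            # Arrays are a series of blocks terminated by a zero count.
            items = []
            while (count := reader.read_long()) != 0:
                if count < 0:
                    # Negative count: the block's byte size follows; skip it.
                    count = -count
                    reader.read_long()
                for _ in range(count):
                    items.append(decode(schema["items"], reader))
            return items
        return decode(kind, reader)  # e.g. {"type": "string"}
    if schema == "null":
        return None
    if schema in ("int", "long"):
        return reader.read_long()
    if schema == "string":
        return reader.read_string()
    raise NotImplementedError(f"type not covered by this sketch: {schema!r}")


reader = Reader(base64.b64decode(MSG))
decoded = decode(SCHEMA, reader)
print(json.dumps(decoded))
# {"name": {"raw": null, "first": "ron", "full": "ron serruya", "last": "serruya"}, "ids1": [5, 6], "ids2": ["abc", 123]}
```

The decoded value matches the original message, plus "raw": null for the field that was absent from the source JSON, and the whole 41-byte payload is consumed. So the wrappers in the Glue output come from the JSON rendering step, not from the Avro data itself.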
The current Glue deserializer renders this message as:
{
  "name": {
    "top_level.generation.name": {
      "raw": null,
      "first": "ron",
      "full": "ron serruya",
      "last": {
        "string": "serruya"
      }
    }
  },
  "ids1": [
    5,
    6
  ],
  "ids2": [
    {
      "string": "abc"
    },
    {
      "int": 123
    }
  ]
}
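As you can see, it wraps union values in an extra object keyed by the branch type (string, int) or, for records, by the full record name (top_level.generation.name). Note that "raw" stays a bare null, because null union values are not wrapped, while "last" is wrapped even though its union ["string"] has only one branch.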
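This wrapping follows the union rule of the JSON encoding in the Avro specification: a null union value is written as a plain JSON null, and any other union value is written as a single-key object whose key is the branch's type name (for records, enums and fixed types, the type's name, which the encoder here writes as the full name top_level.generation.name). As a rough sketch of what a schema-aware "unwrap" would involve, the Python below reverses that rule. The helper names are hypothetical, and for simplicity it builds full names only from an explicit "namespace" attribute (no inherited namespaces or resolution of named-type references).

```python
import json

# Schema from this issue, written as a Python dict.
SCHEMA = {
    "type": "record", "name": "generation", "namespace": "top_level",
    "fields": [
        {"name": "name", "type": [
            {"type": "record", "name": "name", "namespace": "top_level.generation",
             "fields": [
                 {"name": "raw", "type": ["string", "null"]},
                 {"name": "first", "type": "string"},
                 {"name": "full", "type": "string"},
                 {"name": "last", "type": ["string"]},
             ]},
            "null",
        ]},
        {"name": "ids1", "type": {"type": "array", "items": "int"}},
        {"name": "ids2", "type": {"type": "array", "items": ["string", "int"]}},
    ],
}

# Output of the current Glue deserializer, copied from this issue.
GLUE_JSON = {
    "name": {
        "top_level.generation.name": {
            "raw": None,
            "first": "ron",
            "full": "ron serruya",
            "last": {"string": "serruya"},
        }
    },
    "ids1": [5, 6],
    "ids2": [{"string": "abc"}, {"int": 123}],
}


def branch_label(branch) -> str:
    """Key the Avro JSON encoding uses for a non-null union branch."""
    if isinstance(branch, str):
        return branch  # primitive type name
    if branch["type"] in ("record", "enum", "fixed"):
        namespace = branch.get("namespace")
        return f"{namespace}.{branch['name']}" if namespace else branch["name"]
    return branch["type"]  # "array" or "map"


def unwrap(schema, value):
    """Strip union wrappers from an Avro-JSON value, guided by the schema."""
    if isinstance(schema, list):
        if value is None:
            return None  # null union values are never wrapped
        ((label, inner),) = value.items()  # exactly one key: the branch label
        for branch in schema:
            if branch_label(branch) == label:
                return unwrap(branch, inner)
        raise ValueError(f"no union branch named {label!r}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {f["name"]: unwrap(f["type"], value[f["name"]]) for f in schema["fields"]}
        if kind == "array":
            return [unwrap(schema["items"], item) for item in value]
        if kind == "map":
            return {k: unwrap(schema["values"], v) for k, v in value.items()}
        return unwrap(kind, value)
    return value  # primitives are left as-is


plain = unwrap(SCHEMA, GLUE_JSON)
print(json.dumps(plain))
# {"name": {"raw": null, "first": "ron", "full": "ron serruya", "last": "serruya"}, "ids1": [5, 6], "ids2": ["abc", 123]}
```

The point of the sketch is that the wrapper keys carry information the schema alone does not: each key is matched against the union's branches to decide how to interpret the inner value.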
I fixed this locally by adding the line encoder.setIncludeNamespace(false); to the avroRecordToJson method.
However, according to the comment in #3235, that is not a fully valid fix, since it could break other things. Is that right?
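One possible reason dropping the wrapper is lossy in general (not necessarily the concern raised in #3235): in the Avro JSON encoding, the wrapper key is what tells a reader which union branch a value belongs to. The small Python sketch below, with hypothetical helper names and deliberately simplified type checks, lists which branches a plain, unwrapped JSON value could match. For this issue's ["string", "int"] the answer is unique, but for unions such as ["int", "long"], ["string", "bytes"] (bytes are also JSON strings in this encoding), or two records with the same fields, it is not.

```python
def could_match(branch, value) -> bool:
    """Simplified check: could a plain (unwrapped) JSON value belong to this branch?"""
    if isinstance(branch, dict):
        kind = branch["type"]
        if kind == "record":
            field_names = {f["name"] for f in branch["fields"]}
            return isinstance(value, dict) and set(value) == field_names
        if kind == "array":
            return isinstance(value, list)
        if kind == "map":
            return isinstance(value, dict)
        return could_match(kind, value)
    if branch == "null":
        return value is None
    if branch == "boolean":
        return isinstance(value, bool)
    if branch in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if branch in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if branch in ("string", "bytes"):
        return isinstance(value, str)
    return False


def candidate_branches(union, value) -> list:
    """Names of every union branch the unwrapped value could have come from."""
    return [
        branch if isinstance(branch, str) else branch.get("name", branch["type"])
        for branch in union
        if could_match(branch, value)
    ]


# Hypothetical union of two records with identical fields.
TWO_RECORDS = [
    {"type": "record", "name": "a", "fields": [{"name": "x", "type": "int"}]},
    {"type": "record", "name": "b", "fields": [{"name": "x", "type": "int"}]},
]

print(candidate_branches(["string", "int"], 123))      # ['int'] (unambiguous)
print(candidate_branches(["int", "long"], 5))          # ['int', 'long']
print(candidate_branches(["string", "bytes"], "abc"))  # ['string', 'bytes']
print(candidate_branches(TWO_RECORDS, {"x": 1}))       # ['a', 'b']
```

So output without wrappers is fine for display, but a consumer that needs to decode the JSON back into Avro with the schema would no longer be able to do so for every schema.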