Glue serde: Avro-to-JSON deserialization includes namespaces and union types #3237

Open
@Ronserruya

Description

Originally reported in #3224; split into a separate issue following the discussion in #3235.

When the Glue serde deserializes Avro to JSON, it includes record namespaces and, for union-typed values, the name of the selected branch type. This is the first time I've encountered this behavior: neither the Python deserializer nor the one used in Kafka Connect does this.

Example:

Original msg:

{"name": {"first": "ron", "last": "serruya", "full": "ron serruya"}, "ids1": [5,6], "ids2": ["abc", 123]}

schema used:

{
  "type": "record",
  "name": "generation",
  "namespace": "top_level",
  "fields": [
    {
      "name": "name",
      "type": [
        {
          "type": "record",
          "name": "name",
          "namespace": "top_level.generation",
          "fields": [
            {
              "name": "raw",
              "type": [
                "string",
                "null"
              ]
            },
            {
              "name": "first",
              "type": "string"
            },
            {
              "name": "full",
              "type": "string"
            },
            {
              "name": "last",
              "type": ["string"]
            }
          ]
        },
        "null"
      ]
    },
    {
      "name": "ids1",
      "type": {"type": "array", "items": "int"}
    },
    {
      "name": "ids2",
      "type": {"type": "array", "items": ["string", "int"]}
    }
  ]
}

Base64-encoded Avro message (just the message, without the Glue-specific bytes at the start):
AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA=
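
To make the payload concrete, here is a minimal sketch that walks the bytes above by hand using the Avro binary encoding rules (zigzag varints, length-prefixed strings, block-encoded arrays, and unions written as a branch index followed by the value). It needs only the JDK. The class name GluePayloadWalk and the hard-coded field layout are hypothetical illustrations, not part of the Glue serde or the Avro library. What it shows: the binary message stores only a branch index for each union value, so labels such as "string" or "top_level.generation.name" in the JSON output come from the schema, added by the JSON encoder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hand-decodes the example payload with the Avro binary encoding rules.
// Field order and union branch order are hard-coded from the schema above;
// this is an illustration, not a general Avro reader.
public class GluePayloadWalk {
    private final ByteArrayInputStream in;

    private GluePayloadWalk(byte[] bytes) {
        this.in = new ByteArrayInputStream(bytes);
    }

    // Avro int and long share one encoding: a base-128 varint (low 7 bits first),
    // then zigzag-decoded.
    private long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new IllegalStateException("unexpected end of payload");
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zigzag
    }

    // Avro string: a long byte length followed by that many UTF-8 bytes.
    private String readString() {
        int len = (int) readLong();
        if (len == 0) return "";
        byte[] buf = new byte[len];
        if (in.read(buf, 0, len) != len) throw new IllegalStateException("truncated string");
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Arrays are written as blocks: a count, the items, and a final zero count.
    // A negative count means it is followed by the block's size in bytes.
    private long readBlockCount() {
        long n = readLong();
        if (n < 0) {
            readLong(); // skip the block byte size
            n = -n;
        }
        return n;
    }

    // Returns one line per decoded value; union values show the branch index
    // that is actually stored in the payload.
    public static List<String> walk(String base64) {
        GluePayloadWalk r = new GluePayloadWalk(Base64.getDecoder().decode(base64));
        List<String> out = new ArrayList<>();

        // "name": union [record top_level.generation.name, null]
        if (r.readLong() == 0) {
            out.add("name -> branch 0 (top_level.generation.name)");
            // "raw": union [string, null]
            out.add(r.readLong() == 0 ? "raw -> branch 0 (string) " + r.readString()
                                      : "raw -> branch 1 (null)");
            out.add("first -> " + r.readString());
            out.add("full -> " + r.readString());
            // "last": single-branch union [string]; the index is still written
            out.add("last -> branch " + r.readLong() + " (string) " + r.readString());
        } else {
            out.add("name -> branch 1 (null)");
        }

        // "ids1": array of int
        List<Long> ids1 = new ArrayList<>();
        for (long n = r.readBlockCount(); n != 0; n = r.readBlockCount()) {
            for (long i = 0; i < n; i++) ids1.add(r.readLong());
        }
        out.add("ids1 -> " + ids1);

        // "ids2": array of union [string, int]
        int idx = 0;
        for (long n = r.readBlockCount(); n != 0; n = r.readBlockCount()) {
            for (long i = 0; i < n; i++, idx++) {
                long branch = r.readLong();
                String value = branch == 0 ? "(string) " + r.readString() : "(int) " + r.readLong();
                out.add("ids2[" + idx + "] -> branch " + branch + " " + value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        walk("AAIGcm9uFnJvbiBzZXJydXlhAA5zZXJydXlhBAoMAAQABmFiYwL2AQA=").forEach(System.out::println);
    }
}
```

Running main prints eight lines, ending with "ids2[0] -> branch 0 (string) abc" and "ids2[1] -> branch 1 (int) 123": the values match the original message, and the only type information in the bytes is the branch index.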

The current Glue deserializer renders this message as:

{
  "name": {
    "top_level.generation.name": {
      "raw": null,
      "first": "ron",
      "full": "ron serruya",
      "last": {
        "string": "serruya"
      }
    }
  },
  "ids1": [
    5,
    6
  ],
  "ids2": [
    {
      "string": "abc"
    },
    {
      "int": 123
    }
  ]
}

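As you can see, it wraps each union value in an object keyed by the branch type (string, int) or, for records, by the fully qualified record name (top_level.generation.name). This even applies to last, whose type is the single-branch union ["string"].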
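
For reference, the Avro specification's JSON encoding writes a non-null union value as a one-entry object whose key is the branch's type name, using the schema's own name for named types (record, enum, fixed); a null branch is written as a bare JSON null. The sketch below assumes the namespace-qualified full name is used for named types, which matches the output above. UnionLabel, label and fullName are hypothetical names for illustration only.

```java
import java.util.Set;

// Sketch of how the wrapper key is chosen for a union branch in Avro's JSON encoding.
public class UnionLabel {
    private static final Set<String> NAMED_TYPES = Set.of("record", "enum", "fixed");

    // Avro full name: a name containing a dot is already a full name;
    // otherwise it is qualified by the namespace, if there is one.
    static String fullName(String namespace, String name) {
        if (name.contains(".") || namespace == null || namespace.isEmpty()) return name;
        return namespace + "." + name;
    }

    // Returns the key of the one-entry wrapper object, or null for the
    // "null" branch, which is written as a bare JSON null.
    static String label(String type, String namespace, String name) {
        if (type.equals("null")) return null;
        if (NAMED_TYPES.contains(type)) return fullName(namespace, name);
        return type; // primitives, "array", "map": the type name itself
    }

    public static void main(String[] args) {
        System.out.println(label("record", "top_level.generation", "name"));
        System.out.println(label("string", null, null));
        System.out.println(label("int", null, null));
    }
}
```

This is why the record branch of "name" shows up as top_level.generation.name while the ids2 items show up as string and int.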

I fixed this locally by adding encoder.setIncludeNamespace(false); to the avroRecordToJson method.
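
A minimal sketch of what that one-line change does to the output, assuming (from a reading of Avro's JsonEncoder, not from the Glue serde code) that setIncludeNamespace(false) drops the whole wrapper object for non-null union branches rather than only shortening the label. UnionWrapper, writeUnion and renderIds2 are hypothetical names, and the includeWrapper flag stands in for the encoder setting.

```java
// Sketch: a union value written to JSON with and without the one-entry type wrapper.
// includeWrapper stands in (by assumption) for JsonEncoder.setIncludeNamespace(...).
public class UnionWrapper {
    // label: branch label ("string", "int", "top_level.generation.name", or "null")
    // valueJson: the branch value, already encoded as JSON
    static String writeUnion(String label, String valueJson, boolean includeWrapper) {
        if (label.equals("null") || !includeWrapper) return valueJson;
        return "{\"" + label + "\": " + valueJson + "}";
    }

    // Renders the "ids2" array from the example (items are union ["string", "int"]).
    static String renderIds2(boolean includeWrapper) {
        return "[" + writeUnion("string", "\"abc\"", includeWrapper) + ", "
                   + writeUnion("int", "123", includeWrapper) + "]";
    }

    public static void main(String[] args) {
        System.out.println(renderIds2(true));  // [{"string": "abc"}, {"int": 123}]
        System.out.println(renderIds2(false)); // ["abc", 123]
    }
}
```

One trade-off the sketch makes visible: without the wrapper, the JSON no longer records which branch was chosen. Avro's own JsonDecoder expects the wrapper object for non-null union values, so JSON written this way cannot be read back through that decoder as-is; whether that is the breakage meant in #3235 is an assumption.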

However, according to the comment in #3235, that is not a completely valid fix, since it may break other things?

Before and after the fix:
[Image: Screen Shot 2023-01-15 at 15 48 52]
[Image: Screen Shot 2023-01-15 at 15 45 44]

Metadata

Labels

area/serde: Serialization & Deserialization (plugins)
scope/backend
status/accepted: An issue which has passed triage and has been accepted
type/bug: Something isn't working

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
