Fetch writer schema to decode Avro messages #119

Merged

BewareMyPower
Contributor

Fixes #108

Motivation

Currently the Python client decodes Avro messages with the reader schema, i.e. the schema of the consumer. However, when the writer schema differs from the reader schema, decoding fails.

Modifications

Add an attach_client method to Schema and call it when creating consumers and readers. This method stores a reference to a _pulsar.Client instance, which leverages the C++ APIs added in apache/pulsar-client-cpp#257 to fetch schema info. The AvroSchema class fetches and caches the writer schema if it is not already cached, then uses both the writer schema and the reader schema to decode messages.

Add test_schema_evolve to verify that consumers and readers can decode any message whose writer schema is different from the reader schema.
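
The fetch-and-cache flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `WriterSchemaCache`, `resolve`, `fake_fetch`, and the sample schemas are hypothetical names, and the sketch works on already-decoded dicts with flat record schemas rather than on Avro binary data, which the real client has to handle.

```python
import json
from typing import Callable, Dict, List


class WriterSchemaCache:
    """Caches writer schemas by version so each one is fetched only once."""

    def __init__(self, fetch: Callable[[int], str]):
        # `fetch` is a hypothetical stand-in for the client call that
        # downloads the schema definition (as JSON) for a given version.
        self._fetch = fetch
        self._schemas: Dict[int, dict] = {}

    def get(self, version: int) -> dict:
        if version not in self._schemas:
            self._schemas[version] = json.loads(self._fetch(version))
        return self._schemas[version]


def resolve(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    """Project a record produced with writer_schema onto reader_schema.

    Simplified resolution for flat records: fields known to both schemas
    are copied, fields only in the reader schema take their default, and
    fields only in the writer schema are dropped.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return result


# Assumed sample data: version 0 is the older writer schema,
# version 1 is the consumer's (reader) schema with an extra field.
SCHEMAS = {
    0: {"type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"}]},
    1: {"type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"},
                   {"name": "age", "type": "int", "default": 0}]},
}

fetch_calls: List[int] = []


def fake_fetch(version: int) -> str:
    # Records each lookup so the caching behaviour is observable.
    fetch_calls.append(version)
    return json.dumps(SCHEMAS[version])


cache = WriterSchemaCache(fake_fetch)
reader_schema = SCHEMAS[1]

# Two messages written with schema version 0 trigger a single fetch.
first = resolve({"name": "alice"}, cache.get(0), reader_schema)
second = resolve({"name": "bob"}, cache.get(0), reader_schema)
```

Keying the cache by schema version means the lookup cost is paid once per writer schema rather than once per message, which is the point of caching described in the modifications.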

@BewareMyPower BewareMyPower added the enhancement New feature or request label May 22, 2023
@BewareMyPower BewareMyPower added this to the 3.2.0 milestone May 22, 2023
@BewareMyPower BewareMyPower self-assigned this May 22, 2023
@shibd
Member

shibd commented May 24, 2023

With this patch, the following definitions will create two schemas, but that's okay, right? It will use the writer schema of each message to deserialize the data.

Python:

    class User(Record):
        name = String()
        age = Integer()

Java:

    @AllArgsConstructor
    @Getter
    static class User {
        private final String name;
        private final int age;
    }

Do we need to continue to solve this problem?
#108 (comment)

@BewareMyPower
Contributor Author

BewareMyPower commented May 24, 2023

> With this patch, the following definitions will create two schemas, but that's okay, right? It will use the writer schema of each message to deserialize the data.

Yes, it will create two schemas. But modifying the _sorted_fields and _required fields would cause breaking changes. If we have ways to avoid the breaking changes, maybe we don't need to make these changes. Or we can make the changes in the next release after a discussion on the mailing list.

@shibd shibd merged commit d2fac8f into apache:main May 25, 2023
@BewareMyPower BewareMyPower deleted the bewaremypower/writer-schema-download branch May 25, 2023 02:03
Successfully merging this pull request may close these issues.

Python Avro consumer cannot consume non-union fields