-
Notifications
You must be signed in to change notification settings - Fork 3.6k
[fix][broker]fix memory leak, messages lost, incorrect replication state if using multiple versions schema #24178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[fix][broker]fix memory leak, messages lost, incorrect replication state if using multiple versions schema #24178
Conversation
/pulsarbot rerun-failure-checks |
pulsar-client/src/main/java/org/apache/pulsar/client/impl/GeoReplicationProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/GeoReplicationProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/GeoReplicationProducerImpl.java
Outdated
Show resolved
Hide resolved
…ate if using multiple versions schema
…ucerImpl.java Co-authored-by: Lari Hotari <[email protected]>
…ucerImpl.java Co-authored-by: Lari Hotari <[email protected]>
…ucerImpl.java Co-authored-by: Lari Hotari <[email protected]>
…eplicationProducerImpl.java Co-authored-by: Lari Hotari <[email protected]>
af91444
to
30a488c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24178 +/- ##
============================================
+ Coverage 73.57% 74.28% +0.70%
+ Complexity 32624 32161 -463
============================================
Files 1877 1865 -12
Lines 139502 144644 +5142
Branches 15299 16519 +1220
============================================
+ Hits 102638 107443 +4805
+ Misses 28908 28731 -177
- Partials 7956 8470 +514
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
@@ -200,12 +200,13 @@ public static String getStringSchemaVersion(byte[] schemaVersionBytes) { | |||
* @param schemaInfo the schema info | |||
* @return the jsonified schema info | |||
*/ | |||
public static String jsonifySchemaInfo(SchemaInfo schemaInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there might be some external usage of this method, it could be useful to add a method with the original method signature for backwards compatibility with the prettyPrinting argument set to true
.
@@ -117,6 +117,6 @@ public SchemaHash getSchemaHash() { | |||
|
|||
@Override | |||
public String toString() { | |||
return SchemaUtils.jsonifySchemaInfo(this); | |||
return SchemaUtils.jsonifySchemaInfo(this, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to change the behavior in this PR? It seems that previously the output was pretty printed, but this changes it to be without pretty printing.
@@ -327,7 +327,7 @@ public KeyValue<SchemaInfo, SchemaInfo> decodeKeyValueSchemaInfo(SchemaInfo sche | |||
* @return the jsonified schema info | |||
*/ | |||
public String jsonifySchemaInfo(SchemaInfo schemaInfo) { | |||
return SchemaUtils.jsonifySchemaInfo(schemaInfo); | |||
return SchemaUtils.jsonifySchemaInfo(schemaInfo, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to change the behavior in this PR? It seems that previously the output was pretty printed, but this changes it to be without pretty printing.
Motivation
Issue 1: memory leak when publishing messages with a broken schema
testBrokenSchema
.Issue 2: incorrect replication/producer state
Ready -> RegisteringSchema
.replication.connected
orproducer.connected
showsfalse
testProducerConnectStateWhenRegisteringSchema
Issue 3: replication lost messages or is out of order
time / task
internal producer of replicator
msg1
with a compatible schemamsg2
with a incompatible schemamsg1
with a compatible schemaResult:
msg1
was sentmsg2
was discarded due to incompatible schemamsg3
was sentIssue 4: Reused a recycled SendCallback, which causes a dangerous issue
time / task
client: publish with a broken schema
broker-side: handle schema
broker-side: disconnect
client: close producer
metadata store threads
orBookie client threads
Broken schema
op
is still inproducer.pendingMessages
[1]op.callback
was recycled`client-side
: disconnectedfailPendingMessages
, which provides a failed callback for all pending messagesAt
step 10
, the producer will call a failed callback on a recycledSendCallback
which has been recycled atstep 8
, but the objectSendCallback
may be used by others, which will cause unexpected and dangenrous issues.[1] https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L894
You can reproduce
issues 3 and 4
with the new testtestIncompatibleMultiVersionSchema
, and you will get following various errors, but the test is not in order to reproduce a special case.Modifications
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x