[SparkMicroBatchStream] Executors prematurely close I/O client during Spark broadcast cleanup #12858
Comments
@bk-mz is this issue happening for `SparkMicroBatchStream`, or for anything using `SerializableTableWithSize`?
@singhpk234 Hey. This happens only in Iceberg-to-Iceberg pipelines, and only under high memory usage and high load; small or moderate loads do not cause it, or maybe I wasn't able to find the exact condition that triggers the close. When it does trigger, everything is dead. Note: we run a decent number of Iceberg applications, most of them Kafka-to-Iceberg streaming, and there is no issue there even under high load.
I see. This was intended to be called when processing finishes on the executor. If GC is triggering it, we need a way to protect against that case if we still need the close (5a98aef). Is there a way to ensure that `io()` is eventually closed?
Is this shared between tasks or between executors? Do we need to implement reference counting before we call close on the IO?
Between tasks. TBH I don't know; the matter looks too complicated for a simple solution. Just commenting out `io().close()` works, and there doesn't seem to be a noticeable memory leak. Though yeah, just commenting this out is not a proper solution.
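To make the reference-counting idea from the earlier comment concrete, here is a minimal sketch; `RefCountedFileIO` and `retain()` are hypothetical names for illustration, not existing Iceberg APIs:

```java
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.iceberg.io.FileIO;

/**
 * Hypothetical wrapper (not an existing Iceberg class): the underlying
 * FileIO is closed only when the last holder releases it, so broadcast
 * cleanup on one deserialized copy cannot kill the client while other
 * tasks still use it.
 */
class RefCountedFileIO implements AutoCloseable {
  private final FileIO io;
  private final AtomicInteger refCount = new AtomicInteger(1);

  RefCountedFileIO(FileIO io) {
    this.io = io;
  }

  /** Take an additional reference before handing the FileIO to a task. */
  FileIO retain() {
    refCount.incrementAndGet();
    return io;
  }

  /** Drop a reference; only the final release actually closes the client. */
  @Override
  public void close() {
    if (refCount.decrementAndGet() == 0) {
      io.close();
    }
  }
}
```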
Hi @singhpk234, after looking into the code, I believe the issue is the underlying HTTP connection pool being shared. #12868 should fix the problem. Could you help take a look? Thanks!
@xiaoxuandev Correct, this will fix it for sure.
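For context on the shared-pool failure mode described above, here is a hedged illustration using Apache HttpClient 5 (which Iceberg's REST client builds on); this is a general demonstration of the hazard, not the code from #12868:

```java
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;

public class SharedPoolHazard {
  public static void main(String[] args) throws Exception {
    // One connection pool shared by two clients.
    PoolingHttpClientConnectionManager pool =
        new PoolingHttpClientConnectionManager();

    CloseableHttpClient a =
        HttpClients.custom().setConnectionManager(pool).build();
    CloseableHttpClient b =
        HttpClients.custom().setConnectionManager(pool).build();

    // By default the builder treats a supplied manager as exclusively
    // owned, so closing one client shuts down the shared pool and the
    // other client's requests start failing -- analogous to one table
    // copy closing the I/O client out from under other tasks.
    a.close();

    // Marking the manager as shared makes close() leave the pool alone:
    // HttpClients.custom().setConnectionManager(pool)
    //     .setConnectionManagerShared(true).build();
    b.close();
    pool.close();
  }
}
```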
Apache Iceberg version: 1.8.1 (latest release)
Query engine: Spark
Please describe the bug 🐞
Affected Code
In `SerializableTableWithSize` (and `SerializableMetadataTableWithSize`), `serializationMarker` is `transient`. Deserialized copies on executors see it as `null` and, when Spark triggers broadcast cleanup, call `io().close()`.
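A condensed, hypothetical stand-in for the class showing only the marker logic (the real `SerializableTableWithSize` carries more state):

```java
import java.io.Serializable;
import org.apache.iceberg.io.FileIO;

// Java serialization skips transient fields, so the marker is non-null
// on the driver's original instance and null on every deserialized
// executor copy.
class SerializableTableSketch implements Serializable, AutoCloseable {
  private final transient Object serializationMarker = new Object();
  private final FileIO io;

  SerializableTableSketch(FileIO io) {
    this.io = io;
  }

  @Override
  public void close() {
    if (serializationMarker == null) {
      // Executor copy: broadcast cleanup lands here and closes the
      // I/O client even though other tasks may still be using it.
      io.close();
    }
  }
}
```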
Observed Behavior
During broadcast cleanup, Spark invokes `SerializableTableWithSize.close()` on an executor, prematurely closing the I/O client that other tasks still share.
Workaround
Disable the executor-side `io().close()`:
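A minimal sketch of that change, applied to the simplified stand-in class above rather than the actual Iceberg source; it trades a leaked client for stability:

```java
@Override
public void close() {
  if (serializationMarker == null) {
    // Workaround: leave the shared I/O client open on executors.
    // This leaks the client for the executor's lifetime but stops
    // broadcast cleanup from killing connections other tasks still use.
    // io.close();
  }
}
```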
Logs
Willingness to contribute