SparkExecutorCache causes slowness of RewriteDataFilesSparkAction #11648
Comments
@davseitsev I think this is a good topic to bring up on the Dev mailing list to reach a broader audience and facilitate a discussion.
@davseitsev can you please also try setting it to true? Please ref: #9563 (comment).
Also, for this: we can increase it via iceberg/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java (line 86 at f356087).
Do you have memory profiles / a heap dump for it? 128 MB is decent; what I am a bit skeptical of is that deletes from different partitions landed on the same executor, so the above might help with processing them, as cache hits would otherwise be minimized. As @nastra recommended, this is a good discussion for the mailing list.
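For illustration, here is a minimal sketch of raising the executor cache limits (and enabling the locality setting referenced in #9563) at session level before running compaction. The property names follow SparkSQLProperties.java, but the sizes are example values only and the exact keys and defaults should be verified against the Iceberg version in use.

```java
import org.apache.spark.sql.SparkSession;

public class ExecutorCacheTuning {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-maintenance")
        // Total executor cache size in bytes (128 MB by default); example: 512 MB.
        .config("spark.sql.iceberg.executor-cache.max-total-size",
            String.valueOf(512L * 1024 * 1024))
        // Maximum size in bytes of a single cached entry; example: 256 MB.
        .config("spark.sql.iceberg.executor-cache.max-entry-size",
            String.valueOf(256L * 1024 * 1024))
        // Locality-aware scheduling discussed in #9563.
        .config("spark.sql.iceberg.executor-cache.locality.enabled", "true")
        .getOrCreate();

    // ... run the maintenance actions with this session ...
  }
}
```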
Having as much detail as possible about the state of the table would help (e.g. the total file size, the total record count, file-scoped vs partition-scoped deletes, the total number of delete files, the number of delete files per partition).
Unfortunately, we haven't heard back. That said, I may have a guess. I believe it is related to the connection pool we use for reading deletes. The rewrite action submits multiple actions at the same time and they may use the same connection pool. Overall, I agree we shouldn't be using the executor cache for this action as each partition is rewritten separately. We need a config to disable executor cache for deletes and the action should automatically set it.
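For illustration only, the proposal could look roughly like the sketch below, where the rewrite action switches the delete cache off for its own jobs. The property name used here is hypothetical; no such key exists as of 1.7.0.

```java
import org.apache.spark.sql.SparkSession;

class DisableDeleteCacheForRewrite {
  // Hypothetical: a rewrite action processes each partition once, so caching
  // delete files across its tasks brings little benefit. The idea is that the
  // action itself would flip a flag like this before planning its Spark jobs.
  // "spark.sql.iceberg.executor-cache.deletes.enabled" is an invented name.
  static void configure(SparkSession spark) {
    spark.conf().set("spark.sql.iceberg.executor-cache.deletes.enabled", "false");
  }
}
```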
I will work on this. I will create a config for disabling executor cache for deletes. Thanks for the bug report and inputs.
Apache Iceberg version
1.7.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
We have a scheduled Spark maintenance job which runs all necessary Spark actions to keep our data lake clean and healthy. Recently we upgraded Iceberg from 1.1.0 to 1.7.0 and enabled deletes compaction.
We noticed that data compaction of tables with position deletes is really slow and it blocks almost all other data compaction jobs on the cluster. And here slow means it almost never ends: you can wait a few hours and see zero progress.
Investigation showed that there are threads on Spark workers, named iceberg-delete-worker-pool-%, which consume all the CPU.
[Flame graph]
[Thread dump example]
After some research I figured out that there is a cache of delete files which loads the whole delete file even if you need position deletes only for a single data file. As far as I understand, in our case the size of the cache is really small and it's constantly evicted, which dramatically slows down reading deletes. When I turned it off with spark.sql.iceberg.executor-cache.enabled=false, jobs which had run for a few hours without progress started to finish in about 1 minute. I don't see any benefit from having this cache for RewriteDataFilesSparkAction and I suggest disabling it by default for this action.
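As a minimal sketch of the workaround described above, assuming the standard SparkActions API, the compaction can be run in a session with the executor cache disabled. The table identifier db.table is a placeholder and the rewrite option value is only an example; verify both against your setup.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class CompactWithoutExecutorCache {
  public static void main(String[] args) throws Exception {
    // Disable the Iceberg executor cache for this maintenance session so delete
    // files are not repeatedly loaded into (and evicted from) a small cache.
    SparkSession spark = SparkSession.builder()
        .appName("rewrite-data-files")
        .config("spark.sql.iceberg.executor-cache.enabled", "false")
        .getOrCreate();

    // "db.table" is a placeholder; load the table through your catalog as usual.
    Table table = Spark3Util.loadIcebergTable(spark, "db.table");

    RewriteDataFiles.Result result = SparkActions.get(spark)
        .rewriteDataFiles(table)
        .option("delete-file-threshold", "2") // pick up files that carry position deletes
        .execute();

    System.out.println("Rewritten data files: " + result.rewrittenDataFilesCount());
  }
}
```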
Willingness to contribute