deadlock issue of v1.15.6 on eth_getLogs for filter & thus shutdown corruption #31700
Comments
Other calls are fine, without slowdown. The block range is:
+1
+1
I think this one might be important.
+1 Is this issue related to #31589?
Is the timeout a repeatable issue, i.e. does it always occur with the same query?
Definitely reproducible; I can see it on 4 different nodes with different filters (even changing to a single-block range, or a lower-traffic address). It gets stuck easily and feels like some kind of deadlock. It may not have first appeared in 1.15.9 (our upgrade path was alltools-v1.14.13 -> alltools-v1.15.9 docker). Sounds similar to what #31589 mentioned.
@jun0tpyrc We can get the result instantly for this query:
It is just the same as your query above. On a different geth API provider (checked via the rpc method) or on erigon we can also get the [] (empty result) instantly, but not on the geth nodes we have now. Furthermore, I just found that shutting down peacefully has become a problem on this version: it took around 20 minutes and got stuck without the final peaceful main running flag.
1.15.9 cannot shut down normally (we had to forcefully stop it after 25 minutes).
@jun0tpyrc Could you do something for us, please? While your query is hanging, attach to Geth via the console and capture a stack dump for us.
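For reference, a minimal sketch of one way to capture such a goroutine stack dump while the query hangs, assuming the node exposes the debug API over its IPC endpoint (the IPC path below is a placeholder for your datadir, not part of the original request):

```go
// Minimal sketch, not the exact command requested above: it calls the
// debug_stacks RPC method (the same call behind the console's debug.stacks()
// helper) to print the stacks of all goroutines while eth_getLogs is hanging.
// The IPC path is a placeholder; adjust it to your node's datadir.
package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	client, err := rpc.Dial("/data/geth/geth.ipc") // placeholder IPC path
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var stacks string
	if err := client.Call(&stacks, "debug_stacks"); err != nil {
		log.Fatal(err)
	}
	fmt.Println(stacks) // the blocked goroutines show up in this dump
}
```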
Here it is, an almost 20 MB one; this was captured while the query was running.
Thank you! |
Thanks for the detailed report and the stack dump; I found the issue. It is indeed a deadlock, happening when a log search runs while an old part of the chain is being unindexed. It is also responsible for the shutdown issue.
These are non-archive nodes with an old leveldb datadir.
Thanks, leveldb does cause the history unindexing to run in a fallback mode that wasn't tested properly and interfered with a log search. This PR should fix the issue: #31704 |
This PR fixes a deadlock situation in deleteTailEpoch that might arise when the range delete is running in iterator-based fallback mode (either using a leveldb database or the hashdb state storage scheme). In this case a stopCb callback is called periodically to check events, including matcher sync requests, in which case it tries to acquire indexLock for read access while deleteTailEpoch already holds it for write access. This pull request removes the indexLock acquisition in `FilterMapsMatcherBackend.synced`, as this function is only called in the indexLoop. Fixes #31700
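As a minimal, self-contained sketch of the deadlock pattern described above (the names mirror the report, but this is an illustration, not the actual go-ethereum code): deleteTailEpoch holds indexLock for writing and periodically calls stopCb, and the buggy synced path tried to take a read lock on the same mutex.

```go
// Illustrative sketch of the reported deadlock, not go-ethereum's real code.
package main

import (
	"fmt"
	"sync"
)

type filterMaps struct {
	indexLock sync.RWMutex
}

// synced is called while servicing a matcher sync request.
// Buggy version: takes a read lock on indexLock.
func (f *filterMaps) synced() {
	f.indexLock.RLock() // blocks forever: a writer already holds the lock
	defer f.indexLock.RUnlock()
	fmt.Println("matcher backend reports synced")
}

// deleteTailEpoch unindexes old chain history. In the iterator-based fallback
// mode (leveldb / hashdb scheme) it calls stopCb periodically to check events.
func (f *filterMaps) deleteTailEpoch(stopCb func()) {
	f.indexLock.Lock() // write lock held for the whole range delete
	defer f.indexLock.Unlock()
	for i := 0; i < 3; i++ {
		// ... delete a batch of filter map rows ...
		stopCb() // event check; may end up handling a matcher sync request
	}
}

func main() {
	f := &filterMaps{}
	// The event check calls synced() -> RLock on a mutex that deleteTailEpoch
	// already holds for writing; the Go runtime reports
	// "all goroutines are asleep - deadlock!" instead of finishing.
	f.deleteTailEpoch(func() { f.synced() })
}
```

Per the PR text, the fix removes the lock acquisition in synced, which is safe because that function is only ever called from the indexLoop.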
System information
Expected behaviour
The query can be answered in a reasonable time.
Actual behaviour
Slow: the call times out after 1 minute, while the same call finishes immediately on geth 1.15.3 (via some API providers) and on erigon 3.0.2.
Steps to reproduce the behaviour
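The reporter's exact filter is not reproduced here; as a hypothetical sketch, a call of the kind that hangs can be issued from Go with go-ethereum's ethclient (the endpoint, address, and block range below are placeholders, not the original query):

```go
// Hypothetical reproduction sketch, assuming a local geth 1.15.x node with
// HTTP RPC on localhost:8545; the address and block range are placeholders.
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"
	"time"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}

	query := ethereum.FilterQuery{
		FromBlock: big.NewInt(22000000), // placeholder range
		ToBlock:   big.NewInt(22000100),
		Addresses: []common.Address{
			common.HexToAddress("0x0000000000000000000000000000000000000000"), // placeholder
		},
	}

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	start := time.Now()
	logs, err := client.FilterLogs(ctx, query) // maps to eth_getLogs
	if err != nil {
		log.Fatalf("eth_getLogs failed after %v: %v", time.Since(start), err)
	}
	fmt.Printf("got %d logs in %v\n", len(logs), time.Since(start))
}
```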
Backtrace