core/filtermaps: hashdb safe delete range #31525

Merged: 6 commits merged into ethereum:master on Mar 31, 2025

Conversation

zsfelfoldi (Contributor):

This PR adds rawdb.SafeDeleteRange and uses it for range deletion in core/filtermaps. This includes deleting the old bloombits database, resetting the log index database and removing index data for unindexed tail epochs (which previously weren't properly implemented for the fallback case).
SafeDeleteRange either calls ethdb.DeleteRange if the node uses the new path-based state scheme, or uses an iterator-based fallback method that safely skips trie nodes in the range if the old hash-based state scheme is used. Note that ethdb.DeleteRange also has its own iterator-based fallback implementation in ethdb/leveldb. If a path-based state scheme is used and the backing db is pebble (as it is on the majority of new nodes), then rawdb.SafeDeleteRange uses the fast native range delete.
Also note that rawdb.SafeDeleteRange has different semantics from ethdb.DeleteRange: it does not automatically return if the operation takes a long time. Instead it receives a stopCallback that can interrupt the process if necessary. This is because in safe mode potentially many entries are iterated without being deleted (definitely the case when deleting the old bloombits database, which has a single-byte prefix), so restarting the process every time a fixed number of entries have been iterated would result in a run time quadratic in the number of skipped entries.
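
To make the described dispatch concrete, here is a minimal sketch under assumed names and signatures (the PR's actual SafeDeleteRange differs in its details; batching and the exact trie-node check are simplified):

// Sketch only, not the PR's code: use native range deletion when the state
// scheme allows it, otherwise iterate and skip potential trie nodes.
func safeDeleteRangeSketch(db ethdb.KeyValueStore, start, end []byte, hashScheme bool, stop func() bool) error {
	if !hashScheme {
		// Path-based state scheme: no legacy trie nodes can sit in the
		// range, so the backend's DeleteRange (native on pebble) is safe.
		return db.DeleteRange(start, end)
	}
	// Hash-based state scheme: iterate and delete key by key, leaving
	// 32-byte keys (potential trie nodes) untouched.
	it := db.NewIterator(nil, start)
	defer it.Release()
	for it.Next() && bytes.Compare(it.Key(), end) < 0 {
		if stop() {
			return errors.New("delete range interrupted") // the real code returns a dedicated error value
		}
		if len(it.Key()) == common.HashLength {
			continue // possible hash-scheme trie node, skip it
		}
		if err := db.Delete(it.Key()); err != nil {
			return err
		}
	}
	return it.Error()
}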

When running in safe mode, unindexing an epoch takes about a second, removing bloombits takes around 10s, while resetting a full log index might take a few minutes. If a range delete operation takes a significant amount of time, log messages are printed. Also, any range delete operation can be interrupted by shutdown (tail unindexing can also be interrupted by head indexing, similarly to how tail indexing works). If the last unindexed epoch might have "dirty" index data left, then the indexed map range points to the first valid epoch and cleanedEpochsBefore points to the previous, potentially dirty one. At startup it is always assumed that the epoch before the first fully indexed one might be dirty. No new tail maps are rendered and no further maps are unindexed before the previous unindexing is properly cleaned up.
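
As a usage illustration of the shutdown interruption, a stop callback could be wired to a hypothetical quit channel like this (quitCh and the exact parameter list of rawdb.SafeDeleteRange are assumptions, not the PR's code):

// Illustration: a stop callback that fires when the node is shutting down,
// letting a long-running fallback iteration abort promptly.
stop := func() bool {
	select {
	case <-quitCh: // hypothetical channel closed on shutdown
		return true
	default:
		return false
	}
}
if err := rawdb.SafeDeleteRange(db, startKey, endKey, stop); err != nil {
	log.Warn("Range delete failed or was interrupted", "err", err)
}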

// Give the caller a chance to interrupt a long-running deletion.
if stopCallback() {
	return ErrDeleteRangeInterrupted
}
// Resume the next round from the immediate lexicographic successor of the current key.
start = append(bytes.Clone(it.Key()), 0)
Member:

It's a bit strange to append the 0 here.

I think it's fine to construct the iterator using it.Key:

  • if the entry was already deleted in the last round, the position of the new iterator will be shifted automatically
  • if the entry was skipped for deletion, it's fine to reprocess it

Contributor Author:

append(key, 0) comes immediately after key in a lexicographic ordering. Sure, it's not a big performance loss to start from key either, but why not move to the next possible key here? I feel it's just nicer and makes it more obvious what we are doing. I will add a comment for now, but if there are more downvotes then I will remove it :)
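
A tiny self-contained check of this ordering property (illustration only, not from the PR):

package main

import (
	"bytes"
	"fmt"
)

func main() {
	key := []byte{0x01, 0x02}
	// append(key, 0) is the smallest byte string strictly greater than key
	// in lexicographic ordering, so an iterator started there skips exactly
	// key itself and nothing else.
	next := append(bytes.Clone(key), 0)
	fmt.Println(bytes.Compare(key, next)) // -1: key < next
}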

if err := db.DeleteRange(filterMapRowKey(mapRows.First(), false), filterMapRowKey(mapRows.AfterLast(), false)); err != nil {
	log.Crit("Failed to delete range of filter map rows", "err", err)
}
func DeleteFilterMapRows(db ethdb.KeyValueStore, mapRows common.Range[uint64], hashDbSafe bool, stopCallback func() bool) error {
Member:

Can we do a normal deletion rather than using the range deleter?

Contributor Author:

I will just copy my response from Discord here:
We could; I considered it but rejected it in the end. The thing is that deleting everything by key takes a significant amount of time. Also, the rows are stored in a way that in most cases 32 adjacent rows are stored under a single key ("base layer" row data), while a few rows have a larger amount of data related to higher mapping layers, addressed under individual keys. So if we do not iterate and just delete all keys where there could be data, then we have to delete 10x-30x more keys than actually exist. Right now the epoch delete takes 1-2 seconds; with a blind delete it would take significantly more.
Furthermore, it cannot be done atomically anyway, both because of batch memory requirements and because we cannot block the indexer for very long. The whole process should be easily and quickly interruptible. We could do the "blind delete" approach in multiple batches, remembering where we aborted last time, but then we would have a more complex solution that is also slower.
Also, in the most typical case (pebble db and path-based state) we can use native DeleteRange, and I didn't want to lose that advantage, so it's another reason why it's easier to use the range delete approach in the fallback case too.
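
For comparison, a rough sketch of the rejected "blind delete" approach (reusing filterMapRowKey from the quoted code; batching details are simplified). It has to issue a delete for every possible key, including the 10x-30x that never existed:

// Rejected approach (sketch): delete every possible row key in the range,
// existing or not, flushing the batch periodically to bound memory.
batch := db.NewBatch()
for row := mapRows.First(); row < mapRows.AfterLast(); row++ {
	if err := batch.Delete(filterMapRowKey(row, false)); err != nil {
		return err
	}
	if batch.ValueSize() > ethdb.IdealBatchSize {
		if err := batch.Write(); err != nil {
			return err
		}
		batch.Reset()
	}
}
return batch.Write()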

if err := db.DeleteRange(filterMapLastBlockKey(maps.First()), filterMapLastBlockKey(maps.AfterLast())); err != nil {
	log.Crit("Failed to delete range of filter map last block pointers", "err", err)
}
func DeleteFilterMapLastBlocks(db ethdb.KeyValueStore, maps common.Range[uint32], hashDbSafe bool, stopCallback func() bool) error {
Member:

Can we do a normal deletion rather than using the range deleter?

Contributor Author:

For the map and block pointers a normal delete loop would of course be fine too; performance-wise it does not matter much, since 99% of the data is the filter rows. But for the rows it does make a lot of sense to use SafeDeleteRange, and once we do that for the rows, it also makes sense not to have different logic for the pointers. If native DeleteRange can be used, it's faster; if the fallback has to be used, the same interrupt/clean-up-later logic works here too. It's just less extra logic.

if err := db.DeleteRange(filterMapBlockLVKey(blocks.First()), filterMapBlockLVKey(blocks.AfterLast())); err != nil {
	log.Crit("Failed to delete range of block log value pointers", "err", err)
}
func DeleteBlockLvPointers(db ethdb.KeyValueStore, blocks common.Range[uint64], hashDbSafe bool, stopCallback func() bool) error {
Member:

Can we do a normal deletion rather than using the range deleter?

Contributor Author:

See above.

	logPrinted     bool
	lastLogPrinted = start
)
switch err := deleteFn(f.db, f.hashScheme, func() bool {
Member:

It's very weird to print logs if there is nothing to delete,

e.g.

INFO [03-31|15:07:25.452] Removing old bloom bits database in progress... elapsed=17.430236ms
INFO [03-31|15:07:25.644] Removing old bloom bits database finished elapsed=209.842669ms

These two logs are printed even when the legacy bloom data does not exist.

Contributor Author:

Fixed

@zsfelfoldi (Contributor Author):

Moved ErrTooManyKeys to package ethdb because the linter failed on the core/rawdb -> ethdb/leveldb import.

	f.filterMapCache.Remove(mapIndex)
}
if err := rawdb.DeleteFilterMapLastBlocks(f.db, common.NewRange(firstMap, f.mapsPerEpoch-1), hashScheme, stopCb); err != nil { // keep last entry
	return err
Member:

Why is the counter f.mapsPerEpoch-1? Shouldn't it be f.mapsPerEpoch?

Contributor:

The comment says // keep last entry

Contributor:

If this is intended, it might be easier to see if the delete range were assigned to a variable on its own line.

Contributor Author:

Yes, it is intentional: the last entry of each epoch should always be available. It is even added for past epochs based on the checkpoints.
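
To spell out the off-by-one (assuming common.NewRange(first, count) covers the half-open interval [first, first+count), consistent with the First()/AfterLast() accessors in the quoted code):

// An epoch owns maps [firstMap, firstMap+mapsPerEpoch). A count of
// mapsPerEpoch-1 covers only [firstMap, firstMap+mapsPerEpoch-1), so the
// last map's entry survives the delete.
deleted := common.NewRange(firstMap, f.mapsPerEpoch-1)
_ = deleted.First()     // firstMap
_ = deleted.AfterLast() // firstMap + f.mapsPerEpoch - 1: the epoch's last map, excluded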

for mapIndex := firstMap; mapIndex < firstMap+f.mapsPerEpoch-1; mapIndex++ {
	f.lastBlockCache.Remove(mapIndex)
}
if err := rawdb.DeleteBlockLvPointers(f.db, common.NewRange(firstBlock, lastBlock-firstBlock), hashScheme, stopCb); err != nil { // keep last entry
Member:

Shouldn't the counter be lastBlock-firstBlock+1?

Contributor Author:

Again, intentional: we keep the last entry.

fjl merged commit 14d576c into ethereum:master on Mar 31, 2025
3 of 4 checks passed
fjl added this to the 1.15.7 milestone on Mar 31, 2025
thinkAfCod pushed a commit to thinkAfCod/go-ethereum that referenced this pull request Apr 2, 2025
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025