Description
EDIT: When deployed on August 24, 2022, the PR reduced peak RAM use by over 200GB (out of over 300GB total reduction). The initial estimate of -150GB was based on an older checkpoint file. By August the checkpoint file had grown substantially, so the memory savings were larger. Checkpoint duration is about 16 minutes today (Sep 7); it was 46-58 minutes in mid-August and 11-17 hours in Dec 2021, depending on system load.
Problem
The recent increase in transactions is causing WAL files to be created more frequently, which makes checkpoints happen more frequently and increases both the checkpoint file size and the size of ledger state in memory. These increases are causing checkpointing to consume too much RAM and take more than 2x longer than earlier this year.
Date | File Size | Checkpoint Frequency |
---|---|---|
Early 2022 | 53 GB | 0-2 times per day |
July 8, 2022 | 126 GB | every 2 hours |
Without PR #1944 the system checkpointing would currently be:
- taking well over 20-30 hours each time, making it impossible to complete every 2 hours
- requiring more operational RAM, making OOM crashes very frequent
- creating billions more allocations and gc pressure, consuming CPU cycles and slowing down EN
PR #1944 reduced the MTrie flattening and serialization phase to under 5 minutes (it sometimes took 17 hours on mainnet16). As a result, creating a separate MTrie state currently accounts for most of the duration and memory used by checkpointing. This opens up new possibilities, such as reusing ledger state to significantly reduce the duration and operational RAM of checkpointing again.
Updates epic #1744
The Proposed Solution
We can avoid creating a separate MTrie state during checkpoint creation. This can reduce peak RAM use by (very roughly) about 150GB and reduce checkpoint duration by 24 minutes (estimates based on snapshot of July 8, 2022). Memory savings will increase over time.
Determine if it's feasible to avoid creating a separate MTrie state during checkpoint creation. If the proof-of-concept doesn't reveal showstoppers, then proceed with a new PR.
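To make the idea concrete, here is a minimal Go sketch contrasting the two approaches. It is not the flow-go implementation: `Trie`, `Ledger`, `RebuildForCheckpoint`, and `ShareForCheckpoint` are hypothetical names invented for illustration, and the sketch assumes tries are immutable once created, so the checkpointer can safely hold references to them.

```go
package main

import "fmt"

// Trie is a hypothetical stand-in for an immutable ledger trie.
// Assumption: once created, a Trie is never mutated.
type Trie struct {
	RootHash string
	Payloads []string
}

// Ledger is a hypothetical, simplified holder of the tries kept in memory.
type Ledger struct {
	tries []*Trie
}

// Update adds a new trie derived from the latest one without mutating it.
func (l *Ledger) Update(rootHash, payload string) {
	var prev []string
	if n := len(l.tries); n > 0 {
		prev = l.tries[n-1].Payloads
	}
	next := make([]string, len(prev), len(prev)+1)
	copy(next, prev)
	next = append(next, payload)
	l.tries = append(l.tries, &Trie{RootHash: rootHash, Payloads: next})
}

// RebuildForCheckpoint loosely models the current behaviour: the checkpointer
// ends up with a second, separate copy of the state in memory.
func (l *Ledger) RebuildForCheckpoint() []*Trie {
	out := make([]*Trie, 0, len(l.tries))
	for _, t := range l.tries {
		p := make([]string, len(t.Payloads))
		copy(p, t.Payloads)
		out = append(out, &Trie{RootHash: t.RootHash, Payloads: p})
	}
	return out
}

// ShareForCheckpoint models the proposal: the checkpointer receives
// references to the tries the ledger already holds, so no second copy exists.
func (l *Ledger) ShareForCheckpoint() []*Trie {
	out := make([]*Trie, len(l.tries))
	copy(out, l.tries) // copies pointers only, not trie contents
	return out
}

func main() {
	l := &Ledger{}
	l.Update("root-1", "payload-1")
	l.Update("root-2", "payload-2")

	rebuilt := l.RebuildForCheckpoint()
	shared := l.ShareForCheckpoint()

	fmt.Println("rebuilt aliases ledger trie:", rebuilt[1] == l.tries[1])
	fmt.Println("shared aliases ledger trie: ", shared[1] == l.tries[1])
}
```

The first line prints `false` and the second `true`: the shared slice points at the ledger's own tries, which is where the memory saving would come from. Any real implementation also has to coordinate access between the ledger and the checkpointer, which this sketch leaves out.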
- Proof-of-concept: [EN Performance] [POC] Reduce operational RAM by 152+ GB and checkpoint duration by 24 mins by reusing ledger state #2770. It shows that it is feasible to avoid creating a separate MTrie state.
- PR: [EN Performance] Reuse ledger state for about -200GB peak RAM, -160GB disk i/o, and about -32 minutes duration #2792. This has fewer edge cases to handle than PR #2770 and looser coupling between layers, but may require extra locks by comparison.