Description
Discussion thread: https://lists.apache.org/thread/j92bzsby9n2ozc9gcw5psgcy2026l1wm
Motivation
Cursor data is stored in the metadata store (ZooKeeper or etcd). As cursor data accumulates, its size grows and it takes longer to pull from the metadata store. Compressing the cursor data reduces both its size and the time needed to pull it.
Goal
Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.
Implementation
CursorInfo compression format
[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]
- MAGIC_NUMBER
  Use 0x4778, the same magic number that is used for ledger info.
- METADATA
  Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:
message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}
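To make the layout concrete, here is a minimal Java sketch of how the four fields could be written and read back. It is an illustration under stated assumptions, not the actual implementation: the proposal does not give field widths, so the 2-byte magic number and 4-byte big-endian METADATA_SIZE are assumptions, the class and method names are hypothetical, and the metadata is treated as opaque bytes standing in for a serialized ManagedCursorInfoMetadata.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Sketch of [MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]. */
final class CursorInfoFrame {
    // Magic number taken from the proposal.
    static final short MAGIC_NUMBER = 0x4778;

    // Assumption: 2-byte magic followed by a 4-byte big-endian metadata size.
    static final int HEADER_BYTES = Short.BYTES + Integer.BYTES;

    /** Prepends the magic number, metadata size, and metadata to an already-compressed payload. */
    static byte[] frame(byte[] metadata, byte[] compressedPayload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + metadata.length + compressedPayload.length);
        buf.putShort(MAGIC_NUMBER);
        buf.putInt(metadata.length);
        buf.put(metadata);
        buf.put(compressedPayload);
        return buf.array();
    }

    /** Returns true when the bytes are long enough to hold a header and start with the magic number. */
    static boolean hasMagic(byte[] data) {
        return data.length >= HEADER_BYTES && ByteBuffer.wrap(data).getShort() == MAGIC_NUMBER;
    }

    /** Extracts the METADATA_PAYLOAD bytes from a framed value. */
    static byte[] metadataOf(byte[] framed) {
        int size = ByteBuffer.wrap(framed).getInt(Short.BYTES);
        return Arrays.copyOfRange(framed, HEADER_BYTES, HEADER_BYTES + size);
    }

    /** Extracts the (still compressed) MANAGED_CURSOR_INFO_PAYLOAD bytes from a framed value. */
    static byte[] payloadOf(byte[] framed) {
        int size = ByteBuffer.wrap(framed).getInt(Short.BYTES);
        return Arrays.copyOfRange(framed, HEADER_BYTES + size, framed.length);
    }
}
```

Because METADATA_SIZE is stored explicitly, a reader can find the start of the compressed payload without knowing how the metadata message is encoded.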
CursorInfo compression and decompression design
Pulsar already defines and implements these compression types, so we only need to handle compression and decompression of the ManagedCursorInfo data:
- Get CursorInfo from the metadata store
  Check the cursor data header. If the data is compressed, parse the bytes using the compressed format; otherwise, parse the cursor data directly, as before.
- Add/Update CursorInfo to the metadata store
  If a compression type is specified, compress the data before writing it; otherwise, write the data to the metadata store directly.
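The two paths above can be sketched in plain Java as follows. This is a simplified illustration, not the broker code: it uses only java.util.zip (ZLIB) rather than Pulsar's own codecs, the class name, method names, and TYPE_ZLIB id are hypothetical, the header field widths are assumed, and the 5-byte metadata (1-byte type plus 4-byte uncompressed size) is a hand-rolled stand-in for the serialized ManagedCursorInfoMetadata protobuf message.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CursorInfoCodec {
    static final short MAGIC_NUMBER = 0x4778;        // from the proposal
    static final byte TYPE_ZLIB = 1;                 // hypothetical id, not the real enum value
    static final int METADATA_BYTES = 1 + Integer.BYTES; // stand-in metadata: type + uncompressedSize

    /** Add/Update path: compress and frame only when compression is enabled. */
    static byte[] encode(byte[] cursorInfo, boolean compressionEnabled) {
        if (!compressionEnabled) {
            return cursorInfo; // written to the metadata store unchanged
        }
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        ByteBuffer out = ByteBuffer.allocate(
                Short.BYTES + Integer.BYTES + METADATA_BYTES + compressed.size());
        out.putShort(MAGIC_NUMBER).putInt(METADATA_BYTES);  // header
        out.put(TYPE_ZLIB).putInt(cursorInfo.length);       // metadata
        out.put(compressed.toByteArray());                  // compressed payload
        return out.array();
    }

    /** Get path: inspect the header, then decompress or fall back to the original bytes. */
    static byte[] decode(byte[] stored) {
        ByteBuffer in = ByteBuffer.wrap(stored);
        if (stored.length < Short.BYTES || in.getShort() != MAGIC_NUMBER) {
            return stored; // no magic number: parse directly, the original way
        }
        int bodyStart = Short.BYTES + Integer.BYTES + in.getInt();
        byte type = in.get();
        int uncompressedSize = in.getInt();
        if (type != TYPE_ZLIB) {
            throw new IllegalStateException("unsupported compression type id: " + type);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, bodyStart, stored.length - bodyStart);
        byte[] result = new byte[uncompressedSize]; // sized from the metadata
        try {
            int filled = 0;
            while (filled < result.length && !inflater.finished()) {
                int n = inflater.inflate(result, filled, result.length - filled);
                if (n == 0) {
                    break; // truncated or corrupt input
                }
                filled += n;
            }
            if (filled != uncompressedSize) {
                throw new IllegalStateException("uncompressed size mismatch");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed cursor data", e);
        } finally {
            inflater.end();
        }
    }
}
```

In this sketch the stored uncompressedSize is what lets the reader allocate the output buffer up front, and a value without the magic number is returned as-is so that previously written data stays readable.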
CursorInfo compression type configuration
Add managedCursorInfoCompressionType in org.apache.pulsar.broker.ServiceConfiguration and org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
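As a rough illustration of how such a setting might be resolved, the sketch below reads managedCursorInfoCompressionType from a java.util.Properties object and falls back to NONE when it is unset. The class, the enum, and the use of Properties are assumptions made for a self-contained example; they are not the actual ServiceConfiguration or ManagedLedgerFactoryConfig code.

```java
import java.util.Locale;
import java.util.Properties;

final class CursorCompressionConfig {
    // Algorithms named in the proposal, plus NONE for "compression disabled".
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    static final String KEY = "managedCursorInfoCompressionType";

    /** Resolves the configured type; an unset or blank value means no compression. */
    static CompressionType resolve(Properties conf) {
        String value = conf.getProperty(KEY, "NONE").trim();
        if (value.isEmpty()) {
            return CompressionType.NONE;
        }
        // valueOf throws IllegalArgumentException for an unknown name.
        return CompressionType.valueOf(value.toUpperCase(Locale.ROOT));
    }
}
```

Rejecting unknown names early, rather than silently disabling compression, is one possible design choice here; it surfaces a misspelled value at startup.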
Compatibility
- Compression is disabled by default.
- Data can be safely upgraded or downgraded. When compression is enabled, old data is migrated to the new format, with compression metadata, the next time it is updated. When compression is disabled, data is reverted to the previous format on update, and existing compressed data can still be read because the get path parses the compression metadata.