PIP-146: ManagedCursorInfo compression #14529

Closed
@nodece

Discussion thread: https://lists.apache.org/thread/j92bzsby9n2ozc9gcw5psgcy2026l1wm

Motivation

Cursor data is stored in the metadata store (ZooKeeper or etcd). As the amount of cursor data grows, its size increases and fetching it from the metadata store takes longer. Compressing the cursor data reduces its size and therefore the time needed to fetch it.

Goal

Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.

Implementation

CursorInfo compression format

[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]

  • MAGIC_NUMBER
    Use 0x4778, the same magic number that is used for ledger info.

  • METADATA
    Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:

message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}
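
The layout above can be sketched in Java as follows. This is only an illustration, not the proposed implementation: the proposal does not state field widths, so the 2-byte magic number and 4-byte metadata size are assumptions, and `CursorInfoFrame`, `frame`, `hasMagic`, and `unframe` are hypothetical names. The metadata and payload are treated as opaque byte arrays (in the proposal, the metadata would be a serialized ManagedCursorInfoMetadata message).

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the framing; not part of Pulsar.
final class CursorInfoFrame {
    // Magic number taken from the proposal.
    static final short MAGIC = 0x4778;

    // Assumed widths: 2-byte magic, 4-byte metadata size (big-endian).
    static byte[] frame(byte[] metadata, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + metadata.length + payload.length);
        buf.putShort(MAGIC);          // [MAGIC_NUMBER]
        buf.putInt(metadata.length);  // [METADATA_SIZE]
        buf.put(metadata);            // [METADATA_PAYLOAD]
        buf.put(payload);             // [MANAGED_CURSOR_INFO_PAYLOAD]
        return buf.array();
    }

    // True when the data starts with the magic number.
    static boolean hasMagic(byte[] data) {
        return data.length >= 2 && ByteBuffer.wrap(data).getShort() == MAGIC;
    }

    // Splits framed data into { metadata, payload }.
    static byte[][] unframe(byte[] data) {
        if (data.length < 6 || !hasMagic(data)) {
            throw new IllegalArgumentException("not a framed cursor info");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.getShort(); // skip the magic number
        int metadataSize = buf.getInt();
        if (metadataSize < 0 || metadataSize > buf.remaining()) {
            throw new IllegalArgumentException("invalid metadata size: " + metadataSize);
        }
        byte[] metadata = new byte[metadataSize];
        buf.get(metadata);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new byte[][] {metadata, payload};
    }
}
```

Storing the metadata size ahead of the metadata lets a reader locate the start of the compressed payload without knowing the metadata layout in advance.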

CursorInfo compression and decompression design

Pulsar already defines and implements these compression types, so we only need to handle compression and decompression of the ManagedCursorInfo data:

  • Get CursorInfo from the metadata store
    Check the header of the cursor data. If the data is compressed, parse the bytes using the compressed format; otherwise, parse the cursor data directly, as before.

  • Add/Update CursorInfo to the metadata store
    If a compression type is specified, compress the data before writing it; otherwise, write the data to the metadata store directly.
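
The read and write paths described above might look roughly like the following sketch. It is built on several assumptions that are not part of the proposal: it uses only ZLIB through the JDK's `java.util.zip` classes rather than Pulsar's codecs, it encodes the metadata as a hand-rolled 5-byte record (1-byte type tag plus 4-byte uncompressed size) as a stand-in for the protobuf ManagedCursorInfoMetadata message, and it assumes a 2-byte magic number and a 4-byte metadata size. `CursorInfoCodec`, `encode`, `decode`, and `TYPE_ZLIB` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical codec illustrating the read/write decision; not part of Pulsar.
final class CursorInfoCodec {
    static final short MAGIC = 0x4778;   // from the proposal
    static final byte TYPE_ZLIB = 1;     // assumed tag standing in for CompressionType
    static final int METADATA_SIZE = 5;  // assumed stand-in: 1-byte type + 4-byte size

    // Write path: compress only when compression is enabled.
    static byte[] encode(byte[] cursorInfo, boolean compress) {
        if (!compress) {
            return cursorInfo; // stored exactly as before
        }
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + METADATA_SIZE + compressed.length);
        buf.putShort(MAGIC).putInt(METADATA_SIZE);
        buf.put(TYPE_ZLIB).putInt(cursorInfo.length); // stand-in metadata
        buf.put(compressed);
        return buf.array();
    }

    // Read path: check the header, then either decompress or return as-is.
    static byte[] decode(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        if (stored.length < 2 || buf.getShort() != MAGIC) {
            return stored; // no header: uncompressed data, parsed the original way
        }
        int metadataSize = buf.getInt();
        byte type = buf.get();
        int uncompressedSize = buf.getInt();
        if (metadataSize != METADATA_SIZE || type != TYPE_ZLIB || uncompressedSize < 0) {
            throw new IllegalArgumentException("unsupported compression metadata");
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, buf.position(), buf.remaining());
        byte[] result = new byte[uncompressedSize];
        int n = 0;
        try {
            while (n < result.length && !inflater.finished()) {
                int read = inflater.inflate(result, n, result.length - n);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unusable input
                }
                n += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed cursor info", e);
        } finally {
            inflater.end();
        }
        if (n != uncompressedSize) {
            throw new IllegalStateException("uncompressed size mismatch");
        }
        return result;
    }
}
```

Because `decode` falls back to returning the input when the magic number is absent, a reader built this way can handle both old uncompressed entries and new compressed ones. The header check relies on a serialized protobuf message not normally starting with the byte 0x47, since that byte would decode to a field tag with wire type 7, which protobuf does not define.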

CursorInfo compression type configuration

Add managedCursorInfoCompressionType to org.apache.pulsar.broker.ServiceConfiguration and org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
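
As a rough illustration of how such a setting could be resolved, the sketch below maps a configured string to a compression type and treats a missing value as "no compression". The `CursorCompressionConfig` class, its `Type` enum, the `NONE` value, and the case-insensitive parsing are all assumptions made for this example; the proposal only names the setting and the four algorithms.

```java
import java.util.Locale;

// Hypothetical helper; the real setting lives in ServiceConfiguration
// and ManagedLedgerFactoryConfig.
final class CursorCompressionConfig {
    // NONE is an assumed value meaning "compression disabled".
    enum Type { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    // Resolves the configured value; unset or blank means disabled.
    static Type parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Type.NONE;
        }
        // Throws IllegalArgumentException for unknown names.
        return Type.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Defaulting to `NONE` in this sketch mirrors the compatibility requirement that compression stays off unless it is explicitly configured.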

Compatibility

  1. Compression is disabled by default.
  2. Data can be upgraded or downgraded safely. When compression is enabled, old data is migrated to the new format, with compression metadata, the next time it is updated. When compression is disabled, data is written back in the previous format on update, and existing compressed data can still be read by parsing its compression metadata.
