Skip to content

Eliminate repeated linear searches in BlockManager #268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ozkoca
Copy link
Contributor

@ozkoca ozkoca commented May 9, 2025

Description of change

Currently, BlockManager performs three linear searches in BlockStore when handling a sync read that continues from a previous block:
i. To check if the read is sequential
ii. Again inside getGeneration to verify sequentiality
iii. To fetch the generation of the previous block

This PR reduces the overhead by collapsing these into a single lookup: it retrieves the previous block once and determines both sequentiality and generation based on its existence.

Relevant issues

N/A

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

No

Does this contribution introduce any new public APIs or behaviors?

No

How was the contribution tested?

Executed unit tests

Does this contribution need a changelog entry?

  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@ozkoca ozkoca had a problem deploying to integration-tests May 9, 2025 13:35 — with GitHub Actions Failure
* For now, pattern detection is very simple: if the previous byte is already stored by a block,
* then we conclude that the byte is requested by a sequential read. In the future we should
* probably extend this to work with some kind of tolerance or radius (i.e., look back more than a
* single byte).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we wanted to extend this class in future. I don't think we should remove it and take the logic to BlockManager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good optimisation,
It reduces BlockStore linear search lookups from 3 to 1, improving performance and we can still easily add future enhancements, like read tolerance

Something like this

long tolerance = configuration.getSequentialReadTolerance();
generation = blockStore.getBlock(pos - 1, tolerance)
    .map(block -> block.getGeneration() + 1)
    .orElse(0L);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajdchak in its current form, it's introducing a performance bottleneck, which is why I opted to simplify it and move the logic into BlockManager. My thinking is that we can always reintroduce a well-designed, extensible version of the class later if and when we need it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping SequentialPatternDetector around might be helpful. I can easily see how we are going to extend these patterns to cover nearly sequential reads (sequential read with wholes).

generation = 0;
} else {
// Check if it is sequential pattern
generation = blockStore.getBlock(pos - 1).map(block -> block.getGeneration() + 1).orElse(0L);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Collaborator

@ahmarsuhail ahmarsuhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, think we're losing out on some test coverage though?

do we still have tests that will verify if a block with pos-1 exists, then it's generation will be increased?

@ozkoca ozkoca requested a review from ahmarsuhail May 9, 2025 16:15
@ozkoca ozkoca had a problem deploying to integration-tests May 12, 2025 09:28 — with GitHub Actions Failure
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants