
Adding blog for date histogram optimizations #3782


Open · wants to merge 1 commit into base: main

Conversation

jainankitk

Description

The journey of optimizing the date histogram aggregation

Issues Resolved

Closes #3758

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

pajuric commented May 22, 2025

@kolchfa-aws - Please go ahead and provide a technical review for this blog

kolchfa-aws self-assigned this May 23, 2025
Comment on lines +21 to +27
## Understanding the Aggregation Types

Before diving into optimizations, let's quickly cover the fundamentals. OpenSearch supports three main types of aggregations:
1. **Metric Aggregations**: Used for computing statistics like min, max, sum, or average
2. **Bucket Aggregations**: Groups documents together (by date, terms, or ranges)
3. **Pipeline Aggregations**: Combines multiple aggregations, using output from one as input to another
In this blog post, we will focus mainly on date histogram aggregations, which are a subtype of bucket aggregations.

Member: Seems redundant; the function of the date histogram is already clear in the first paragraph and in the section below.
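
To make the excerpt above concrete, here is a minimal sketch of a date histogram (bucket) aggregation request using the opensearch-py client. The index name "logs", field "@timestamp", and the monthly interval are illustrative assumptions, not values taken from the blog draft.

```python
# Minimal sketch of a date histogram (bucket) aggregation request.
# Index name "logs", field "@timestamp", and the monthly interval are
# illustrative assumptions, not taken from the blog draft.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "size": 0,  # skip returning individual hits; we only want bucket counts
    "aggs": {
        "requests_per_month": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "month",
            }
        }
    },
}

response = client.search(index="logs", body=body)
for bucket in response["aggregations"]["requests_per_month"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```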


When we began analyzing performance, we ran a single date histogram aggregation query in a loop and collected flame graphs during its execution (see [9310](https://github.com/opensearch-project/OpenSearch/issues/9310)). The flame graphs pointed to two key limitations:
1. **Data Volume Dependency**: Query latency was directly proportional to data volume. For example, a one-month aggregation taking 1 second would take 12 seconds for a year's worth of data.
2. **Bucket Count Impact**: Large numbers of buckets (like in minute-level aggregations) led to hash collision issues, further degrading performance.

Member: I wasn't aware of hash collision issues before. Did we know this was the issue?


By default, OpenSearch processes aggregations by first evaluating each query condition on every shard using Lucene, which builds iterators that identify matching document IDs. These iterators are then combined (e.g., using logical AND) to find documents that satisfy all query filters. The resulting set of matching document IDs is streamed to the aggregation framework, where each document flows through one or more aggregators. These aggregators use Lucene’s doc values (explained above) to efficiently retrieve the field values needed for computation (e.g., for calculating averages or counts). This streaming model is hierarchical—documents pass through a pipeline of aggregators, allowing them to be grouped into top-level and nested buckets simultaneously. For example, a document can first be bucketed by month, then further aggregated by HTTP status code within that month. This design enables OpenSearch to process complex, multi-level aggregations efficiently in a single pass over the matching documents.

## Understanding the Setup

Member: I feel it may be better to organize "How numeric data is stored in OpenSearch" and "How Aggregation works in OpenSearch" under this section.
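
To make the hierarchical streaming model described in the quoted paragraph above concrete, here is a sketch of a multi-level aggregation in which each matching document is first bucketed by month and then by HTTP status code within that month. The field names "@timestamp" and "status" are illustrative assumptions.

```python
# Sketch of a multi-level aggregation: each matching document is first bucketed
# by month and then, within that month, by HTTP status code. Field names
# ("@timestamp", "status") are illustrative assumptions.
nested_aggregation = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "month",
            },
            "aggs": {
                "per_status": {
                    "terms": {"field": "status"}
                }
            },
        }
    },
}
```

Each matching document flows through the date histogram aggregator and is then handed to the nested terms aggregator, so both levels of buckets are populated in a single pass.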


This date histogram aggregation query generates a filter corresponding to each bucket and leverages Lucene's Points index, which uses BKD trees, to significantly optimize the aggregation. This tree-based structure organizes data into nodes representing value ranges with associated document counts, enabling efficient traversal. By skipping irrelevant subtrees and leveraging early termination, the system reduces unnecessary disk reads and avoids visiting individual documents. The per-bucket filters are executed in a similar manner to a range query, using the index tree to determine counts for each bucket faster than iterating over document values. A similar approach was also applied to the auto date histogram aggregation, the composite aggregation with a date histogram source, and, later on, the numeric range aggregation with consecutive ranges (TODO: Why do we need consecutive ranges??).

Member: Regarding "Why do we need consecutive ranges": it's the outcome of a date histogram.


Author: Does the numeric range aggregation query also need that? IMO, if consecutive ranges are not a requirement of this approach, we need not make that classification.


Member: The range aggregation doesn't have such a requirement (consecutive ranges). The multi-range traversal algorithm should still be able to support the non-consecutive case with some modification, such as maintaining a list of ranges that are active at the current traversal point.


Author: Are you confusing consecutive with overlapping? A consecutive example is (0,10) (10,20) (20,30) ...; a non-consecutive example is (0,10) (20,30) (40,50) ...
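
For readers following this thread, a small sketch of the distinction using the example ranges above: consecutive ranges share boundaries (each bucket starts where the previous one ends), which is exactly what a date histogram produces, while non-consecutive ranges leave gaps.

```python
# Consecutive ranges, as produced by a date histogram: each bucket's upper
# bound equals the next bucket's lower bound (examples from the comment above).
consecutive = [(0, 10), (10, 20), (20, 30)]

# Non-consecutive ranges, as a user-defined range aggregation may specify:
# there are gaps between buckets.
non_consecutive = [(0, 10), (20, 30), (40, 50)]

def is_consecutive(ranges):
    """Return True if every bucket starts exactly where the previous one ends."""
    return all(prev[1] == cur[0] for prev, cur in zip(ranges, ranges[1:]))

print(is_consecutive(consecutive))      # True
print(is_consecutive(non_consecutive))  # False
```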


![Performance_Improvements_Initial.png](../assets/media/blog-images/2025-05-05-date-histogram-optimizations/Performance_Improvements_Initial.png)

The following diagrams illustrate how documents are counted per histogram bucket using the index tree. To efficiently count documents matching a range (e.g., 351 to 771), the traversal begins at the root, checking whether the target range intersects with the node’s range. If it does, the algorithm recursively explores the left and right subtrees. An important optimization is skipping entire subtrees: if a node’s range falls completely outside the query range (e.g., 1–200), it is ignored. Conversely, if a node’s range is fully contained within the query range (e.g., 401–600), the algorithm can return the document count from that node directly without traversing its children. This allows the engine to avoid visiting all leaf nodes, focusing only on nodes with partial overlaps. As a result, the operation becomes significantly faster—reducing complexity from linear (O(N)) to logarithmic (O(log N)) in many cases—since it leverages the hierarchical structure to skip large irrelevant portions of the tree and aggregate counts efficiently.

Member: I don't feel that O(N) versus O(log N) is a good way to convey the difference; the real time complexity is more complicated than that. Explaining that the tree is used to quickly find the start of the target range and to skip subtrees is already a good, simple explanation.
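
As a rough illustration of the traversal described in the quoted paragraph (a simplified sketch, not the actual Lucene BKD-tree implementation), the following code counts documents in a query range over a small binary tree whose nodes store their value range and document count. Subtrees entirely outside the query range are skipped, and subtrees fully contained in it contribute their stored count without being descended into.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Value range covered by this subtree and the number of documents under it.
    min_value: int
    max_value: int
    doc_count: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    values: List[int] = field(default_factory=list)  # populated on leaf nodes only

def count_in_range(node: Optional[Node], lo: int, hi: int) -> int:
    """Count documents with values in [lo, hi], skipping irrelevant subtrees."""
    if node is None or node.max_value < lo or node.min_value > hi:
        return 0  # completely outside the query range: skip this subtree
    if lo <= node.min_value and node.max_value <= hi:
        return node.doc_count  # fully contained: use the stored count, no descent
    if node.left is None and node.right is None:
        # Partially overlapping leaf: fall back to checking individual values.
        return sum(1 for v in node.values if lo <= v <= hi)
    return count_in_range(node.left, lo, hi) + count_in_range(node.right, lo, hi)

# Tiny example using the query range from the excerpt (351 to 771).
leaf_a = Node(1, 200, 3, values=[5, 150, 200])
leaf_b = Node(201, 400, 2, values=[250, 399])
root = Node(1, 400, 5, left=leaf_a, right=leaf_b)
print(count_in_range(root, 351, 771))  # 1: leaf_a is skipped, leaf_b is partially scanned
```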

Comment on lines +182 to +184
![MRT_Initial.png](../assets/media/blog-images/2025-05-05-date-histogram-optimizations/MRT_Initial.png)
![MRT_Middle.png](../assets/media/blog-images/2025-05-05-date-histogram-optimizations/MRT_Middle.png)
![MRT_Final.png](../assets/media/blog-images/2025-05-05-date-histogram-optimizations/MRT_Final.png)

Member: It seems there is no walkthrough explanation of these diagrams.


Author: Yeah, @kolchfa-aws also gave that feedback. Let me work on addressing it.

kolchfa-aws (Collaborator): @jainankitk Please let me know when you have addressed the tech review comments and this PR is ready for my review. Thanks!

kolchfa-aws added the "Tech review" label on Jun 3, 2025
Successfully merging this pull request may close these issues: [BLOG] Optimizing Date Histogram Aggregation Using the Index