Skip to content

Commit d1310b8

Browse files
jainankitkrayshrey
authored andcommitted
Improving the performance of date histogram aggregation (without any sub-aggregation) (opensearch-project#11083)
* Adding filter based optimization logic to date histogram aggregation Signed-off-by: Ankit Jain <[email protected]> * Reading the field name for aggregation correctly Signed-off-by: Ankit Jain <[email protected]> * Adding the limit on number of buckets for filter aggregation Signed-off-by: Ankit Jain <[email protected]> * Applying the optimizations for match all query as well Signed-off-by: Ankit Jain <[email protected]> * Handling the unwrapped match all query Signed-off-by: Ankit Jain <[email protected]> * Adding logic for recursively unwrapping the query Signed-off-by: Ankit Jain <[email protected]> * Restructuring the code for making it more reusable and unit testable Signed-off-by: Ankit Jain <[email protected]> * Adding javadocs for fixing build failure Signed-off-by: Ankit Jain <[email protected]> * Fixing minor bugs in refactoring Signed-off-by: Ankit Jain <[email protected]> * Adding logic for optimizing auto date histogram Signed-off-by: Ankit Jain <[email protected]> * Fixing bugs and passing unit tests for date histogram Signed-off-by: Ankit Jain <[email protected]> * Temporarily reverting auto date histogram changes Signed-off-by: Ankit Jain <[email protected]> * Fixing spotless check bugs Signed-off-by: Ankit Jain <[email protected]> * Adding back auto date histogram and passing all unit tests Signed-off-by: Ankit Jain <[email protected]> * Fixing the integration tests for reduced collector work Signed-off-by: Ankit Jain <[email protected]> * Fixing the integration test regression Signed-off-by: Ankit Jain <[email protected]> * Addressing code review comments Signed-off-by: Ankit Jain <[email protected]> * Fixing hardbound, missing and script test cases Signed-off-by: Ankit Jain <[email protected]> * Removing collect_count validation to prevent backward compatibility tests from failing Signed-off-by: Ankit Jain <[email protected]> * Finally fixing hardbounds test case Signed-off-by: Ankit Jain <[email protected]> * Refactoring code for reusability Signed-off-by: Ankit Jain <[email protected]> --------- Signed-off-by: Ankit Jain <[email protected]>
1 parent 4387cb8 commit d1310b8

File tree

8 files changed

+456
-20
lines changed

8 files changed

+456
-20
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -146,6 +146,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
146146
- Disallow removing some metadata fields by remove ingest processor ([#10895](https://github.com/opensearch-project/OpenSearch/pull/10895))
147147
- Refactor common parts from the Rounding class into a separate 'round' package ([#11023](https://github.com/opensearch-project/OpenSearch/issues/11023))
148148
- Performance improvement for MultiTerm Queries on Keyword fields ([#7057](https://github.com/opensearch-project/OpenSearch/issues/7057))
149+
- Performance improvement for date histogram aggregations without sub-aggregations ([#11083](https://github.com/opensearch-project/OpenSearch/pull/11083))
149150
- Disable concurrent aggs for Diversified Sampler and Sampler aggs ([#11087](https://github.com/opensearch-project/OpenSearch/issues/11087))
150151
- Made leader/follower check timeout setting dynamic ([#10528](https://github.com/opensearch-project/OpenSearch/pull/10528))
151152
- Improve boolean parsing performance ([#11308](https://github.com/opensearch-project/OpenSearch/pull/11308))

rest-api-spec/src/main/resources/rest-api-spec/test/search.aggregation/10_histogram.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -598,7 +598,6 @@ setup:
598598
- match: { aggregations.histo.buckets.0.doc_count: 2 }
599599
- match: { profile.shards.0.aggregations.0.type: DateHistogramAggregator }
600600
- match: { profile.shards.0.aggregations.0.description: histo }
601-
- match: { profile.shards.0.aggregations.0.breakdown.collect_count: 4 }
602601
- match: { profile.shards.0.aggregations.0.debug.total_buckets: 3 }
603602

604603
---

server/src/main/java/org/opensearch/common/Rounding.java

Lines changed: 27 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,7 @@ long roundFloor(long utcMillis) {
9898
}
9999

100100
@Override
101-
long extraLocalOffsetLookup() {
101+
public long extraLocalOffsetLookup() {
102102
return extraLocalOffsetLookup;
103103
}
104104
},
@@ -109,7 +109,7 @@ long roundFloor(long utcMillis) {
109109
return DateUtils.roundYear(utcMillis);
110110
}
111111

112-
long extraLocalOffsetLookup() {
112+
public long extraLocalOffsetLookup() {
113113
return extraLocalOffsetLookup;
114114
}
115115
},
@@ -120,7 +120,7 @@ long roundFloor(long utcMillis) {
120120
return DateUtils.roundQuarterOfYear(utcMillis);
121121
}
122122

123-
long extraLocalOffsetLookup() {
123+
public long extraLocalOffsetLookup() {
124124
return extraLocalOffsetLookup;
125125
}
126126
},
@@ -131,7 +131,7 @@ long roundFloor(long utcMillis) {
131131
return DateUtils.roundMonthOfYear(utcMillis);
132132
}
133133

134-
long extraLocalOffsetLookup() {
134+
public long extraLocalOffsetLookup() {
135135
return extraLocalOffsetLookup;
136136
}
137137
},
@@ -140,7 +140,7 @@ long roundFloor(long utcMillis) {
140140
return DateUtils.roundFloor(utcMillis, this.ratio);
141141
}
142142

143-
long extraLocalOffsetLookup() {
143+
public long extraLocalOffsetLookup() {
144144
return ratio;
145145
}
146146
},
@@ -149,7 +149,7 @@ long roundFloor(long utcMillis) {
149149
return DateUtils.roundFloor(utcMillis, ratio);
150150
}
151151

152-
long extraLocalOffsetLookup() {
152+
public long extraLocalOffsetLookup() {
153153
return ratio;
154154
}
155155
},
@@ -164,7 +164,7 @@ long roundFloor(long utcMillis) {
164164
return DateUtils.roundFloor(utcMillis, ratio);
165165
}
166166

167-
long extraLocalOffsetLookup() {
167+
public long extraLocalOffsetLookup() {
168168
return ratio;
169169
}
170170
},
@@ -179,7 +179,7 @@ long roundFloor(long utcMillis) {
179179
return DateUtils.roundFloor(utcMillis, ratio);
180180
}
181181

182-
long extraLocalOffsetLookup() {
182+
public long extraLocalOffsetLookup() {
183183
return ratio;
184184
}
185185
};
@@ -216,7 +216,7 @@ long extraLocalOffsetLookup() {
216216
* look up so that we can see transitions that we might have rounded
217217
* down beyond.
218218
*/
219-
abstract long extraLocalOffsetLookup();
219+
public abstract long extraLocalOffsetLookup();
220220

221221
public byte getId() {
222222
return id;
@@ -487,7 +487,7 @@ public double roundingSize(long utcMillis, DateTimeUnit timeUnit) {
487487
*
488488
* @opensearch.internal
489489
*/
490-
static class TimeUnitRounding extends Rounding {
490+
public static class TimeUnitRounding extends Rounding {
491491
static final byte ID = 1;
492492

493493
private final DateTimeUnit unit;
@@ -515,6 +515,14 @@ public byte id() {
515515
return ID;
516516
}
517517

518+
public DateTimeUnit getUnit() {
519+
return this.unit;
520+
}
521+
522+
public ZoneId getTimeZone() {
523+
return this.timeZone;
524+
}
525+
518526
private LocalDateTime truncateLocalDateTime(LocalDateTime localDateTime) {
519527
switch (unit) {
520528
case SECOND_OF_MINUTE:
@@ -945,7 +953,7 @@ public final long nextRoundingValue(long utcMillis) {
945953
*
946954
* @opensearch.internal
947955
*/
948-
static class TimeIntervalRounding extends Rounding {
956+
public static class TimeIntervalRounding extends Rounding {
949957
static final byte ID = 2;
950958

951959
private final long interval;
@@ -972,6 +980,14 @@ public byte id() {
972980
return ID;
973981
}
974982

983+
public long getInterval() {
984+
return this.interval;
985+
}
986+
987+
public ZoneId getTimeZone() {
988+
return this.timeZone;
989+
}
990+
975991
@Override
976992
public Prepared prepare(long minUtcMillis, long maxUtcMillis) {
977993
long minLookup = minUtcMillis - interval;

server/src/main/java/org/opensearch/index/mapper/DateFieldMapper.java

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -409,6 +409,16 @@ public long parse(String value) {
409409
return resolution.convert(DateFormatters.from(dateTimeFormatter().parse(value), dateTimeFormatter().locale()).toInstant());
410410
}
411411

412+
public long convertNanosToMillis(long nanoSecondsSinceEpoch) {
413+
if (resolution.numericType.equals(NumericType.DATE_NANOSECONDS)) return DateUtils.toMilliSeconds(nanoSecondsSinceEpoch);
414+
return nanoSecondsSinceEpoch;
415+
}
416+
417+
public long convertRoundedMillisToNanos(long milliSecondsSinceEpoch) {
418+
if (resolution.numericType.equals(NumericType.DATE_NANOSECONDS)) return DateUtils.toNanoSeconds(milliSecondsSinceEpoch);
419+
return milliSecondsSinceEpoch;
420+
}
421+
412422
@Override
413423
public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
414424
DateFormatter defaultFormatter = dateTimeFormatter();

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java

Lines changed: 85 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -34,13 +34,15 @@
3434
import org.apache.lucene.index.LeafReaderContext;
3535
import org.apache.lucene.index.SortedNumericDocValues;
3636
import org.apache.lucene.search.ScoreMode;
37+
import org.apache.lucene.search.Weight;
3738
import org.apache.lucene.util.CollectionUtil;
3839
import org.opensearch.common.Rounding;
3940
import org.opensearch.common.Rounding.Prepared;
4041
import org.opensearch.common.lease.Releasables;
4142
import org.opensearch.common.util.IntArray;
4243
import org.opensearch.common.util.LongArray;
4344
import org.opensearch.core.common.util.ByteArray;
45+
import org.opensearch.index.mapper.DateFieldMapper;
4446
import org.opensearch.search.DocValueFormat;
4547
import org.opensearch.search.aggregations.Aggregator;
4648
import org.opensearch.search.aggregations.AggregatorFactories;
@@ -125,9 +127,13 @@ static AutoDateHistogramAggregator build(
125127
* {@link MergingBucketsDeferringCollector#mergeBuckets(long[])}.
126128
*/
127129
private MergingBucketsDeferringCollector deferringCollector;
130+
private final Weight[] filters;
131+
private final DateFieldMapper.DateFieldType fieldType;
128132

129133
protected final RoundingInfo[] roundingInfos;
130134
protected final int targetBuckets;
135+
protected int roundingIdx;
136+
protected Rounding.Prepared preparedRounding;
131137

132138
private AutoDateHistogramAggregator(
133139
String name,
@@ -148,8 +154,51 @@ private AutoDateHistogramAggregator(
148154
this.formatter = valuesSourceConfig.format();
149155
this.roundingInfos = roundingInfos;
150156
this.roundingPreparer = roundingPreparer;
157+
this.preparedRounding = prepareRounding(0);
158+
159+
FilterRewriteHelper.FilterContext filterContext = FilterRewriteHelper.buildFastFilterContext(
160+
parent(),
161+
subAggregators.length,
162+
context,
163+
b -> getMinimumRounding(b[0], b[1]),
164+
// Passing prepared rounding as supplier to ensure the correct prepared
165+
// rounding is set as it is done during getMinimumRounding
166+
() -> preparedRounding,
167+
valuesSourceConfig,
168+
fc -> FilterRewriteHelper.getAggregationBounds(context, fc.field())
169+
);
170+
if (filterContext != null) {
171+
fieldType = filterContext.fieldType;
172+
filters = filterContext.filters;
173+
} else {
174+
fieldType = null;
175+
filters = null;
176+
}
151177
}
152178

179+
private Rounding getMinimumRounding(final long low, final long high) {
180+
// max - min / targetBuckets = bestDuration
181+
// find the right innerInterval this bestDuration belongs to
182+
// since we cannot exceed targetBuckets, bestDuration should go up,
183+
// so the right innerInterval should be an upper bound
184+
long bestDuration = (high - low) / targetBuckets;
185+
while (roundingIdx < roundingInfos.length - 1) {
186+
final RoundingInfo curRoundingInfo = roundingInfos[roundingIdx];
187+
final int temp = curRoundingInfo.innerIntervals[curRoundingInfo.innerIntervals.length - 1];
188+
// If the interval duration is covered by the maximum inner interval,
189+
// we can start with this outer interval for creating the buckets
190+
if (bestDuration <= temp * curRoundingInfo.roughEstimateDurationMillis) {
191+
break;
192+
}
193+
roundingIdx++;
194+
}
195+
196+
preparedRounding = prepareRounding(roundingIdx);
197+
return roundingInfos[roundingIdx].rounding;
198+
}
199+
200+
protected abstract LongKeyedBucketOrds getBucketOrds();
201+
153202
@Override
154203
public final ScoreMode scoreMode() {
155204
if (valuesSource != null && valuesSource.needsScores()) {
@@ -176,7 +225,32 @@ public final LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBuc
176225
if (valuesSource == null) {
177226
return LeafBucketCollector.NO_OP_COLLECTOR;
178227
}
179-
return getLeafCollector(valuesSource.longValues(ctx), sub);
228+
229+
final SortedNumericDocValues values = valuesSource.longValues(ctx);
230+
final LeafBucketCollector iteratingCollector = getLeafCollector(values, sub);
231+
232+
// Need to be declared as final and array for usage within the
233+
// LeafBucketCollectorBase subclass below
234+
final boolean[] useOpt = new boolean[1];
235+
useOpt[0] = filters != null;
236+
237+
return new LeafBucketCollectorBase(sub, values) {
238+
@Override
239+
public void collect(int doc, long owningBucketOrd) throws IOException {
240+
// Try fast filter aggregation if the filters have been created
241+
// Skip if tried before and gave incorrect/incomplete results
242+
if (useOpt[0]) {
243+
useOpt[0] = FilterRewriteHelper.tryFastFilterAggregation(ctx, filters, fieldType, (key, count) -> {
244+
incrementBucketDocCount(
245+
FilterRewriteHelper.getBucketOrd(getBucketOrds().add(owningBucketOrd, preparedRounding.round(key))),
246+
count
247+
);
248+
});
249+
}
250+
251+
iteratingCollector.collect(doc, owningBucketOrd);
252+
}
253+
};
180254
}
181255

182256
protected final InternalAggregation[] buildAggregations(
@@ -247,8 +321,6 @@ protected final void merge(long[] mergeMap, long newNumBuckets) {
247321
* @opensearch.internal
248322
*/
249323
private static class FromSingle extends AutoDateHistogramAggregator {
250-
private int roundingIdx;
251-
private Rounding.Prepared preparedRounding;
252324
/**
253325
* Map from value to bucket ordinals.
254326
* <p>
@@ -286,10 +358,14 @@ private static class FromSingle extends AutoDateHistogramAggregator {
286358
metadata
287359
);
288360

289-
preparedRounding = prepareRounding(0);
290361
bucketOrds = new LongKeyedBucketOrds.FromSingle(context.bigArrays());
291362
}
292363

364+
@Override
365+
protected LongKeyedBucketOrds getBucketOrds() {
366+
return bucketOrds;
367+
}
368+
293369
@Override
294370
protected LeafBucketCollector getLeafCollector(SortedNumericDocValues values, LeafBucketCollector sub) throws IOException {
295371
return new LeafBucketCollectorBase(sub, values) {
@@ -507,6 +583,11 @@ private static class FromMany extends AutoDateHistogramAggregator {
507583
liveBucketCountUnderestimate = context.bigArrays().newIntArray(1, true);
508584
}
509585

586+
@Override
587+
protected LongKeyedBucketOrds getBucketOrds() {
588+
return bucketOrds;
589+
}
590+
510591
@Override
511592
protected LeafBucketCollector getLeafCollector(SortedNumericDocValues values, LeafBucketCollector sub) throws IOException {
512593
return new LeafBucketCollectorBase(sub, values) {

0 commit comments

Comments
 (0)