Commit 79957d5

Add docs for accept multiple ha pairs
Signed-off-by: SungJin1212 <[email protected]>
1 parent a5c4905 commit 79957d5

File tree

1 file changed: +113 additions, -18 deletions


docs/guides/ha-pair-handling.md

Lines changed: 113 additions & 18 deletions
@@ -7,21 +7,35 @@ slug: ha-pair-handling

## Context

You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does
replication for redundancy, and it doesn't make sense to ingest the same data twice. So in Cortex, we made sure we can
dedupe the data we receive from HA pairs of Prometheus. We do this via the following:

Assume that there are two teams, each running their own Prometheus, monitoring different services. Let's call these
Prometheus servers T1 and T2. Now, if the teams are running HA pairs, let's call the individual replicas T1.a, T1.b,
T2.a, and T2.b.

In Cortex, we make sure we only ingest from one of T1.a and T1.b, and only from one of T2.a and T2.b. We do this by
electing a leader replica for each cluster of Prometheus. For example, in the case of T1, let it be T1.a. As long as
T1.a is the leader, we drop the samples sent by T1.b. And if Cortex sees no new samples from T1.a for a short period
(30s by default), it'll switch the leader to be T1.b.

This means if T1.a goes down for a few minutes, Cortex's HA sample handling will have switched and elected T1.b as the
leader. This failover timeout is what enables us to only accept samples from a single replica at a time while ensuring
we don't drop too much data in case of issues. Note that with the default scrape period of 15s and the default timeouts
in Cortex, in most cases you'll only lose a single scrape of data during a leader election failover. For any rate
queries, the rate window should be at least 4x the scrape period to account for any of these failover scenarios; for
example, with the default scrape period of 15s, you should calculate rates over periods of at least 1m.

Now we do the same leader election process for T2.
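
As an illustration of that guidance, a rate query sized to tolerate a failover with the default 15s scrape period could
use a 1m window (the metric name below is just a placeholder):

```
rate(http_requests_total[1m])
```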

## Config

### Client Side

So for Cortex to achieve this, we need two identifiers for each process: one to identify the cluster (T1 or T2, etc.)
and one to identify the replica within the cluster (a or b). The easiest way to do this is by setting external labels;
the default labels are `cluster` and `__replica__`. For example:

```
cluster: prom-team1
@@ -37,18 +51,23 @@ __replica__: replica2

Note: These are external labels and have nothing to do with remote_write config.
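
For reference, the sketch below shows where these external labels would live in a Prometheus configuration file, using
the T1.a values from the example above (label names assume the Cortex defaults):

```yaml
# prometheus.yml for T1.a (sketch)
global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1
```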


These two label names are configurable per-tenant within Cortex and should be set to something sensible. For example,
the cluster label is already used by some workloads, and you should set the label to be something else that uniquely
identifies the cluster. Good examples for this label name would be `team`, `cluster`, `prometheus`, etc.

The replica label should be set so that the value for each Prometheus is unique in that cluster. Note: Cortex drops this
label when ingesting data but preserves the cluster label. This way, your timeseries won't change when replicas change.
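
To illustrate that behaviour (the `up` series is used purely as a placeholder), a sample received from the elected
leader as:

```
up{cluster="prom-team1", __replica__="replica1"} 1
```

is stored by Cortex as:

```
up{cluster="prom-team1"} 1
```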

### Server Side

The minimal configuration requires:

* Enabling the HA tracker via the `-distributor.ha-tracker.enable=true` CLI flag (or its YAML config option)
* Configuring the KV store for the ring
  (see: [Ring/HA Tracker Store](../configuration/arguments.md#ringha-tracker-store)). Only Consul and etcd are currently
  supported. Multi should be used for migration purposes only.
* Setting the limits configuration to accept samples via `-distributor.ha-tracker.enable-for-all-users` (or its YAML
  config option).

The following configuration snippet shows an example of the HA tracker config via a YAML config file:

@@ -63,15 +82,18 @@ distributor:
    enable_ha_tracker: true
    ...
    kvstore:
      [ store: <string> | default = "consul" ]
      [ consul | etcd: <config> ]
      ...
  ...
```

For further configuration file documentation, see the
[distributor section](../configuration/config-file-reference.md#distributor_config) and
[Ring/HA Tracker Store](../configuration/arguments.md#ringha-tracker-store).

For flag configuration, see the [distributor flags](../configuration/arguments.md#ha-tracker) that have `ha-tracker` in
their names.
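
As a minimal sketch, the two flags quoted above could be passed directly on the command line (this assumes the binary
is invoked as `cortex`; the KV store flags are documented in the linked arguments page and are omitted here):

```
cortex \
  -distributor.ha-tracker.enable=true \
  -distributor.ha-tracker.enable-for-all-users=true
```
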
## Remote Read

@@ -109,3 +131,76 @@ Cortex will not return any data.
Therefore, the `__replica__` label should only be added for remote write.

## Accept multiple HA pairs in a single request

Let's assume there are two teams (T1 and T2), and each team operates two Prometheus servers for HA (T1.a and T1.b for
T1, and T2.a and T2.b for T2). They also want to operate another Prometheus that receives the write requests from all
of these servers and sends a single write request to the Distributor.

The write request flow is as follows: T1.a, T1.b, T2.a, T2.b -> Prometheus -> Distributor, which means the
Distributor's incoming write request contains time series from T1.a, T1.b, T2.a, and T2.b. In other words, there are
two HA pairs in a single write request, and the expected push result is to accept samples only from each team's leader
replica (for example, T1.a for T1 and T2.b for T2).
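
One way to realize this flow (an assumption on our part, not something Cortex mandates) is to have each team's
Prometheus remote-write to the aggregating Prometheus, which runs with its remote-write receiver enabled and in turn
remote-writes to Cortex. A sketch, with placeholder hostnames:

```yaml
# prometheus.yml on T1.a, T1.b, T2.a, and T2.b (external labels set as in the client-side examples below)
remote_write:
  - url: http://aggregating-prometheus:9090/api/v1/write

# prometheus.yml on the aggregating Prometheus (started with --web.enable-remote-write-receiver)
remote_write:
  - url: http://cortex-distributor/api/v1/push
```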

## Config

### Client side

The client setting is the same as for a single HA pair. For example:

For T1.a:

```
cluster: prom-team1
__replica__: replica1 (or pod-name)
```

For T1.b:

```
cluster: prom-team1
__replica__: replica2 (or pod-name)
```

For T2.a:

```
cluster: prom-team2
__replica__: replica1 (or pod-name)
```

For T2.b:

```
cluster: prom-team2
__replica__: replica2 (or pod-name)
```

### Server side

One additional setting is needed to accept multiple HA pairs; it is enabled via the
`--experimental.distributor.ha-tracker.mixed-ha-samples=true` CLI flag (or its YAML config option).

The following configuration snippet shows an example of the config needed to accept multiple HA pairs via the YAML
config file:

```yaml
limits:
  ...
  accept_ha_samples: true
  accept_mixed_ha_samples: true
  ...
distributor:
  ...
  ha_tracker:
    enable_ha_tracker: true
    ...
    kvstore:
      [ store: <string> | default = "consul" ]
      [ consul | etcd: <config> ]
      ...
  ...
```

For further configuration file documentation, see
the [limits section](../configuration/config-file-reference.md#limits_config).
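
Expressed as CLI flags, a sketch combining this experimental flag with the base HA tracker flags from earlier in this
guide might look like the following (the KV store flags are omitted, and the binary name is assumed to be `cortex`):

```
cortex \
  -distributor.ha-tracker.enable=true \
  -distributor.ha-tracker.enable-for-all-users=true \
  --experimental.distributor.ha-tracker.mixed-ha-samples=true
```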
