[Feature Request] Paginating _wlm/stats API

### Is your feature request related to a problem? Please describe

The current _wlm/stats API in OpenSearch provides query group statistics across nodes in a single response, which scales poorly as cluster size increases. Similar to _cat APIs (e.g., _cat/indices, _cat/shards), this API suffers from large response sizes, high latency, and increased CPU/memory consumption. This makes it difficult for users to efficiently retrieve and process query group statistics, especially in large clusters.

The need for pagination arises to:

1. Limit response size, reducing memory usage and response latency.
2. Prevent unnecessary aggregation of statistics for all nodes at once.
3. Enable efficient navigation of query group statistics, similar to paginated APIs like _list/indices and _list/shards.

The issues and approaches discussed in the following OpenSearch GitHub issues are particularly relevant:

1. [OpenSearch Issue #14257](https://github.com/opensearch-project/OpenSearch/issues/14257): Discusses pagination for _cat APIs, highlighting the impact of large responses on cluster performance.
2. [OpenSearch Issue #15014](https://github.com/opensearch-project/OpenSearch/issues/15014): Tracks the introduction of _list APIs to replace _cat APIs, ensuring efficient pagination with next_token.
3. [OpenSearch Issue #14258](https://github.com/opensearch-project/OpenSearch/issues/14258): Discusses pagination strategies, emphasizing deterministic sorting keys for stable pagination behavior.

### Describe the solution you'd like

To address the large response sizes and high resource consumption of `_wlm/stats`, we propose introducing a **new API endpoint (`/_list/wlm_stats`)** with **token-based pagination**. This follows the approach used in **[OpenSearch Issue #14257](https://github.com/opensearch-project/OpenSearch/issues/14257)** and **[OpenSearch Issue #15014](https://github.com/opensearch-project/OpenSearch/issues/15014)**, where `_list` APIs were introduced for paginating large `_cat` responses.

### Key Features

1. **Token-Based Pagination (`next_token`)**: Users can fetch query group statistics in smaller chunks, reducing resource consumption.
2. **Sorting Support**: Users can sort results by **Node ID or Query Group**, ensuring a stable and predictable pagination order.
3. **Tabular Output**: The response is structured similarly to `_cat` APIs, making it easy to read and process.
4. **Scalability**: Limits the amount of data retrieved per request, preventing excessive load on the cluster.

### Sorting Options

Because CPU and memory usage fluctuate frequently, sorting by these values is not supported: rows could change position between requests, so a client paginating through the results could see duplicated or skipped entries. Sorting is therefore restricted to stable attributes:

1. **node_id (Default)**: Sorts results lexicographically by Node ID, then by Query Group. Ensures structured browsing.
2. **query_group**: Groups results by Query Group, useful for analyzing workload behavior.

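The interaction between a stable sort key and `next_token` can be illustrated with a minimal in-memory sketch. It is not the proposed implementation: the row fields, the `list_wlm_stats` function, the token format (base64-encoded JSON of the last key returned), and the use of Node ID as the tie-breaker when sorting by Query Group are all assumptions made for illustration.

```python
import base64
import json
from typing import Optional

# Hypothetical rows: one entry per (node, query group) pair.
ROWS = [
    {"node_id": "node-b", "query_group": "analytics", "cpu": 0.42},
    {"node_id": "node-a", "query_group": "reporting", "cpu": 0.10},
    {"node_id": "node-a", "query_group": "analytics", "cpu": 0.73},
    {"node_id": "node-b", "query_group": "reporting", "cpu": 0.05},
]

# Only stable attributes are sortable; the second field breaks ties (assumed).
SORT_KEYS = {
    "node_id": ("node_id", "query_group"),
    "query_group": ("query_group", "node_id"),
}


def encode_token(key: list) -> str:
    """Pack the last-seen sort key into an opaque, URL-safe string."""
    return base64.urlsafe_b64encode(json.dumps(key).encode()).decode()


def decode_token(token: str) -> list:
    """Recover the last-seen sort key from a token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def list_wlm_stats(rows, size: int, sort: str = "node_id",
                   order: str = "asc", next_token: Optional[str] = None):
    """Return (page, next_token); next_token is None on the last page."""
    if sort not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort}")
    fields = SORT_KEYS[sort]

    def key_of(row):
        return [row[f] for f in fields]

    descending = order == "desc"
    ordered = sorted(rows, key=key_of, reverse=descending)
    if next_token is not None:
        last = decode_token(next_token)
        # Resume strictly after the last key the client has already seen.
        if descending:
            ordered = [r for r in ordered if key_of(r) < last]
        else:
            ordered = [r for r in ordered if key_of(r) > last]
    page = ordered[:size]
    has_more = len(ordered) > size
    token = encode_token(key_of(page[-1])) if has_more else None
    return page, token
```

Because the token records the last key rather than a numeric offset, rows that appear or disappear between requests do not shift the boundary of the next page, which is the property that sorting by CPU or memory usage could not provide.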

### Example API Calls

**Fetch First Page (Sorted by Query Group)**

```bash
GET /_list/wlm_stats?size=50&sort=query_group&order=asc
```
Returns results **grouped by Query Group**, making it easier to analyze workload performance.

**Fetch First Page (Sorted by Node ID)**

```bash
GET /_list/wlm_stats?size=50&sort=node_id&order=asc
```
Sorts results by **Node ID**, providing a stable, structured overview.

**Fetch Next Page**

```bash
GET /_list/wlm_stats?size=50&sort=node_id&order=asc&next_token=Base64EncodedCursor
```
Uses `next_token` to fetch **the next 50 results in a stable order**.

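From the client side, walking all pages could look roughly like the sketch below. The response shape (`rows` plus an optional `next_token`) is an assumption, since the proposal only states that the output is tabular; `iter_wlm_stats` and `make_fake_fetch` are hypothetical helpers, and the fake server uses a plain offset as its token purely to keep the example self-contained.

```python
from typing import Callable, Iterator, Optional

# Assumed response shape: {"rows": [...], "next_token": "<opaque>" or None}
Page = dict


def iter_wlm_stats(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every row, requesting one page at a time until no token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page["rows"]
        token = page.get("next_token")
        if not token:
            break


def make_fake_fetch(rows: list, size: int) -> Callable[[Optional[str]], Page]:
    """Stand-in for GET /_list/wlm_stats; a real token would be opaque."""
    def fetch(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + size
        more = end < len(rows)
        return {"rows": rows[start:end], "next_token": str(end) if more else None}
    return fetch


rows = [{"node_id": f"node-{i}", "query_group": "analytics"} for i in range(5)]
all_rows = list(iter_wlm_stats(make_fake_fetch(rows, size=2)))
```

The client never inspects the token; it only passes back whatever the previous response returned, so the server remains free to change the cursor encoding.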


### Related component

Search

### Describe alternatives you've considered

An alternative solution is to enhance the existing _wlm/stats API with filtering options, ensuring that only the most relevant statistics are retrieved.

### Key Features

1. **Targeted Data Retrieval**: Users can filter results by **Node ID, Query Group, CPU Usage, and Memory Usage** to retrieve only relevant information.
2. **Sorting Support**: Supports sorting by CPU Usage, Memory Usage, Node ID, and Query Group for better analysis.
3. **Tabular Output**: Maintains structured, easy-to-read output similar to _cat APIs.
4. **Performance Optimization**: Eliminates unnecessary data retrieval, improving query response times.

### Example API Calls
**Fetch Nodes with CPU Usage Above 50%**

```bash
GET /_wlm/stats?cpu_threshold=50
```
Returns only nodes consuming more than 50% CPU.

**Fetch Nodes with High Memory Usage**

```bash
GET /_wlm/stats?memory_threshold=70
```
Retrieves only nodes using more than 70% memory.

**Fetch Query Groups for a Specific Node**

```bash
GET /_wlm/stats?node_id=jPPwGjW-TA2NZB6Gn7RZtg
```
Returns query group statistics for the given node.

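The filtering semantics of this alternative can be sketched as a simple predicate over stats rows. The field names `cpu_percent` and `memory_percent`, the `query_group` filter parameter, and the `filter_wlm_stats` function are hypothetical; thresholds are treated as strict "more than" comparisons, following the descriptions above.

```python
from typing import Optional


def filter_wlm_stats(rows: list,
                     node_id: Optional[str] = None,
                     query_group: Optional[str] = None,
                     cpu_threshold: Optional[float] = None,
                     memory_threshold: Optional[float] = None) -> list:
    """Keep only rows that satisfy every filter that was supplied."""
    def keep(row: dict) -> bool:
        if node_id is not None and row["node_id"] != node_id:
            return False
        if query_group is not None and row["query_group"] != query_group:
            return False
        if cpu_threshold is not None and row["cpu_percent"] <= cpu_threshold:
            return False
        if memory_threshold is not None and row["memory_percent"] <= memory_threshold:
            return False
        return True

    return [row for row in rows if keep(row)]


sample = [
    {"node_id": "node-a", "query_group": "analytics", "cpu_percent": 73.0, "memory_percent": 40.0},
    {"node_id": "node-a", "query_group": "reporting", "cpu_percent": 10.0, "memory_percent": 81.0},
    {"node_id": "node-b", "query_group": "analytics", "cpu_percent": 50.0, "memory_percent": 65.0},
]
busy = filter_wlm_stats(sample, cpu_threshold=50)
```

Filtering reduces the response only when the filters are selective; a request with no filters still returns every row, which is the case pagination is meant to address.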
### Additional context

_No response_
