CNDB-13583: Add vector ann and brute force metrics #1683

michaeljmarshall · 2025-04-08T22:47:35Z

What is the issue

https://github.com/riptano/cndb/issues/13583

What does this PR fix and why was it fixed

This PR adds metrics for Storage Attached Index (SAI) vector search operations, covering both ANN (Approximate Nearest Neighbor) graph searches and brute force searches.

New Vector Search Metrics:

Search Operation Counters:

ANNNodesVisited: Total number of nodes visited during ANN searches (this is equivalent to approximate similarity score computations)
ANNNodesReranked: Number of nodes that underwent exact distance computation for reranking (this is equivalent to exact similarity score computations)
ANNNodesExpanded: Total number of nodes whose edges were explored
ANNNodesExpandedBaseLayer: Number of nodes expanded in the base layer of the graph
ANNGraphSearches: Count of new graph searches initiated
ANNGraphResumes: Count of resumed graph searches (when a search continues from previous results)
ANNGraphSearchLatency: Timer measuring the latency of an individual graph search (Note: this is per-graph search time, not total query time, which may involve multiple graphs)
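
To make the difference between a "visited" and an "expanded" node concrete, here is a minimal sketch of a best-first search over a toy single-layer graph. It is not the index's actual search code: the function `greedy_search`, the toy graph, and the scores are all hypothetical, and the counter names only mirror the descriptions above (a visit is one similarity computation, an expansion is one node whose edges are explored).

```python
import heapq


def greedy_search(graph, scores, entry, top_k):
    """Best-first search over a toy graph; returns (results, counters).

    graph:  dict mapping node -> list of neighbor nodes
    scores: dict mapping node -> similarity to the query (higher is better)
    """
    counters = {"visited": 0, "expanded": 0}
    seen = {entry}
    counters["visited"] += 1               # scoring the entry point is a visit
    candidates = [(-scores[entry], entry)]  # max-heap via negated score
    results = []                            # min-heap of (score, node), size <= top_k

    while candidates:
        neg_score, node = heapq.heappop(candidates)
        score = -neg_score
        if len(results) == top_k and score < results[0][0]:
            break  # the best remaining candidate cannot improve the results
        heapq.heappush(results, (score, node))
        if len(results) > top_k:
            heapq.heappop(results)          # drop the current worst result
        counters["expanded"] += 1           # this node's edges are explored
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                counters["visited"] += 1    # one similarity computation
                heapq.heappush(candidates, (-scores[neighbor], neighbor))

    return sorted(results, reverse=True), counters


# Hypothetical five-node graph and precomputed similarity scores.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
toy_scores = {0: 0.1, 1: 0.5, 2: 0.3, 3: 0.9, 4: 0.2}

results, counters = greedy_search(toy_graph, toy_scores, entry=0, top_k=2)
print(results)
print(counters)
```

In this toy run every node is scored once, but only the nodes popped from the candidate queue before the search stops are expanded, so the visited count is larger than the expanded count.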

Brute Force Operation Counters:

BruteForceNodesVisited: Number of nodes visited during brute force searches (approximate similarity comparisons)
BruteForceNodesReranked: Number of nodes that underwent exact similarity computation during brute force searches

Memory Usage Tracking:

quantizationMemoryBytes: Current memory usage by the quantization (PQ or BQ) data structures
ordinalsMapMemoryBytes: Current memory usage by ordinals mapping structures (only matters in some cases)
onDiskGraphsCount: Number of currently loaded graph segments
onDiskGraphVectorsCount: Total number of vectors in currently loaded graphs
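
One way these gauges might be combined is to derive per-vector and per-graph figures from a single reading. The sketch below assumes the four gauge values have already been fetched from the metrics system; the helper names and the sample numbers are hypothetical and are not part of this PR.

```python
def bytes_per_vector(quantization_bytes: int, ordinals_map_bytes: int, vector_count: int) -> float:
    """Average tracked memory per loaded vector (0.0 when nothing is loaded)."""
    if vector_count <= 0:
        return 0.0
    return (quantization_bytes + ordinals_map_bytes) / vector_count


def vectors_per_graph(vector_count: int, graph_count: int) -> float:
    """Average number of vectors per loaded graph segment (0.0 when none are loaded)."""
    if graph_count <= 0:
        return 0.0
    return vector_count / graph_count


# Hypothetical gauge readings, named after the metrics listed above.
reading = {
    "quantizationMemoryBytes": 96_000_000,
    "ordinalsMapMemoryBytes": 4_000_000,
    "onDiskGraphsCount": 8,
    "onDiskGraphVectorsCount": 1_000_000,
}

per_vector = bytes_per_vector(
    reading["quantizationMemoryBytes"],
    reading["ordinalsMapMemoryBytes"],
    reading["onDiskGraphVectorsCount"],
)
per_graph = vectors_per_graph(
    reading["onDiskGraphVectorsCount"], reading["onDiskGraphsCount"]
)
print(f"{per_vector:.1f} bytes per vector, {per_graph:.0f} vectors per graph")
```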

These metrics will help us:

Understand if we are performing more comparisons than expected
Get insight into the number of graphs queried
Get insight into the brute force vs graph query path
Understand current memory utilization

The counters can be read as operations-per-second rates, so dividing a counter's rate by the query rate over the same interval gives a per-query average. The memory tracking metrics show how resource usage changes as graph segments are loaded and unloaded.
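
The per-query calculation can be sketched as follows. This assumes two snapshots of cumulative counter values taken some time apart, plus a query count from a separate metric; the `Snapshot` shape, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Cumulative counter values read at one point in time (hypothetical shape)."""
    ann_nodes_visited: int
    ann_graph_searches: int
    queries: int  # assumed to come from a separate query-count metric


def per_query_average(start: int, end: int, queries_start: int, queries_end: int) -> float:
    """Average counter increase per query between two snapshots."""
    queries = queries_end - queries_start
    if queries <= 0:
        return 0.0  # no queries in the window, so there is no meaningful average
    return (end - start) / queries


before = Snapshot(ann_nodes_visited=10_000, ann_graph_searches=40, queries=20)
after = Snapshot(ann_nodes_visited=34_000, ann_graph_searches=100, queries=50)

visited_per_query = per_query_average(
    before.ann_nodes_visited, after.ann_nodes_visited, before.queries, after.queries
)
graphs_per_query = per_query_average(
    before.ann_graph_searches, after.ann_graph_searches, before.queries, after.queries
)

print(f"nodes visited per query: {visited_per_query:.1f}")
print(f"graph searches per query: {graphs_per_query:.1f}")
```

Using deltas between two snapshots rather than raw totals keeps the average tied to a specific time window, which is what a rate-based dashboard would show.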

github-actions · 2025-04-08T22:47:51Z

Checklist before you submit for review

sonarqubecloud · 2025-04-09T23:04:38Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
95.6% Coverage on New Code
0.0% Duplication on New Code

cassci-bot · 2025-04-09T23:09:08Z

✔️ Build ds-cassandra-pr-gate/PR-1683 approved by Butler

eolivelli

+1

michaeljmarshall force-pushed the cndb-13583 branch from 90bfcc9 to f6828b0 Compare April 9, 2025 22:15

michaeljmarshall changed the title ~~CNDB-13583: save initial work in progress~~ CNDB-13583: Add vector ann and brute force metrics Apr 9, 2025

michaeljmarshall marked this pull request as ready for review April 9, 2025 22:15

michaeljmarshall self-assigned this Apr 9, 2025

CNDB-13583: Add vector ann and brute force metrics

8ed43a6

michaeljmarshall force-pushed the cndb-13583 branch from f6828b0 to 8ed43a6 Compare April 9, 2025 22:30

eolivelli approved these changes Apr 10, 2025

eolivelli merged commit c02e486 into main Apr 10, 2025
469 of 475 checks passed

eolivelli deleted the cndb-13583 branch April 10, 2025 15:25
