Most of the cost comes from a one-day backfill, as shown in the image below.
However, we found another potential risk while investigating this issue.
The risk comes from the query below:
SELECT base.uri, base.post_id, base.canister_id, base.timestamp, distance
FROM VECTOR_SEARCH(
  (
    SELECT *
    FROM `hot-or-not-feed-intelligence.yral_ds.video_index`
    WHERE uri NOT IN <watch_history>
      AND is_nsfw = False
      AND nsfw_ec = 'neutral'
      AND post_id IS NOT NULL
      AND canister_id IS NOT NULL
      AND TIMESTAMP_TRUNC(TIMESTAMP(SUBSTR(timestamp, 1, 26)), MICROSECOND) > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
  ),
  'embedding',
  (
    SELECT embedding
    FROM `hot-or-not-feed-intelligence.yral_ds.video_index`
    WHERE uri IN <watch_history>
      AND is_nsfw = False
      AND nsfw_ec = 'neutral'
      AND post_id IS NOT NULL
      AND canister_id IS NOT NULL
  ),
  top_k => 12,
  options => '{"fraction_lists_to_search":0.6}' -- CAUTION: This is high at the moment owing to the sparsity of the data. As and when we have a good number of recent uploads, this has to go down!
)
ORDER BY distance
This query scans ~7.3 GB of data. The cause is the fraction_lists_to_search parameter, which is set to 0.6. The BigQuery documentation describes it as follows:
fraction_lists_to_search: This is a number that specifies the percentage of lists to search. For example, options => '{"fraction_lists_to_search":0.15}'. The fraction_lists_to_search value must be in the range 0.0 to 1.0, exclusive.
Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage.
Because the data we currently have is not very diverse, a lower fraction_lists_to_search would reduce recall even further. This is why fraction_lists_to_search is set high for now.
Once we have more data to index, we can reduce the cost by safely reducing this parameter.
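To make the parameter easy to lower later, one option is to build the `options` string from a single configurable value instead of hard-coding it in the query. The sketch below is illustrative only: the helper name `vector_search_options` is hypothetical and not part of the existing codebase. It enforces the exclusive 0.0 to 1.0 range quoted above.

```python
import json


def vector_search_options(fraction_lists_to_search: float) -> str:
    """Return the JSON string to pass as `options` to VECTOR_SEARCH.

    Hypothetical helper (an assumption, not existing project code).
    """
    # The documented range is 0.0 to 1.0, exclusive on both ends.
    if not 0.0 < fraction_lists_to_search < 1.0:
        raise ValueError(
            "fraction_lists_to_search must be strictly between 0.0 and 1.0, "
            f"got {fraction_lists_to_search}"
        )
    # Compact separators reproduce the format used in the query above.
    return json.dumps(
        {"fraction_lists_to_search": fraction_lists_to_search},
        separators=(",", ":"),
    )


if __name__ == "__main__":
    # Current setting from the query in this issue.
    print(vector_search_options(0.6))
```

With the value in one place, lowering it once more recent uploads are indexed becomes a one-line configuration change.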