feat: allocate minimal blocks per window size (#3028)
* implement variable window attention by breaking the block manager into window block managers per window size
Signed-off-by: Netanel Haber <[email protected]>
* revert isCyclic to be true if the min attention window is reached, not per window size
Signed-off-by: Netanel Haber <[email protected]>
* add explanatory comment to mCyclicThreshold
Signed-off-by: Netanel Haber <[email protected]>
* load correct gemma config
Signed-off-by: Netanel Haber <[email protected]>
* don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations
Signed-off-by: Netanel Haber <[email protected]>
* fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers
Signed-off-by: Netanel Haber <[email protected]>
* if TYPE_CHECKING
Signed-off-by: Netanel Haber <[email protected]>
* set temp_attention_window_inputs to None explicitly
Signed-off-by: Netanel Haber <[email protected]>
* set temp_attention_window_inputs to None explicitly
Signed-off-by: Netanel Haber <[email protected]>
* pass dtype as well
Signed-off-by: Netanel Haber <[email protected]>
* test_gemma variable sliding window attention
Signed-off-by: Netanel Haber <[email protected]>
* allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers)
Signed-off-by: Netanel Haber <[email protected]>
* remove '|| mEnableBlockReuse', which erroneously triggered the beam search code path for cyclic variable attention window
Signed-off-by: Netanel Haber <[email protected]>
* turn off request delaying for MaxUtil
Signed-off-by: Netanel Haber <[email protected]>
* make comments better
Signed-off-by: Netanel Haber <[email protected]>
* windowSizesTotalSum using std::accumulate
Signed-off-by: Netanel Haber <[email protected]>
* fix error handling of forwardAsync - the cleanup code in forwardAsync's catch-all handler, which runs terminateRequest, can itself fail and must also be caught
Signed-off-by: Netanel Haber <[email protected]>
* fix comments
Signed-off-by: Netanel Haber <[email protected]>
* remove assert that kills disagg tests, since it isn't necessary
Signed-off-by: Netanel Haber <[email protected]>
* fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?'; the misplaced parenthesis changed the boolean logic. Main is correct
Signed-off-by: Netanel Haber <[email protected]>
* add Gemma3 to SUPPORTED_HF_ARCHITECTURES
Signed-off-by: Netanel Haber <[email protected]>
* support Gemma3
Signed-off-by: Netanel Haber <[email protected]>
* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None
Signed-off-by: Netanel Haber <[email protected]>
* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None
Signed-off-by: Netanel Haber <[email protected]>
* fix kvfactor field for deepseek
Signed-off-by: Netanel Haber <[email protected]>
* fix comment
Signed-off-by: Netanel Haber <[email protected]>
* fix gemma-3 entries in testlist to include vswa
Signed-off-by: Netanel Haber <[email protected]>
* only quantize gemma2 VSWA
Signed-off-by: Netanel Haber <[email protected]>
* remove misleading comment
Signed-off-by: Netanel Haber <[email protected]>
* fix test_gemma
Signed-off-by: Netanel Haber <[email protected]>
* fix test_gemma
Signed-off-by: Netanel Haber <[email protected]>
* fix test_gemma
Signed-off-by: Netanel Haber <[email protected]>
* in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main
Signed-off-by: Netanel Haber <[email protected]>
* fix: disable KV cache reuse if using attention sink (#3021)
* fix: disable KV cache reuse if using attention sink
Signed-off-by: Robin Kobus <[email protected]>
* fix: disable KV cache reuse if sink bubble
Signed-off-by: Robin Kobus <[email protected]>
* add comment
Signed-off-by: Robin Kobus <[email protected]>
---------
Signed-off-by: Robin Kobus <[email protected]>
---------
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Robin Kobus <[email protected]>
Co-authored-by: Robin Kobus <[email protected]>
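
Illustrative note on the block allotment described above (splitting primary/secondary blocks across window sizes by each window size's total contribution to the KV cache). The following is a minimal Python sketch, not the code from this change. It assumes that a window size's contribution is its window size multiplied by the number of layers using it, and that shares are rounded down. The helper name split_blocks_by_window and the example layer layout are hypothetical.

```python
from collections import Counter


def split_blocks_by_window(total_blocks, layer_window_sizes):
    """Split a block budget across window sizes (hypothetical helper).

    Assumption: a window size's weight is window_size * (number of layers
    using it), i.e. its contribution to the KV cache summed over all layers.
    """
    layers_per_window = Counter(layer_window_sizes)
    weights = {w: w * n for w, n in layers_per_window.items()}
    total_weight = sum(weights.values())
    # Floor division keeps the sum of shares within the budget; blocks lost
    # to rounding are simply left unassigned in this sketch.
    return {
        w: total_blocks * weight // total_weight
        for w, weight in weights.items()
    }


# Six layers: five use a 1024-token window, one uses an 8192-token window.
shares = split_blocks_by_window(1000, [1024] * 5 + [8192])
print(shares)
```

In this example the single 8192-token layer receives the larger share even though five layers use the smaller window, because the weighting is by cached tokens across layers rather than by layer count.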