... SDPA causal mask generation may be wrong.
transformers/src/transformers/modeling_attn_mask_utils.py, lines 421 to 433 at commit 76fa17c
Would it be safe to just return None for the else: case?
For causal attention, we can just use _prepare_4d_causal_attention_mask_for_sdpa.
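
A minimal sketch (not the transformers implementation) of why returning None can be safe here: when attn_mask is None and is_causal=True, PyTorch's scaled_dot_product_attention builds the causal mask internally and is free to dispatch to its fused kernels, and the result matches an explicit lower-triangular mask. The shapes below are arbitrary illustration values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) -- arbitrary example sizes
q = torch.randn(1, 2, 5, 8)
k = torch.randn(1, 2, 5, 8)
v = torch.randn(1, 2, 5, 8)

# Path 1: no explicit mask; SDPA applies causal masking internally.
out_fused = F.scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=True)

# Path 2: an explicit lower-triangular boolean mask (True = attend).
causal_mask = torch.tril(torch.ones(5, 5, dtype=torch.bool))
out_explicit = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask, is_causal=False)

# Both paths should agree for pure causal attention.
torch.testing.assert_close(out_fused, out_explicit)
```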
Related issues:
pytorch/pytorch#108108
Dao-AILab/flash-attention@9e5e8bc
#28802