feat: add flash_attn 2 to bert #27478

Closed · wants to merge 3 commits

Conversation

@chiennv2000 commented Nov 14, 2023

Feat: Add flash attention option for BERT
Usage:
model = BertModel.from_pretrained('bert-base-uncased', torch_dtype=torch.bfloat16, use_flash_attention_2=True)
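
For context, a slightly fuller sketch of that usage (assuming flash-attn 2 is installed and a CUDA GPU with bf16 support is available; the tokenizer call and forward pass are illustrative additions, not part of the PR):

```python
# Fuller sketch of the usage above. Assumes flash-attn 2 is installed and a
# CUDA GPU with bf16 support; tokenizer and forward pass are illustrative.
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # flag proposed in this PR
).to("cuda")

inputs = tokenizer("Flash attention for BERT", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```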

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@ArthurZucker and @younesbelkada

@younesbelkada (Contributor) left a comment


Thanks a lot for your PR! In principle this looks great!
Many architectures use BertAttention with # Copied from, so all of those architectures could benefit from FA-2 for free; however, you will need to set _supports_flash_attn_2 = True on each of them. You need to:
1- run make fix-copies
2- on the modified architectures, add the flag above and copy-paste the BertFlashAttention class into each of them (with modified names; see the sketch after this comment).
Would you be happy to address these changes? Otherwise happy to help you!
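
For readers unfamiliar with the flag mentioned in step 2, here is a small runnable check (the model classes are chosen purely for illustration, assuming a recent transformers install) of which BERT-family classes advertise FA-2 support through the _supports_flash_attn_2 class attribute:

```python
# Inspect the opt-in flag that step 2 asks to be set on each architecture that
# copies BertAttention. The model classes listed here are illustrative only.
from transformers import BertModel, RobertaModel, ElectraModel

for cls in (BertModel, RobertaModel, ElectraModel):
    supported = getattr(cls, "_supports_flash_attn_2", False)
    print(f"{cls.__name__}: _supports_flash_attn_2 = {supported}")
```

Setting that attribute to True on an architecture is what lets from_pretrained(..., use_flash_attention_2=True) proceed for that model rather than erroring out.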

@chiennv2000 (Author)

Thanks a lot for your review and your suggestions @younesbelkada.
But I'm not really familiar with the make fix-copies command. Can you guide me on how to do that?

@chiennv2000 (Author)

I appreciate your feedback, and I'm happy to receive your assistance in implementing these changes.
If you could help me with the other architectures, that would be fantastic. Additionally, I'm open to collaborating on extending this to the RoBERTa and XLM-R models. @younesbelkada

@younesbelkada (Contributor)

Perfect, thanks!
As a first step, can you simply run make fix-copies and push the changes here? Then we'll take it over from there!

@chiennv2000 (Author)

Thanks @younesbelkada, I did it.

@huggingface deleted a comment from github-actions bot Dec 14, 2023
@ArthurZucker (Collaborator)

cc @younesbelkada

@huggingface deleted a comment from github-actions bot Jan 8, 2024
@younesbelkada (Contributor)

Didn't have time to properly look into it, will do it ASAP!

@kevinhu (Contributor) commented Jan 31, 2024

Any updates on getting this PR merged?

@hackyon (Contributor) commented Feb 7, 2024

Hello there!

I'm working on integrating scaled_dot_product_attention into BERT in #28802, and there might be some merge conflicts with this change. If my changes go through, we should be able to get rid of most of the downstream dependencies from fix-copies.

Let me know if you have any questions. Happy to discuss and/or chat on the best way forward if necessary.
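
For reference, the SDPA path mentioned here is exposed through the attn_implementation argument in later transformers releases. A minimal sketch (not part of this PR; requires PyTorch >= 2.0 and a transformers version that ships the BERT SDPA integration):

```python
# Sketch of loading BERT with PyTorch's scaled_dot_product_attention, the
# path proposed in #28802. Requires PyTorch >= 2.0 and a transformers release
# that includes the BERT SDPA integration.
import torch
from transformers import BertModel

model = BertModel.from_pretrained(
    "bert-base-uncased",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
)
```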

github-actions bot commented Mar 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@loswald commented Apr 21, 2024

This would be a lifesaver for me, I hope merging this is prioritized! cc @younesbelkada
@chiennv2000 what kind of speedups are you observing with this?

@hackyon (Contributor) commented Apr 22, 2024

> This would be a lifesaver for me, I hope merging this is prioritized! cc @younesbelkada @chiennv2000 what kind of speedups are you observing with this?

@loswald - you can see a quick estimate of the speedups in #28802. The PyTorch SDPA implementation uses FA2 under the hood (if your hardware supports it). The PR is ready, but we're just waiting on the HF team to merge it.
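
To make "FA2 under the hood" concrete, here is a rough check that forces SDPA onto the FlashAttention backend (assumes PyTorch 2.0+ and an fp16-capable CUDA GPU; the context manager shown is deprecated in newer PyTorch releases in favor of torch.nn.attention.sdpa_kernel):

```python
# Rough check that PyTorch's SDPA can dispatch to the FlashAttention kernel on
# this GPU: restrict the allowed backends to flash only and run one call.
# Shapes/dtypes are arbitrary but flash-compatible (fp16, head_dim <= 128).
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(1, 12, 512, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)  # raises if flash is unavailable
print(out.shape)  # torch.Size([1, 12, 512, 64])
```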
