feat: [Deepseek] Redesign multi-stream API #3459
/bot run --disable-fail-fast
PR_Github #1825 [ run ] triggered by Bot
PR_Github #1825 [ run ] completed with state
LGTM
/bot run --disable-fail-fast
PR_Github #1967 [ run ] triggered by Bot
Signed-off-by: Hao Lu <[email protected]>
PR_Github #1967 [ run ] completed with state
This PR applies the new API to Deepseek only; Llama will be updated later.
Applied multi-stream execution to overlap the bmm and concat operations in MLA.forward_generation.
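For readers unfamiliar with the idea, the sketch below shows the general PyTorch pattern for overlapping two independent operations on separate CUDA streams. It is not the multi-stream API introduced by this PR, which is not shown here. The function name `overlapped_bmm_and_cat` and its argument names are hypothetical, and the sketch assumes the bmm and the concat do not depend on each other. When no auxiliary stream is passed or the inputs are on the CPU, it falls back to plain sequential execution.

```python
import torch


def overlapped_bmm_and_cat(q, w, k_a, k_b, aux_stream=None):
    """Hypothetical helper: returns (torch.bmm(q, w), torch.cat([k_a, k_b], -1)).

    On CUDA with an auxiliary stream, the bmm is enqueued on aux_stream while
    the concat is enqueued on the current (main) stream, so the two kernels
    may run concurrently. Assumes the two ops are independent.
    """
    if aux_stream is None or not q.is_cuda:
        # No second stream available: run both ops sequentially.
        return torch.bmm(q, w), torch.cat([k_a, k_b], dim=-1)

    main_stream = torch.cuda.current_stream()
    # The aux stream must not start before q and w are ready on the main stream.
    aux_stream.wait_stream(main_stream)
    with torch.cuda.stream(aux_stream):
        bmm_out = torch.bmm(q, w)

    # Enqueued on the main stream; may overlap with the bmm above.
    cat_out = torch.cat([k_a, k_b], dim=-1)

    # Later main-stream work must wait for the bmm to finish.
    main_stream.wait_stream(aux_stream)
    # bmm_out was allocated on aux_stream but is consumed on the main stream;
    # tell the caching allocator so its memory is not reused too early.
    bmm_out.record_stream(main_stream)
    return bmm_out, cat_out


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    aux = torch.cuda.Stream() if device == "cuda" else None
    q = torch.randn(8, 4, 16, device=device)
    w = torch.randn(8, 16, 32, device=device)
    k_a = torch.randn(8, 4, 24, device=device)
    k_b = torch.randn(8, 4, 8, device=device)
    out_bmm, out_cat = overlapped_bmm_and_cat(q, w, k_a, k_b, aux_stream=aux)
    print(out_bmm.shape, out_cat.shape)
```

The explicit `wait_stream` calls are what make the overlap safe: the first orders the auxiliary stream after the producers of the inputs, and the second orders any later main-stream consumer after the bmm.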