Is your feature request related to a problem? Please describe.
No.
Describe the solution you'd like
DeepSpeed FastGen is an inference framework developed by Microsoft. Their blog post claims it is about two times faster than vLLM: https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen
Describe alternatives you've considered
No.
Additional context
I haven't tested FastGen myself; I was just drawn in by their blog post. I searched this repo and it seems no one has mentioned this framework yet, so I'd like to bring it to the community's attention.