-
When setting the beam size to 5, different output tokens are predicted for each beam. In that case, during the decode phase, even with batch_size=1 the effective batch size grows from 1 to 5, so latency increases.
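A minimal sketch of why this happens (framework-agnostic NumPy, not TensorRT-LLM code; the shapes are made up for illustration): the encoder output is tiled once per beam, so the decoder processes batch_size * beam_width sequences at every step.

```python
import numpy as np

batch_size, beam_width = 1, 5
seq_len, hidden = 10, 8  # arbitrary toy dimensions

# Encoder output for a single input...
encoder_out = np.random.rand(batch_size, seq_len, hidden)

# ...tiled once per beam before decoding starts.
tiled = np.repeat(encoder_out, beam_width, axis=0)

# The decoder now advances 5 hypotheses per step instead of 1.
print(tiled.shape)  # (5, 10, 8) -> effective decode batch of 5
```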
-
So do I need to set the batch size to 5 to offset the increased latency?
-
TRT-LLM is about 3x faster than CT2 for Whisper in almost all cases; you just need to give it enough inputs to see the difference. If you are using it for personal use rather than in production, stick with faster-whisper.
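For completeness, a typical faster-whisper call looks like this (model name, device, and audio path are placeholders; adjust to your setup):

```python
from faster_whisper import WhisperModel

# Placeholder model/device; any supported size and backend works.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```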
-
When I set the beam size to 1, TensorRT-LLM is about 50% faster than Faster Whisper. However, when I set the beam size to 5, the speeds are roughly the same: TensorRT-LLM's latency increases significantly more than Faster Whisper's. Any thoughts?
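For reference, this is roughly how I am timing the Faster Whisper side (model name and audio path are placeholders; the TensorRT-LLM numbers come from wrapping its own run script the same way):

```python
import time
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

for beam_size in (1, 5):
    start = time.perf_counter()
    segments, _ = model.transcribe("audio.wav", beam_size=beam_size)
    list(segments)  # transcribe() is lazy; consume the generator to run decoding
    print(f"beam_size={beam_size}: {time.perf_counter() - start:.2f}s")
```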