# TensorRT-LLM Release 0.17.0 #2726
zeroepoch announced in Announcements
Hi,
We are very pleased to announce the 0.17.0 version of TensorRT-LLM. This update includes:
## Key Features and Enhancements
- Added NVFP4 support for the `LLM` API and `trtllm-bench` command.
- Added the PyTorch workflow as an experimental feature in `tensorrt_llm._torch`, together with the list of supported infrastructure, models, and features that can be used with the PyTorch workflow.
- Added ModelOpt quantized checkpoint support for the `LLM` API.
- Added FP8 support for the Llama-3.2 VLM model. Refer to the "MLLaMA" section in `examples/multimodal/README.md`.
- Added PDL support for the `userbuffer`-based AllReduce-Norm fusion kernel.
- Added token-aligned arbitrary output tensors support for the C++ `executor` API.

## API Changes
- [BREAKING CHANGE] KV cache reuse is enabled automatically when `paged_context_fmha` is enabled.
- Added `--concurrency` support for the `throughput` subcommand of `trtllm-bench`.
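As an illustration, a throughput run pinned to a fixed client concurrency might look like the sketch below. The model name and dataset path are placeholders, and the exact flag spellings should be checked against `trtllm-bench --help` for your installed version.

```shell
# Hypothetical invocation: model name and dataset path are placeholders.
# --concurrency caps the number of in-flight requests during the benchmark.
trtllm-bench --model meta-llama/Llama-3.1-8B-Instruct \
  throughput \
  --dataset /path/to/dataset.jsonl \
  --concurrency 64
```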
## Fixed Issues
- Added the NVIDIA H200 GPU into the `cluster_key` for the auto parallelism feature. ([feature request] Can we add H200 in `infer_cluster_key()` method? #2552)
- Fixed a typo in the `__post_init__` function of the `LLmArgs` class. Thanks to @topenkoff for the contribution in Fix kwarg name #2691.

## Infrastructure Changes
- The base Docker image for TensorRT-LLM is updated to `nvcr.io/nvidia/pytorch:25.01-py3`.
- The base Docker image for TensorRT-LLM Backend is updated to `nvcr.io/nvidia/tritonserver:25.01-py3`.
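For reference, the updated base images can be pulled directly from NGC. The tags are taken from the notes above; depending on your setup, pulling from `nvcr.io` may require an NGC login first.

```shell
# Base image used by TensorRT-LLM in this release.
docker pull nvcr.io/nvidia/pytorch:25.01-py3
# Base image used by the TensorRT-LLM (Triton) backend.
docker pull nvcr.io/nvidia/tritonserver:25.01-py3
```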
## Known Issues
- You need to add `--extra-index-url https://pypi.nvidia.com` when running `pip install tensorrt-llm` due to new third-party dependencies.
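In practice, the workaround is to point pip at NVIDIA's package index in addition to the default PyPI index, e.g.:

```shell
# The extra index hosts the new third-party dependencies
# that are not available on the default PyPI index.
pip install tensorrt-llm --extra-index-url https://pypi.nvidia.com
```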