Weekly release: 0.19.0rc0 #3588
kaiyux announced in Announcements
Hi,

The TensorRT-LLM team is pleased to announce that on April 15, 2025 we published the weekly release `0.19.0rc0` and pushed an update to the Triton backend.

The `0.19.0rc0` dev release includes:

- Added support for Gemma-3-1b-it; see `examples/gemma/README.md`. (feat: Support gemma-3-1b-it #3247)
- Registered `ENABLE_MULTI_DEVICE` and `ENABLE_UCX` as CMake options (feat: register ENABLE_MULTI_DEVICE and ENABLE_UCX as CMake options #3343)
- Used the `PyExecutor` inference flow to estimate `max_num_tokens` for `kv_cache_manager` (feat: Run PyExecutor's inference flow to estimate max_num_tokens for kv_cache_manager #3092)
- Added the `TLLM_OVERRIDE_LAYER_NUM` and `TLLM_TRACE_MODEL_FORWARD` environment variables for debugging (feat: Support TLLM_OVERRIDE_LAYER_NUM and TLLM_TRACE_MODEL_FORWARD for debugging #3417)
- Applied the new torch-flow compatible `AutoTuner` to both Fused MoE and NVFP4 Linear operators (feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. #3151)
- Introduced a `UserBuffers` allocator for PyTorch flow (feat: Introduce UB allocator for pytorch flow #3257)
- Enhanced the integrated robustness of scaffolding with `__init__.py` (feat: Enhance the integrated robustness of scaffolding with __init__.… #3312)
- Added `numNodes` to `ParallelConfig` (feat: Add numNodes to ParallelConfig #3346)
- Added Qwen2 MoE to the torch flow and fixed the wrongly imported `KvCacheConfig` in `examples/gpqa_llmapi.py` (feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… #3369)
- Fixed `max_seq_len` in `executor_config` (fix: fix max_seq_len in executor_config #3487)
- Allowed the `context_and_generation` request type in disaggregated overlap (fix: Allow context_and_generation request type in disagg overlap #3489)
- Fixed the `py_decoding_iter` update in the decoder (fix: fix the py_decoding_iter update in decoder #3297)
- Fixed the missing bias add for `FP4Linear` (fix [NVBUG 5208255] Fix missing bias add for FP4Linear. #3361)
- Fixed a runtime error in `test_deepseek_allreduce.py` (fix: runtime error in test_deepseek_allreduce.py #3226)
- Fixed torch nvsmall through `PyExecutor` and improved TP support (Fix torch nvsmall through pyexecutor and fix its TP support #3238)

The cut-off commit for this release is 258ae9c. The code changes can be seen here: 5aeef6d...258ae9c.

Thanks,
The TensorRT-LLM Engineering Team