Commit c79853e (parent: e232d03)

update readme for disaggregated

Signed-off-by: Chuang Zhu <[email protected]>

File tree: 1 file changed, +11 −2 lines
examples/disaggregated/README.md (+11 −2)
````diff
@@ -9,11 +9,18 @@ You can use multiple `trtllm-serve` commands to launch the context and generation servers
 for disaggregated serving. For example, you could launch two context servers and one generation server as follows:
 
 ```
+echo -e "pytorch_backend_config:\n enable_overlap_scheduler: true" > extra-llm-api-config.yml
+
 export TRTLLM_USE_UCX_KVCACHE=1
 
 #Context servers
-trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch &> log_ctx_0 &
-trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch &> log_ctx_1 &
+export CUDA_VISIBLE_DEVICES=0
+trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch --extra_llm_api_options ./extra-llm-api-config.yml &> log_ctx_0 &
+
+export CUDA_VISIBLE_DEVICES=1
+trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch --extra_llm_api_options ./extra-llm-api-config.yml &> log_ctx_1 &
+
 #Generation servers
+export CUDA_VISIBLE_DEVICES=2
 trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8003 --backend pytorch &> log_gen_0 &
 ```
````
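Since the servers above are launched in the background, it can help to confirm that each one is listening before wiring them together. A minimal sketch, assuming each `trtllm-serve` instance exposes a `/health` endpoint on its port (an assumption — check the docs for your version):

```shell
# Poll each server port once and record the result.
# Ports 8001/8002 are the context servers, 8003 the generation server.
for port in 8001 8002 8003; do
  if curl -sf --max-time 2 "http://localhost:${port}/health" > /dev/null; then
    echo "port ${port}: ready"
  else
    echo "port ${port}: not responding yet"
  fi
done | tee server_status.txt
```

In practice you would loop with a short sleep until all three report ready, then start the disaggregated server.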
Once the context and generation servers are launched, you can launch the disaggregated server with a configuration such as:
````diff
@@ -30,10 +37,12 @@ hostname: localhost
 port: 8000
 backend: pytorch
 context_servers:
+  num_instances: 2
   urls:
     - "localhost:8001"
     - "localhost:8002"
 generation_servers:
+  num_instances: 1
   urls:
     - "localhost:8003"
 ```
````
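With the disaggregated server routing between the context and generation servers on port 8000, a client talks to it like any single `trtllm-serve` instance. A hedged sketch of one request — the `/v1/completions` path and payload fields assume the OpenAI-compatible API that `trtllm-serve` exposes, so adjust them to your version:

```shell
# Build a sample completion request payload.
cat > request.json <<'EOF'
{
  "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "prompt": "What is disaggregated serving?",
  "max_tokens": 32
}
EOF

# Send it to the disaggregated server; falls through with a message
# if nothing is listening on port 8000 yet.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d @request.json || echo "disaggregated server not reachable"
```

The disaggregated server handles the orchestration: the prompt is processed by one of the context servers, and the generated tokens come from the generation server.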
