Commit f3237e5

update readme for disaggregated (#3323)
Signed-off-by: Chuang Zhu <[email protected]>
1 parent 3767310 commit f3237e5

1 file changed: +7 -3 lines changed

examples/disaggregated/README.md

@@ -9,12 +9,14 @@ You can use multiple `trtllm-serve` commands to launch the context and generatio
 for disaggregated serving. For example, you could launch two context servers and one generation servers as follows:
 
 ```
+echo -e "pytorch_backend_config:\n enable_overlap_scheduler: False" > extra-llm-api-config.yml
+
 export TRTLLM_USE_UCX_KVCACHE=1
 #Context servers
-trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhsot --port 8001 --backend pytorch &> log_ctx_0 &
-trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhsot --port 8002 --backend pytorch &> log_ctx_1 &
+CUDA_VISIBLE_DEVICES=0 trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch --extra_llm_api_options ./extra-llm-api-config.yml &> log_ctx_0 &
+CUDA_VISIBLE_DEVICES=1 trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch --extra_llm_api_options ./extra-llm-api-config.yml &> log_ctx_1 &
 #Generation servers
-trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhsot --port 8003 --backend pytorch &> log_gen_0 &
+CUDA_VISIBLE_DEVICES=2 trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8003 --backend pytorch &> log_gen_0 &
 ```
 Once the context and generation servers are launched, you can launch the disaggregated
 server, which will accept requests from clients and do the orchestration between context
@@ -30,10 +32,12 @@ hostname: localhost
 port: 8000
 backend: pytorch
 context_servers:
+  num_instances: 2
   urls:
     - "localhost:8001"
     - "localhost:8002"
 generation_servers:
+  num_instances: 1
   urls:
     - "localhost:8003"
 ```
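
The second hunk adds a `num_instances` field next to each `urls` list. The sketch below is one possible sanity check to run before starting the disaggregated server. It assumes, without confirmation from this commit, that `num_instances` is meant to equal the number of entries under `urls`. It also assumes the config keeps the simple indented layout shown in the diff. The function name `check_section` and the file name `disagg_config.yaml` are hypothetical, and the check is plain `awk` text matching, not a real YAML parser.

```shell
#!/usr/bin/env bash
# Sketch only: compare num_instances with the number of urls in one section.
# Assumption (not confirmed by the commit): the two values should be equal.

# check_section <config-file> <section-name>
# Prints "<section>: num_instances=N urls=M"; returns 0 only when N == M and M > 0.
check_section() {
  local file="$1" section="$2"
  awk -v section="$section" '
    # A line starting in column 0 is a top-level key; track whether it is ours.
    /^[^ \t#]/ { in_section = ($1 == (section ":")) }
    # Inside the section, record the declared count and count list items.
    in_section && $1 == "num_instances:" { declared = $2 }
    in_section && $1 == "-" { urls++ }
    END {
      printf "%s: num_instances=%d urls=%d\n", section, declared, urls
      # Exit status 0 on a match, 1 otherwise.
      exit !(declared == urls && urls > 0)
    }
  ' "$file"
}
```

For example, `check_section disagg_config.yaml context_servers && check_section disagg_config.yaml generation_servers` would succeed only if both sections are consistent under the assumption above.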