I would like to know the right way to measure first-token latency and steady-state token latency using llama-batched-bench. I prefer llama-batched-bench over llama-bench since batched-bench can run multiple parallel streams.
My understanding is that:

- Steady-state latency: the average time it takes to produce each token once the model and data pipeline are fully warmed up and running continuously.
- First-token latency: the time from triggering a model inference to seeing the very first token of its output, including any one-time setup costs.
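For reference, my assumption (please correct me if this is wrong) is that these map onto the llama-batched-bench output roughly as follows: per stream, first-token latency ≈ T_PP (the prompt-processing time, since the first output token can only appear once the prompt has been processed), and steady-state per-token latency ≈ T_TG / TG (text-generation time divided by the number of generated tokens). As a purely hypothetical illustration, a run with ntg = 128 reporting T_PP = 0.80 s and T_TG = 3.20 s would read as ≈ 0.80 s to the first token and ≈ 3.20 s / 128 ≈ 25 ms per token at steady state.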
I used the code snippet below, which produces 6 permutations (npp × ntg) across 4 parallel streams (npl). Is this the right way to validate the model's latency figures?
I would also appreciate help deriving the per-token latency from the output; based on that latency I will choose the right LLM model for my application.
```bash
TIME_FORMAT_USAGE="Memory: %M: kilobytes, CPU: %P: percent"
OUTPUT_FILE="llama-batched-bench-log.txt"

# EXEC, MODEL_PATH and THREADS are assumed to be set earlier in the script
/usr/local/bin/time -f "$TIME_FORMAT_USAGE" \
    "$EXEC" \
    -m "$MODEL_PATH" \
    -pps --file prompts.txt \
    -npp 128,256,512 \
    -ntg 128,256 \
    -npl 4 \
    -t "$THREADS" \
    --temp 0.7 --repeat_penalty 1.1 \
    --output-format jsonl \
    -v >> "$OUTPUT_FILE" 2>&1

echo "completed all runs - check the log file $OUTPUT_FILE"
```