[Usage]: OpenAI Server API #17075
Comments
Could you please share a reproducer? I was unable to reproduce this with:

```
$ vllm serve meta-llama/Llama-3.2-1B-Instruct
...
INFO 04-24 10:39:33 [serving_chat.py:118] Using default chat sampling params from model: {'temperature': 0.6, 'top_p': 0.9}
...
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="NOTHING",
    base_url="http://localhost:8000/v1",
)

client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "How are you?"}],
    top_p=1,
    extra_body={"top_k": 1},
)
```

```
INFO 04-24 10:40:18 [logger.py:39] Received request chatcmpl-3a6246ebabfc4e99ab9142053e02a0f6:
prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 24 Apr 2025\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n',
params: SamplingParams(
    n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0,
    temperature=0.6, top_p=1.0, top_k=1, min_p=0.0, seed=None, stop=[],
    stop_token_ids=[], bad_words=[], include_stop_str_in_output=False,
    ignore_eos=False, max_tokens=131033, min_tokens=0, logprobs=None,
    prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True,
    truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),
prompt_token_ids: None,
lora_request: None,
prompt_adapter_request: None.
```

(formatted for ease of reading)
@hmellor, thanks for your quick answer. I did not mention this in my previous message, but it seems relevant.
Ah ok, yes that is relevant. When the code at lines 377 to 382 in 2bc0f72 applies, … Therefore, it was a coincidence that you were setting …
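Since the referenced snippet is not reproduced here, the following is only a minimal sketch of the merging behaviour under discussion, assuming that defaults loaded from the model's generation_config (like the `{'temperature': 0.6, 'top_p': 0.9}` in the log above) only fill in sampling parameters the request leaves unset. The function `resolve_sampling_params` and its shape are hypothetical, not vLLM's actual code.

```python
# Hypothetical sketch (not vLLM's actual implementation): request-level
# sampling parameters override defaults taken from the model's
# generation_config; unset (None) request values fall back to the defaults.
from typing import Any


def resolve_sampling_params(
    request_params: dict[str, Any],
    model_defaults: dict[str, Any],
) -> dict[str, Any]:
    """Merge request parameters over model defaults; explicit request values win."""
    resolved = dict(model_defaults)
    for name, value in request_params.items():
        if value is not None:  # only values the client actually sent override defaults
            resolved[name] = value
    return resolved


# Mirrors the logs above: the model ships temperature/top_p defaults, while the
# request supplies top_p=1 and top_k=1, so both show up in the final params.
print(resolve_sampling_params(
    {"top_p": 1.0, "top_k": 1, "temperature": None},
    {"temperature": 0.6, "top_p": 0.9},
))
# {'temperature': 0.6, 'top_p': 1.0, 'top_k': 1}
```

Under a rule like this, an explicit request value (e.g. `top_k=1` sent via `extra_body`) always wins over any model default, which matches the `top_k=1` visible in the logged `SamplingParams` above.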
Thanks @hmellor! We can close this issue. I'll let you know if I have more questions.
Your current environment
How would you like to use vllm
I've set up a vLLM server using `vllm serve` to provide an OpenAI-like API (as described here). We are using vLLM to serve a custom model. The server starts with the default parameters:

I'm trying to send `top_k=1` and `top_p=1` to run some experiments. Since `top_p` is supported by the OpenAI client, there is no issue; however, as correctly explained in the documentation, I should use `extra_body={"top_k": 1}` to pass a custom value for `top_k`. I did so, and unfortunately I've noticed that the parameter did not pass through, as the parameters received by the service are:

As you can see, `top_p` is passed correctly, but not `top_k`. The same issue happens via cURL. Is it possible I'm not using the API as I should, or is there a bug somewhere in the vLLM server parameters?
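For reference, here is a minimal sketch of the cURL-style request path mentioned above, assuming a locally running server on port 8000 and the model name from the maintainer's reproducer (substitute your own custom model). The OpenAI client's `extra_body` simply merges extra keys such as `top_k` into the JSON request body, so a raw HTTP request can carry them directly to vLLM's OpenAI-compatible chat completions endpoint.

```python
# Raw-HTTP equivalent of the extra_body call above (assumes a local vLLM server
# on port 8000; model name copied from the reproducer, swap in your own model).
import requests

payload = {
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "messages": [{"role": "user", "content": "How are you?"}],
    "top_p": 1,
    "top_k": 1,  # vLLM-specific extension; extra_body places it in the body the same way
}

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer NOTHING"},  # placeholder token if no API key is configured
    json=payload,
    timeout=60,
)
print(response.json())
```

The server's "Received request" log line (as in the reproducer above) then shows which sampling parameters actually reached the engine, which is the quickest way to confirm whether `top_k` made it through.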