Description
- Package Name: azure-ai-inference
- Package Version: 1.0.0b9
- Operating System: Ubuntu 20.04.4 LTS (WSL2)
- Python Version: 3.12.9
Describe the bug
I use the 4.1 model, which supports a long context window, but I received the error message below.
To Reproduce
Steps to reproduce the behavior:
- Create a sample app (a minimal repro sketch follows this list).
- Provide a message that has more than 128,000 tokens.
- Get the error message.
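For context, here is a minimal sketch of the call pattern that triggers the error. This is not the reporter's actual application: the endpoint and key environment variable names, the `gpt-4.1` deployment name, and the repeated-text prompt are all placeholder assumptions.

```python
# Minimal repro sketch; endpoint/key/deployment names are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # hypothetical env var
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),  # hypothetical env var
)

# Simulate an oversized prompt: at roughly 3-4 characters per token,
# this comfortably exceeds 128,000 tokens.
long_text = "lorem ipsum " * 200_000

response = client.complete(
    model="gpt-4.1",  # assumed deployment name
    messages=[
        SystemMessage(content="You analyze emergency issues."),
        UserMessage(content=long_text),
    ],
    max_tokens=1_000_000,
    temperature=0.1,
)
print(response.choices[0].message.content)
```

Running this against a deployment whose window is capped at 128,000 tokens raises the HttpResponseError shown below.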
Error Message
```
Error during LLM analysis for emergency issue: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Traceback (most recent call last):
  File "/home/ushio/Code/Project/KustoAutogenExperiment/extract_emergency_issue.py", line 396, in analyze_with_llm
    response = openai_client.complete(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ushio/Code/Project/KustoAutogenExperiment/venv/lib/python3.12/site-packages/azure/ai/inference/_patch.py", line 738, in complete
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Successfully processed ICM 626717326 for emergency issue information.
Emergency issue analysis failed: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
All relevant files saved in the extracted_emergency_issues directory.
```
I changed the max_tokens setting, but the result is the same:
```python
response = openai_client.complete(
    model=self.openai_deployment,
    messages=messages,
    max_tokens=1000000,
    temperature=0.1
)
```
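Note the arithmetic in the error message: the service counts the requested completion budget against the context limit, so 124,171 message tokens plus max_tokens=1000000 gives the reported 1,124,171, well over the 128,000 the deployment enforces. Until the larger window is honored, one possible workaround is to clamp max_tokens to whatever the prompt leaves free. This is a hedged sketch: the o200k_base encoding, the dict-shaped messages, and the count_message_tokens helper are assumptions, not part of the original code.

```python
# Workaround sketch (assumption: tiktoken's o200k_base roughly matches the
# server-side tokenizer, and `messages` is a list of {"role", "content"} dicts).
import tiktoken

CONTEXT_LIMIT = 128_000  # the limit the service reports in the error above

def count_message_tokens(messages: list[dict]) -> int:
    """Approximate token count across chat messages (not byte-exact)."""
    enc = tiktoken.get_encoding("o200k_base")
    return sum(len(enc.encode(m.get("content") or "")) for m in messages)

prompt_tokens = count_message_tokens(messages)
safe_max_tokens = max(1, CONTEXT_LIMIT - prompt_tokens)

response = openai_client.complete(
    model=self.openai_deployment,
    messages=messages,
    max_tokens=safe_max_tokens,
    temperature=0.1,
)
```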
Expected behavior
With a long-context model like 4.1, the client should be able to use the full context window instead of being capped at 128,000 tokens.
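In the meantime, the failure can at least be handled gracefully. A sketch, assuming the error code keeps surfacing in the exception text as it does in the log above, and reusing the hypothetical safe_max_tokens from the workaround sketch:

```python
from azure.core.exceptions import HttpResponseError

try:
    response = openai_client.complete(
        model=self.openai_deployment,
        messages=messages,
        max_tokens=safe_max_tokens,
        temperature=0.1,
    )
except HttpResponseError as e:
    if "context_length_exceeded" in str(e):
        # Truncate the messages or shrink the completion budget and retry.
        raise RuntimeError("Prompt too large for this deployment's window") from e
    raise
```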