
(context_length_exceeded) This model's maximum context length is 128000 tokens for gpt-4.1 #40986

Description

@TsuyoshiUshio
  • Package Name: azure-ai-inference
  • Package Version: 1.0.0b9
  • Operating System: Ubuntu 20.04.4 LTS (WSL2)
  • Python Version: Python 3.12.9

Describe the bug
I use the gpt-4.1 model, which has a long context window, but I received the error message below.

To Reproduce
Steps to reproduce the behavior:

  1. Create a sample app (a minimal sketch follows this list).
  2. Provide a message that has more than 128,000 tokens.
  3. Get the error message.
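
A minimal sketch of such a sample app (the environment variable names and the deployment name are placeholders for illustration, not taken from the report):

    import os

    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import UserMessage
    from azure.core.credentials import AzureKeyCredential

    # Placeholder endpoint/key -- substitute your own values.
    client = ChatCompletionsClient(
        endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
    )

    # A prompt large enough that prompt tokens + max_tokens exceed 128,000
    # triggers the error, even though gpt-4.1 advertises a 1M-token window.
    long_prompt = "word " * 150_000

    response = client.complete(
        model="gpt-4.1",  # deployment name; placeholder
        messages=[UserMessage(content=long_prompt)],
        max_tokens=1_000_000,
        temperature=0.1,
    )
    print(response.choices[0].message.content)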

Error Message

Error during LLM analysis for emergency issue: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Traceback (most recent call last):
  File "/home/ushio/Code/Project/KustoAutogenExperiment/extract_emergency_issue.py", line 396, in analyze_with_llm
    response = openai_client.complete(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ushio/Code/Project/KustoAutogenExperiment/venv/lib/python3.12/site-packages/azure/ai/inference/_patch.py", line 738, in complete
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Successfully processed ICM 626717326 for emergency issue information.
Emergency issue analysis failed: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
All relevant files saved in the extracted_emergency_issues directory.

I changed the max_tokens setting, but the result is the same.

  response = openai_client.complete(
      model=self.openai_deployment,
      messages=messages,
      max_tokens=1000000,  # completion budget; counted together with the prompt against the context limit
      temperature=0.1,
  )
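
The arithmetic in the error is the request-size check: 124,171 prompt tokens + 1,000,000 requested completion tokens = 1,124,171, which the service compares against the 128,000-token limit it reports for the deployment. Until the service honors gpt-4.1's larger window, one workaround sketch is to cap max_tokens so prompt plus completion fit the reported limit (the 128,000 constant, the dict-shaped messages, and the o200k_base tiktoken encoding are assumptions, and the client-side token count is only an estimate):

    import tiktoken

    # Assumption: the service enforces the 128,000-token window it reports,
    # regardless of gpt-4.1's advertised 1M-token context.
    CONTEXT_LIMIT = 128_000

    def capped_max_tokens(messages, limit=CONTEXT_LIMIT, floor=1):
        # Estimate prompt tokens client-side; the service's own accounting
        # (roles, separators) differs slightly, so leave some headroom.
        enc = tiktoken.get_encoding("o200k_base")
        prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
        return max(limit - prompt_tokens - 64, floor)  # 64-token safety margin

    response = openai_client.complete(
        model=self.openai_deployment,
        messages=messages,
        max_tokens=capped_max_tokens(messages),
        temperature=0.1,
    )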

Expected behavior
For a long-context model like gpt-4.1, we should be able to use the model's full context window.

Labels

  • AI Model Inference: Issues related to the client library for Azure AI Model Inference (\sdk\ai\azure-ai-inference)
  • Client: This issue points to a problem in the data-plane of the library.
  • Service Attention: Workflow: This issue is the responsibility of the Azure service team.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-team-attention: Workflow: This issue needs attention from the Azure service team or SDK team.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
