
(context_length_exceeded) This model's maximum context length is 128000 tokens for gpt-4.1 #40986

Description

@TsuyoshiUshio
  • Package Name: azure-ai-inference
  • Package Version: 1.0.0b9
  • Operating System: Ubuntu 20.04.4 LTS (WSL2)
  • Python Version: Python 3.12.9

Describe the bug
I use the gpt-4.1 model, which has a long context window, but I received the error message below.

To Reproduce
Steps to reproduce the behavior:

  1. Create a sample app (a minimal sketch follows this list).
  2. Provide a message that has more than 128,000 tokens.
  3. Get the error message.
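
A minimal sketch of such a sample app (the environment variable names and the deployment name are placeholders for illustration, not taken from the report):

    import os

    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import UserMessage
    from azure.core.credentials import AzureKeyCredential

    # Placeholder endpoint/key -- substitute your own values.
    client = ChatCompletionsClient(
        endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
    )

    # A prompt large enough that prompt tokens + max_tokens exceed 128,000
    # triggers the error, even though gpt-4.1 advertises a 1M-token window.
    long_prompt = "word " * 150_000

    response = client.complete(
        model="gpt-4.1",  # deployment name; placeholder
        messages=[UserMessage(content=long_prompt)],
        max_tokens=1_000_000,
        temperature=0.1,
    )
    print(response.choices[0].message.content)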

Error Message

Error during LLM analysis for emergency issue: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Traceback (most recent call last):
  File "/home/ushio/Code/Project/KustoAutogenExperiment/extract_emergency_issue.py", line 396, in analyze_with_llm
    response = openai_client.complete(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ushio/Code/Project/KustoAutogenExperiment/venv/lib/python3.12/site-packages/azure/ai/inference/_patch.py", line 738, in complete
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Successfully processed ICM 626717326 for emergency issue information.
Emergency issue analysis failed: (context_length_exceeded) This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
Code: context_length_exceeded
Message: This model's maximum context length is 128000 tokens. However, you requested 1124171 tokens (124171 in the messages, 1000000 in the completion). Please reduce the length of the messages or completion.
All relevant files saved in the extracted_emergency_issues directory.

I changed the max_tokens setting, but the result is the same.

  response = openai_client.complete(
      model=self.openai_deployment,
      messages=messages,
      max_tokens=1000000,  # completion budget; counted together with the prompt against the context limit
      temperature=0.1,
  )
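
The arithmetic in the error is the request-size check: 124,171 prompt tokens + 1,000,000 requested completion tokens = 1,124,171, which the service compares against the 128,000-token limit it reports for the deployment. Until the service honors gpt-4.1's larger window, one workaround sketch is to cap max_tokens so prompt plus completion fit the reported limit (the 128,000 constant, the dict-shaped messages, and the o200k_base tiktoken encoding are assumptions, and the client-side token count is only an estimate):

    import tiktoken

    # Assumption: the service enforces the 128,000-token window it reports,
    # regardless of gpt-4.1's advertised 1M-token context.
    CONTEXT_LIMIT = 128_000

    def capped_max_tokens(messages, limit=CONTEXT_LIMIT, floor=1):
        # Estimate prompt tokens client-side; the service's own accounting
        # (roles, separators) differs slightly, so leave some headroom.
        enc = tiktoken.get_encoding("o200k_base")
        prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
        return max(limit - prompt_tokens - 64, floor)  # 64-token safety margin

    response = openai_client.complete(
        model=self.openai_deployment,
        messages=messages,
        max_tokens=capped_max_tokens(messages),
        temperature=0.1,
    )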

Expected behavior
For a long-context model like gpt-4.1, we should be able to use the model's full context window.

Labels

  • AI Model Inference: Issues related to the client library for Azure AI Model Inference (\sdk\ai\azure-ai-inference)
  • Client: This issue points to a problem in the data-plane of the library.
  • Service Attention: Workflow: This issue is the responsibility of the Azure service team.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-team-attention: Workflow: This issue needs attention from the Azure service team or SDK team.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
