Support Thinking part #1142
Conversation
I would like to support the OpenAI Responses API before working on this PR.
Any progress on this, or any release plan?
I'll release this in a few days.
Very nice PR! I was wondering if we could split it into two parts and release the ThinkingPart first, so developers can implement it for their own models. For example, deepseek-r1 uses `<think>` tags.
PR Change Summary: Introduced support for the Thinking part, enhancing the reasoning capabilities of the model.
Added Files
```python
# @@ -419,6 +426,9 @@ async def _map_messages(
for item in m.parts:
    if isinstance(item, TextPart):
        content.append({'text': item.content})
    elif isinstance(item, ThinkingPart):
        # NOTE: We don't pass the thinking part to Bedrock since it raises an error.
```
Does this mean that models won't be able to see their previous thinking if we try to pass previous messages in?
I think we need to use the `<think>` tags here.
@Kludex is this PR ready to be merged, or is more work to be done?
Thank you for the hard work on this. But for Gemini (and other models?) you can still pay for thinking even if no "thinking parts" are in the response. For example:

```python
# Pricing for gemini-2.5-flash-preview-04-17:
#   Input $0.15/1M
#   Output $0.6/1M, Thinking $3.5/1M
total_usage = result.usage()
display(total_usage)

total_cost_usd = total_usage.request_tokens * 0.15 / 1e6 + total_usage.response_tokens * 0.6 / 1e6
print(f"total_cost using request+response tokens: {total_cost_usd}")

# Thinking tokens are not reported separately, so infer them from the gap.
assumed_thinking_token_amount = total_usage.total_tokens - (total_usage.request_tokens + total_usage.response_tokens)
print(f"assumed_thinking_token_amount: {assumed_thinking_token_amount}")

# Thinking output is billed at $3.5/1M tokens.
thinking_output_cost_usd = assumed_thinking_token_amount * 3.5 / 1e6
total_cost_usd = total_cost_usd + thinking_output_cost_usd
print(f"total_cost using request+response+thinking tokens: {total_cost_usd}")
```
Is it worth creating a small PR just to reflect this on the Usage "details", even just for Gemini? Or too messy to merge later? I think I can get this done quite quickly.
[`ThinkingPart`][pydantic_ai.messages.ThinkingPart] into [`TextPart`][pydantic_ai.messages.TextPart]s using the `"<think>"` tag.
If you want to proper emit thinking parts, you'd need to use the
Suggested change:

```suggestion
If you want to emit proper thinking parts, you'd need to use the
```

or

```suggestion
If you want to properly emit thinking parts, you'd need to use the
```
Unlike other providers, Anthropic sends back a signature in the thinking part. This signature is used to make sure that the thinking part was not tampered with.
To enable the thinking part, use the `anthropic_thinking` field on the |
Probably worth discussing compatibility with other models — does this mean you can't hand non-anthropic thinking parts to an anthropic model? And vice versa?
Thinking or reasoning is the process of using a model's capabilities to reason about a task.
This capability is usually not enabled by default. It depends on the model. |
Suggested change:

```suggestion
This capability is usually not enabled by default, and how to enable it depends on the model.
```
```python
# NOTE: There's no way to reach this part of the code, since we don't generate ThinkingPart on TestModel.
pass  # pragma: no cover
```
Suggested change:

```suggestion
assert False, "This should be unreachable; we don't generate ThinkingPart on TestModel."
```
```python
from rich.pretty import pprint

pprint(chunk)
```
Suggested change: delete these leftover debug lines (`from rich.pretty import pprint` and `pprint(chunk)`).
```python
while START_THINK_TAG in content:
    before_think, content = content.split(START_THINK_TAG, 1)
    if before_think.strip():
        parts.append(TextPart(content=before_think))
    if END_THINK_TAG in content:
        think_content, content = content.split(END_THINK_TAG, 1)
        parts.append(ThinkingPart(content=think_content))
    else:
        # We lose the `<think>` tag, but it shouldn't matter.
        parts.append(TextPart(content=content))
        content = ''
```
Suggested change:

```suggestion
start_index = content.find(START_THINK_TAG)
while start_index >= 0:
    before_think, content = content[:start_index], content[start_index + len(START_THINK_TAG):]
    if before_think.strip():
        parts.append(TextPart(content=before_think))
    end_index = content.find(END_THINK_TAG)
    if end_index >= 0:
        think_content, content = content[:end_index], content[end_index + len(END_THINK_TAG):]
        parts.append(ThinkingPart(content=think_content))
    else:
        # We lose the `<think>` tag, but it shouldn't matter.
        parts.append(TextPart(content=content))
        content = ''
    start_index = content.find(START_THINK_TAG)
```
This way it's a single pass over the string 🤷‍♂️. Also, I see you check for `.strip()`; should we be applying `.strip()` to the contents of the TextPart / ThinkingParts?
```python
warnings.warn(
    'PydanticAI currently does not handle redacted thinking blocks. '
    'If you have a suggestion on how we should handle them, please open an issue.',
    UserWarning,
)
```
Should this really be a UserWarning? I'm not sure what's going on to generate these, but it feels like we should emit something easier to selectively suppress. Then again, maybe this is good if it brings attention to the issue for anyone who hits it.
The one main point of feedback I have is that I think we should provide a convenient way to receive thinking parts in the responses (for the sake of observability, possibly even showing to users, etc.), but to also exclude them from subsequent requests for the sake of reducing token usage. Like, my guess is that in practice including the thinking parts doesn't usually improve subsequent chat completions (at least not by much), but does dramatically increase input token usage in longer conversations with a reasoning model. To be clear, I think there are times when you do want to include the thoughts in subsequent messages, so I don't think we should make that impossible, I just think there should be a way to configure it, probably via a kwarg to the agent or similar. (I'm happy for the default behavior to be whatever we think is best, I just think the option to disable it should exist..)
Otherwise, other than the comments I've added, the PR is looking pretty good
Excellent points. I do think most applications would like to access the thoughts as a separate part in the response, but would surely like to omit them from the history, due to both the cost and the chance that a long sequence of thoughts in the chat history can distract the agent. That being said, I'm eagerly waiting for the PR to be accepted.
Checklist
- `id` in the `ThinkingPart`