
Support Thinking part #1142

Status: Draft · wants to merge 42 commits into main

Conversation


@Kludex Kludex commented Mar 16, 2025

@Kludex Kludex marked this pull request as draft March 16, 2025 13:41

github-actions bot commented Mar 16, 2025

Docs Preview

commit: 2e31e26
Preview URL: https://2577b205-pydantic-ai-previews.pydantic.workers.dev

@Kludex Kludex force-pushed the support-thinking branch from 800710b to edffe70 Compare March 22, 2025 10:36
@Kludex
Member Author

Kludex commented Mar 27, 2025

I would like to support OpenAI responses API before working on this PR.

@sam-cobo

Any progress on this or any release plan?

@Kludex
Member Author

Kludex commented Apr 20, 2025

I'll release this in a few days.

@Wh1isper
Contributor

Very nice PR. I was wondering if we could split it into two parts and release the `ThinkingPart` first, so developers can implement it for their own models.

For example, deepseek-r1 uses `reasoning_content` to return the full reasoning content; this doesn't seem to be covered by the current implementation.

https://api-docs.deepseek.com/api/create-chat-completion
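As a sketch of what provider-level support for this field could look like, the DeepSeek `reasoning_content` field might be mapped to parts roughly as below. The `ThinkingPart`/`TextPart` dataclasses are simplified stand-ins for the library's message classes, and `parts_from_deepseek_message` is a hypothetical helper, not part of this PR:

```python
from dataclasses import dataclass


@dataclass
class ThinkingPart:
    content: str


@dataclass
class TextPart:
    content: str


def parts_from_deepseek_message(message: dict) -> list:
    """Map a DeepSeek chat-completion message to response parts.

    Per the DeepSeek docs, `reasoning_content` carries the model's reasoning
    and `content` carries the final answer.
    """
    parts: list = []
    reasoning = message.get('reasoning_content')
    if reasoning:
        parts.append(ThinkingPart(content=reasoning))
    answer = message.get('content')
    if answer:
        parts.append(TextPart(content=answer))
    return parts
```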

@Kludex Kludex marked this pull request as ready for review April 22, 2025 13:52
@Kludex
Member Author

Kludex commented Apr 25, 2025

I think `signature` should only be available for Anthropic. I think it's confusing to convert `id` to `signature`.

Contributor

hyperlint-ai bot commented Apr 25, 2025

PR Change Summary

Introduced support for the Thinking part, enhancing the reasoning capabilities of the model.

  • Added a new section on Thinking in the documentation
  • Closed multiple related issues regarding the Thinking feature
  • Included streaming support for the Thinking part

Added Files

  • docs/thinking.md

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note specifically for link checks, we only check the first 30 links in a file and we cache the results for several hours (for instance, if you just added a page, you might experience this). Our recommendation is to add hyperlint-ignore to the PR to ignore the link check for this PR.

@@ -419,6 +426,9 @@ async def _map_messages(
for item in m.parts:
    if isinstance(item, TextPart):
        content.append({'text': item.content})
    elif isinstance(item, ThinkingPart):
        # NOTE: We don't pass the thinking part to Bedrock since it raises an error.


Does this mean that models won't be able to see their previous thinking if we try to pass previous messages in?

Member Author

I think we need to use the <think> tags here.

@DouweM DouweM marked this pull request as draft April 30, 2025 20:56
@aristideubertas

@Kludex is this PR ready to be merged or is more work to be done?

@yoadsn

yoadsn commented May 21, 2025

Thank you for the hard work on this.
I realize this is WIP and super involved/complex.

But for Gemini (and other models?) you can still pay for thinking even if no "thinking parts" are in the response.
Cost calculation should therefore include those tokens, which are priced differently.
With Gemini, for example, the total token count includes the thinking tokens, so you can infer them, but this seems fragile.

For example:

# Pricing
## gemini-2.5-flash-preview-04-17
## Input $0.15/1M
## Output $0.6/1M, Thinking $3.5/1M
total_usage = result.usage()

display(total_usage)
total_cost_usd = total_usage.request_tokens * 0.15 / 1e6 + total_usage.response_tokens * 0.6 / 1e6
print(f"total_cost using request+response tokens:{total_cost_usd}")

assumed_thinking_token_amount = total_usage.total_tokens - (total_usage.request_tokens + total_usage.response_tokens)
print(f"assumed_thinking_token_amount:{assumed_thinking_token_amount}")
thinking_output_cost_usd = assumed_thinking_token_amount * 0.35 / 1e6

total_cost_usd = total_cost_usd + thinking_output_cost_usd
print(f"total_cost using request+response+thinking tokens:{total_cost_usd}")

Output:

total_cost using request+response tokens:0.00115425
assumed_thinking_token_amount:2118
total_cost using request+response+thinking tokens:0.0018955499999999998

Is it worth creating a small PR just to reflect this on the Usage "details", even just for Gemini? Or too messy to merge later? I think I can get this done quite quickly.

[`ThinkingPart`][pydantic_ai.messages.ThinkingPart] into [`TextPart`][pydantic_ai.messages.TextPart]s using the
`"<think>"` tag.

If you want to proper emit thinking parts, you'd need to use the
Contributor

@dmontagu dmontagu Jun 10, 2025


Suggested change
If you want to proper emit thinking parts, you'd need to use the
If you want to emit proper thinking parts, you'd need to use the

or

Suggested change
If you want to proper emit thinking parts, you'd need to use the
If you want to properly emit thinking parts, you'd need to use the

Unlike other providers, Anthropic sends back a signature in the thinking part. This signature
is used to verify that the thinking part was not tampered with.

To enable the thinking part, use the `anthropic_thinking` field on the
Contributor


Probably worth discussing compatibility with other models — does this mean you can't hand non-anthropic thinking parts to an anthropic model? And vice versa?


Thinking or reasoning is the process of using a model's capabilities to reason about a task.

This capability is usually not enabled by default. It depends on the model.
Contributor


Suggested change
This capability is usually not enabled by default. It depends on the model.
This capability is usually not enabled by default, and how to enable it depends on the model.

Comment on lines +264 to +265
# NOTE: There's no way to reach this part of the code, since we don't generate ThinkingPart on TestModel.
pass # pragma: no cover
Contributor


Suggested change
# NOTE: There's no way to reach this part of the code, since we don't generate ThinkingPart on TestModel.
pass  # pragma: no cover
assert False, "This should be unreachable, as we don't generate ThinkingPart on TestModel."

Comment on lines +935 to +937
from rich.pretty import pprint

pprint(chunk)
Contributor


Suggested change
from rich.pretty import pprint
pprint(chunk)

Comment on lines +19 to +29
while START_THINK_TAG in content:
    before_think, content = content.split(START_THINK_TAG, 1)
    if before_think.strip():
        parts.append(TextPart(content=before_think))
    if END_THINK_TAG in content:
        think_content, content = content.split(END_THINK_TAG, 1)
        parts.append(ThinkingPart(content=think_content))
    else:
        # We lose the `<think>` tag, but it shouldn't matter.
        parts.append(TextPart(content=content))
        content = ''
Contributor


Suggested change
while START_THINK_TAG in content:
    before_think, content = content.split(START_THINK_TAG, 1)
    if before_think.strip():
        parts.append(TextPart(content=before_think))
    if END_THINK_TAG in content:
        think_content, content = content.split(END_THINK_TAG, 1)
        parts.append(ThinkingPart(content=think_content))
    else:
        # We lose the `<think>` tag, but it shouldn't matter.
        parts.append(TextPart(content=content))
        content = ''
start_index = content.find(START_THINK_TAG)
while start_index >= 0:
    before_think, content = content[:start_index], content[start_index + len(START_THINK_TAG):]
    if before_think.strip():
        parts.append(TextPart(content=before_think))
    end_index = content.find(END_THINK_TAG)
    if end_index >= 0:
        think_content, content = content[:end_index], content[end_index + len(END_THINK_TAG):]
        parts.append(ThinkingPart(content=think_content))
    else:
        # We lose the `<think>` tag, but it shouldn't matter.
        parts.append(TextPart(content=content))
        content = ''
    start_index = content.find(START_THINK_TAG)

This does a single pass over the string 🤷‍♂️. Also, I see you check for `.strip()`; should we be applying `.strip()` to the contents of the `TextPart` / `ThinkingPart`s?
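Wrapped into a standalone function so it runs on its own (with stub part classes and trailing-text handling added; the function name is invented for illustration), the single-pass variant looks like:

```python
from dataclasses import dataclass

START_THINK_TAG = '<think>'
END_THINK_TAG = '</think>'


@dataclass
class TextPart:
    content: str


@dataclass
class ThinkingPart:
    content: str


def split_think_tags(content: str) -> list:
    """Split raw model output into text and thinking parts on <think> tags."""
    parts: list = []
    start_index = content.find(START_THINK_TAG)
    while start_index >= 0:
        before_think, content = content[:start_index], content[start_index + len(START_THINK_TAG):]
        if before_think.strip():
            parts.append(TextPart(content=before_think))
        end_index = content.find(END_THINK_TAG)
        if end_index >= 0:
            think_content, content = content[:end_index], content[end_index + len(END_THINK_TAG):]
            parts.append(ThinkingPart(content=think_content))
        else:
            # Unterminated tag: we lose the `<think>` marker, but it shouldn't matter.
            parts.append(TextPart(content=content))
            content = ''
        start_index = content.find(START_THINK_TAG)
    if content.strip():
        parts.append(TextPart(content=content))
    return parts
```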

warnings.warn(
    'PydanticAI currently does not handle redacted thinking blocks. '
    'If you have a suggestion on how we should handle them, please open an issue.',
    UserWarning,
)
Contributor


Should this really be a UserWarning? Not sure what's going on to generate these but it feels to me like we should emit something easier to selectively suppress. But maybe this is good if it brings attention to the issue if anyone is using it..
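A common pattern for making a warning selectively suppressible is a dedicated `UserWarning` subclass. A hypothetical sketch (the class and function names are invented, not from the PR):

```python
import warnings


class RedactedThinkingWarning(UserWarning):
    """Raised when a redacted thinking block is encountered and dropped."""


def handle_redacted_thinking() -> None:
    # Same message as the PR, but with a dedicated category so users can
    # filter this specific warning without silencing all UserWarnings.
    warnings.warn(
        'PydanticAI currently does not handle redacted thinking blocks. '
        'If you have a suggestion on how we should handle them, please open an issue.',
        RedactedThinkingWarning,
        stacklevel=2,
    )


# Users who don't care can then silence just this warning:
warnings.filterwarnings('ignore', category=RedactedThinkingWarning)
```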

Contributor

@dmontagu dmontagu left a comment


The one main point of feedback I have is that I think we should provide a convenient way to receive thinking parts in the responses (for the sake of observability, possibly even showing to users, etc.), but to also exclude them from subsequent requests for the sake of reducing token usage. Like, my guess is that in practice including the thinking parts doesn't usually improve subsequent chat completions (at least not by much), but does dramatically increase input token usage in longer conversations with a reasoning model. To be clear, I think there are times when you do want to include the thoughts in subsequent messages, so I don't think we should make that impossible, I just think there should be a way to configure it, probably via a kwarg to the agent or similar. (I'm happy for the default behavior to be whatever we think is best, I just think the option to disable it should exist..)

Otherwise, other than the comments I've added, the PR is looking pretty good

@amiyapatanaik

> The one main point of feedback I have is that I think we should provide a convenient way to receive thinking parts in the responses (for the sake of observability, possibly even showing to users, etc.), but to also exclude them from subsequent requests for the sake of reducing token usage. Like, my guess is that in practice including the thinking parts doesn't usually improve subsequent chat completions (at least not by much), but does dramatically increase input token usage in longer conversations with a reasoning model. To be clear, I think there are times when you do want to include the thoughts in subsequent messages, so I don't think we should make that impossible, I just think there should be a way to configure it, probably via a kwarg to the agent or similar. (I'm happy for the default behavior to be whatever we think is best, I just think the option to disable it should exist..)
>
> Otherwise, other than the comments I've added, the PR is looking pretty good

Excellent points. I do think most applications would like to access the thoughts as a separate part in the response, but would surely like to ignore them in the history, due both to cost and to the chance that a long sequence of thoughts in the chat history distracts the agent.

That being said, I'm eagerly waiting for the PR to be accepted.
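The configurable exclusion being discussed could be as simple as a history filter applied before each request. A minimal sketch with stand-in types (the real message classes, and whatever kwarg eventually exposes this, would differ):

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    content: str


@dataclass
class ThinkingPart:
    content: str


@dataclass
class ModelMessage:
    parts: list


def drop_thinking_parts(history: list) -> list:
    """Return a copy of the history with ThinkingPart entries removed.

    This keeps thoughts available for observability in the original response
    while avoiding re-sending (and re-billing) them as input tokens.
    """
    return [
        ModelMessage(parts=[p for p in m.parts if not isinstance(p, ThinkingPart)])
        for m in history
    ]
```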

Development

Successfully merging this pull request may close these issues.

  • Reasoning response support
  • How to include thoughts from the Gemini model in Pydantic output?
  • Disable thinking mode in Gemini 2.5 flash
9 participants