
fix thinking for gemini models #1113


Merged
merged 3 commits into from
Jun 11, 2025

Conversation

narengogi
Collaborator

@narengogi narengogi commented May 27, 2025

closes #1112

testing done:

  1. tested the Vertex and Google providers with thinking enabled and disabled, in both streaming and non-streaming modes
  2. verified that caching is working as intended while streaming

example payload:

{
    "model": "gemini-2.5-flash-preview-04-17",
    "max_tokens": 1000,
    "thinking": {
        "budget_tokens": 100,
        "type": "enabled"
    },
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "if tomatina tomatino has fathered tralleliala trallela, and batatina batata is tomatinas sister, how is she related to trallelia?"
                }
            ]
        }
    ]
}
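The Anthropic-style `thinking` block in the payload above is translated into Gemini's `thinking_config` on the generation config. A minimal sketch of that mapping, following the behavior described in this PR (the helper name `transformThinkingConfig` and the `ThinkingParam` type are illustrative, not the gateway's actual identifiers):

```typescript
// Hypothetical sketch of the thinking-param mapping; per this PR,
// include_thoughts is set only when thinking is enabled and a
// budget_tokens value is supplied.
interface ThinkingParam {
  type: 'enabled' | 'disabled';
  budget_tokens?: number;
}

function transformThinkingConfig(thinking?: ThinkingParam) {
  if (thinking?.type !== 'enabled' || thinking.budget_tokens === undefined) {
    return undefined;
  }
  return {
    thinking_config: {
      include_thoughts: true,
      thinking_budget: thinking.budget_tokens,
    },
  };
}
```

With the example payload, this would produce `{ thinking_config: { include_thoughts: true, thinking_budget: 100 } }`.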

@narengogi narengogi requested a review from b4s36t4 May 27, 2025 07:49

matter-code-review bot commented May 27, 2025

Code Quality bug fix

Description

Summary by MatterAI

🔄 What Changed

This pull request refactors the handling of 'thinking' (chain-of-thought) messages for Gemini models across both Google Vertex AI and Google Generative AI providers. Key changes include:

  • Token Count: Introduced thoughtsTokenCount in usageMetadata and mapped it to completion_tokens_details.reasoning_tokens for accurate token usage reporting.
  • Content Parsing: Modified GoogleChatCompleteResponseTransform and GoogleChatCompleteStreamChunkTransform to correctly parse and structure thought and text parts from the model's response into a new content_blocks array. This array is included in the message when strictOpenAiCompliance is false, providing a more granular representation of the content.
  • Configuration: Adjusted the transformGenerationConfig function to precisely control the include_thoughts parameter based on params.thinking.type being 'enabled' and the presence of budget_tokens.
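The content-parsing change can be sketched as follows. This is an illustrative reconstruction of the behavior the summary describes, not the gateway's actual code; the helper name `buildContentBlocks` is invented:

```typescript
// Hypothetical sketch: split Gemini response parts into an OpenAI-style
// `content` string plus a `content_blocks` array, surfacing the latter
// only when strict OpenAI compliance is off.
interface GeminiPart {
  text?: string;
  thought?: boolean; // true when the part is a reasoning ("thinking") step
}

type ContentBlock =
  | { type: 'thinking'; thinking: string }
  | { type: 'text'; text: string };

function buildContentBlocks(
  parts: GeminiPart[],
  strictOpenAiCompliance: boolean
): { content: string; content_blocks?: ContentBlock[] } {
  const contentBlocks: ContentBlock[] = [];
  let content = '';
  for (const part of parts) {
    if (part.text === undefined) continue;
    if (part.thought) {
      contentBlocks.push({ type: 'thinking', thinking: part.text });
    } else {
      contentBlocks.push({ type: 'text', text: part.text });
      content += part.text; // only non-thought text reaches `content`
    }
  }
  return strictOpenAiCompliance
    ? { content }
    : { content, content_blocks: contentBlocks };
}
```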

🔍 Impact of the Change

This fix ensures that 'thinking' messages from Gemini models are correctly processed and displayed, separating them from the main content. It improves the accuracy of token usage reporting for reasoning steps and enhances the flexibility of the API response by providing structured content blocks when strict OpenAI compliance is not required. This directly addresses the issue of incorrect thinking output handling.
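The usage-reporting side of the fix amounts to mapping Gemini's `usageMetadata` fields onto the OpenAI-style `usage` object. A minimal sketch, assuming the field names from the public generateContent response (the helper name `toOpenAiUsage` is illustrative):

```typescript
// Hypothetical sketch: Gemini's thoughtsTokenCount is surfaced as
// completion_tokens_details.reasoning_tokens, as described in this PR.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
  thoughtsTokenCount?: number;
}

function toOpenAiUsage(meta: GeminiUsageMetadata) {
  return {
    prompt_tokens: meta.promptTokenCount ?? 0,
    completion_tokens: meta.candidatesTokenCount ?? 0,
    total_tokens: meta.totalTokenCount ?? 0,
    // Emit the details object only when the model actually reported
    // reasoning tokens.
    ...(meta.thoughtsTokenCount !== undefined && {
      completion_tokens_details: {
        reasoning_tokens: meta.thoughtsTokenCount,
      },
    }),
  };
}
```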

📁 Total Files Changed

4 files were changed in this pull request.

🧪 Test Added

Manual testing was performed to verify the changes:

  1. Provider Testing: Both Google Vertex AI and Google Generative AI providers were tested with the thinking feature enabled and disabled.
  2. Mode Testing: The functionality was verified in both streaming and non-streaming modes.
  3. Caching Verification: Caching behavior was confirmed to be working as intended during streaming operations.

🔒Security Vulnerabilities

No security vulnerabilities were detected in the changes.

Motivation

Closes #1112, which addresses a bug related to the incorrect handling of thinking output for Gemini models.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

#1112

Tip

Quality Recommendations

  1. Consider adding specific unit tests for the content_blocks generation logic, covering various combinations of thought and text parts, and the impact of strictOpenAiCompliance on the final content and content_blocks structure. This would ensure robustness for future changes.

  2. The change in VertexAnthropicChatCompleteConfig.model.transform from (params: Params) => {} to () => {} is a minor cleanup. While harmless as it returns undefined, ensure this aligns with any broader pattern for function signatures in ProviderConfig.

Sequence Diagram

sequenceDiagram
    participant Client
    participant GatewayAPI as Gateway API (/chat/completions)
    participant GoogleVertexAIProvider as Google Vertex AI Provider
    participant GoogleProvider as Google Provider
    participant GoogleVertexAIAPI as Google Vertex AI API
    participant GoogleGenerativeAIAPI as Google Generative AI API

    Client->>GatewayAPI: POST /chat/completions (params: {..., thinking: {budget_tokens, type}, ...})
    GatewayAPI->>GoogleVertexAIProvider: chatComplete(params)
    GatewayAPI->>GoogleProvider: chatComplete(params)

    GoogleVertexAIProvider->>GoogleVertexAIProvider: transformGenerationConfig(params)
    GoogleProvider->>GoogleProvider: transformGenerationConfig(params)

    alt For Google Vertex AI
        GoogleVertexAIProvider->>GoogleVertexAIAPI: generateContent(generationConfig: {thinking_config: {include_thoughts, thinking_budget}})
        GoogleVertexAIAPI-->>GoogleVertexAIProvider: response (usageMetadata: {thoughtsTokenCount}, candidates: [{content: {parts: [{text, thought}, {functionCall}]}}])
        GoogleVertexAIProvider->>GoogleVertexAIProvider: GoogleChatCompleteResponseTransform(response)
        GoogleVertexAIProvider->>GoogleVertexAIProvider: GoogleChatCompleteStreamChunkTransform(parsedChunk)
    end

    alt For Google Generative AI
        GoogleProvider->>GoogleGenerativeAIAPI: generateContent(generationConfig: {thinking_config: {include_thoughts, thinking_budget}})
        GoogleGenerativeAIAPI-->>GoogleProvider: response (usageMetadata: {thoughtsTokenCount}, candidates: [{content: {parts: [{text, thought}, {functionCall}]}}])
        GoogleProvider->>GoogleProvider: GoogleChatCompleteResponseTransform(response)
        GoogleProvider->>GoogleProvider: GoogleChatCompleteStreamChunkTransform(parsedChunk)
    end

    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Process content parts
    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Identify 'thought' and 'text' parts
    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Construct 'contentBlocks' array ({type: 'thinking', thinking: part.text} or {type: 'text', text: part.text})
    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Add 'content_blocks' to message if !strictOpenAiCompliance
    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Map tool_calls
    GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Update usage with 'thoughtsTokenCount' in 'completion_tokens_details.reasoning_tokens'

    GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Process content parts in stream
    GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Identify 'thought' and 'text' parts
    GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Construct 'contentBlocks' array ({index, delta: {thinking}} or {index, delta: {text}})
    GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Add 'content_blocks' to message if !strictOpenAiCompliance
    GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Update usage with 'thoughtsTokenCount' in 'completion_tokens_details.reasoning_tokens'

    GoogleVertexAIProvider-->>GatewayAPI: Transformed Response
    GoogleProvider-->>GatewayAPI: Transformed Response
    GatewayAPI-->>Client: Final API Response
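The streaming path in the diagram above can be sketched in the same spirit: each part of a chunk becomes a delta-style content block. This is an illustrative reconstruction (the function name `toDeltaBlocks` is invented, and in the real transform the `index` would likely track block position across the whole stream rather than within one chunk):

```typescript
// Hypothetical sketch: map a stream chunk's parts to
// { index, delta: { thinking } } or { index, delta: { text } } blocks.
interface StreamPart {
  text?: string;
  thought?: boolean;
}

type DeltaBlock =
  | { index: number; delta: { thinking: string } }
  | { index: number; delta: { text: string } };

function toDeltaBlocks(parts: StreamPart[]): DeltaBlock[] {
  return parts
    .filter((p): p is StreamPart & { text: string } => p.text !== undefined)
    .map((p, index) =>
      p.thought
        ? { index, delta: { thinking: p.text } }
        : { index, delta: { text: p.text } }
    );
}
```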


Important

PR Review Skipped

PR review skipped as per the configuration setting. Run a review manually by commenting /matter review

💡Tips to use Matter AI

Command List

  • /matter summary: Generate AI Summary for the PR
  • /matter review: Generate AI Reviews for the latest commit in the PR
  • /matter review-full: Generate AI Reviews for the complete PR
  • /matter release-notes: Generate AI release-notes for the PR
  • /matter : Chat with your PR with Matter AI Agent
  • /matter remember : Generate AI memories for the PR
  • /matter explain: Get an explanation of the PR
  • /matter help: Show the list of available commands and documentation
  • Need help? Join our Discord server: https://discord.gg/fJU5DvanU3

@narengogi narengogi requested a review from csgulati09 May 27, 2025 07:52
csgulati09
csgulati09 previously approved these changes May 27, 2025

@narengogi narengogi requested a review from b4s36t4 June 10, 2025 13:34

@VisargD VisargD merged commit fb39359 into Portkey-AI:main Jun 11, 2025
2 checks passed

Successfully merging this pull request may close these issues.

Vertex thinking differs from the standardized thinking response from the gateway