fix thinking for gemini models #1113
Description

# Summary By MatterAI

🔄 What Changed
This pull request refactors the handling of 'thinking' (chain-of-thought) messages for Gemini models across both Google Vertex AI and Google Generative AI providers. Key changes include:

🔍 Impact of the Change
This fix ensures that 'thinking' messages from Gemini models are correctly processed and displayed, separating them from the main content. It improves the accuracy of token usage reporting for reasoning steps and enhances the flexibility of the API response by providing structured content blocks when strict OpenAI compliance is not required. This directly addresses the issue of incorrect thinking output handling.

📁 Total Files Changed
4 files were changed in this pull request.

🧪 Test Added
Manual testing was performed to verify the changes:

🔒 Security Vulnerabilities
No security vulnerabilities were detected in the changes.

Motivation
Closes #1112, which addresses a bug related to the incorrect handling of thinking output for Gemini models.

Type of Change

How Has This Been Tested?

Screenshots (if applicable)
N/A

Checklist

Related Issues
Sequence Diagram
sequenceDiagram
participant Client
participant GatewayAPI as Gateway API (/chat/completions)
participant GoogleVertexAIProvider as Google Vertex AI Provider
participant GoogleProvider as Google Provider
participant GoogleVertexAIAPI as Google Vertex AI API
participant GoogleGenerativeAIAPI as Google Generative AI API
Client->>GatewayAPI: POST /chat/completions (params: {..., thinking: {budget_tokens, type}, ...})
GatewayAPI->>GoogleVertexAIProvider: chatComplete(params)
GatewayAPI->>GoogleProvider: chatComplete(params)
GoogleVertexAIProvider->>GoogleVertexAIProvider: transformGenerationConfig(params)
GoogleProvider->>GoogleProvider: transformGenerationConfig(params)
alt For Google Vertex AI
GoogleVertexAIProvider->>GoogleVertexAIAPI: generateContent(generationConfig: {thinking_config: {include_thoughts, thinking_budget}})
GoogleVertexAIAPI-->>GoogleVertexAIProvider: response (usageMetadata: {thoughtsTokenCount}, candidates: [{content: {parts: [{text, thought}, {functionCall}]}}])
GoogleVertexAIProvider->>GoogleVertexAIProvider: GoogleChatCompleteResponseTransform(response)
GoogleVertexAIProvider->>GoogleVertexAIProvider: GoogleChatCompleteStreamChunkTransform(parsedChunk)
end
alt For Google Generative AI
GoogleProvider->>GoogleGenerativeAIAPI: generateContent(generationConfig: {thinking_config: {include_thoughts, thinking_budget}})
GoogleGenerativeAIAPI-->>GoogleProvider: response (usageMetadata: {thoughtsTokenCount}, candidates: [{content: {parts: [{text, thought}, {functionCall}]}}])
GoogleProvider->>GoogleProvider: GoogleChatCompleteResponseTransform(response)
GoogleProvider->>GoogleProvider: GoogleChatCompleteStreamChunkTransform(parsedChunk)
end
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Process content parts
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Identify 'thought' and 'text' parts
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Construct 'contentBlocks' array ({type: 'thinking', thinking: part.text} or {type: 'text', text: part.text})
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Add 'content_blocks' to message if !strictOpenAiCompliance
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Map tool_calls
GoogleChatCompleteResponseTransform->>GoogleChatCompleteResponseTransform: Update usage with 'thoughtsTokenCount' in 'completion_tokens_details.reasoning_tokens'
GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Process content parts in stream
GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Identify 'thought' and 'text' parts
GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Construct 'contentBlocks' array ({index, delta: {thinking}} or {index, delta: {text}})
GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Add 'content_blocks' to message if !strictOpenAiCompliance
GoogleChatCompleteStreamChunkTransform->>GoogleChatCompleteStreamChunkTransform: Update usage with 'thoughtsTokenCount' in 'completion_tokens_details.reasoning_tokens'
GoogleVertexAIProvider-->>GatewayAPI: Transformed Response
GoogleProvider-->>GatewayAPI: Transformed Response
GatewayAPI-->>Client: Final API Response
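
The flow in the diagram can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the code from this PR: the type names, `toThinkingConfig`, and `transformResponse` are hypothetical, the rule that `include_thoughts` is set when `thinking.type` is `"enabled"` is an assumption, and so is passing `candidatesTokenCount` through unchanged as `completion_tokens`. Tool-call mapping and the streaming transform are left out.

```typescript
// Hypothetical, simplified shapes; the real gateway types are richer.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  functionCall?: { name: string; args: Record<string, unknown> };
}

interface GeminiResponse {
  candidates: { content: { parts: GeminiPart[] } }[];
  usageMetadata: {
    promptTokenCount: number;
    candidatesTokenCount: number;
    totalTokenCount: number;
    thoughtsTokenCount?: number;
  };
}

type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

interface AssistantMessage {
  role: "assistant";
  content: string;
  content_blocks?: ContentBlock[];
}

// Request side: map the gateway `thinking` param to Gemini's thinking_config.
// Assumption: thoughts are requested only when type is "enabled".
function toThinkingConfig(thinking?: { type: string; budget_tokens: number }) {
  if (!thinking) return undefined;
  return {
    include_thoughts: thinking.type === "enabled",
    thinking_budget: thinking.budget_tokens,
  };
}

// Response side: split thought parts from text parts and report reasoning tokens.
function transformResponse(res: GeminiResponse, strictOpenAiCompliance: boolean) {
  const parts = res.candidates[0]?.content.parts ?? [];
  const contentBlocks: ContentBlock[] = [];
  let content = "";

  for (const part of parts) {
    if (part.text === undefined) continue; // functionCall parts are not handled in this sketch
    if (part.thought) {
      contentBlocks.push({ type: "thinking", thinking: part.text });
    } else {
      contentBlocks.push({ type: "text", text: part.text });
      content += part.text; // only non-thought text reaches message.content
    }
  }

  const message: AssistantMessage = { role: "assistant", content };
  if (!strictOpenAiCompliance) {
    message.content_blocks = contentBlocks;
  }

  const u = res.usageMetadata;
  return {
    message,
    usage: {
      prompt_tokens: u.promptTokenCount,
      completion_tokens: u.candidatesTokenCount,
      total_tokens: u.totalTokenCount,
      completion_tokens_details: { reasoning_tokens: u.thoughtsTokenCount ?? 0 },
    },
  };
}
```

With `strictOpenAiCompliance` set to `true`, the sketch returns only the plain `content` string, so thinking text never leaks into an OpenAI-shaped response. With it set to `false`, callers also get `content_blocks` with the thinking and text parts in their original order.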
Important: PR review skipped as per the configuration setting. Run a manual review by commenting /matter review
closes #1112
testing done:
example payload: