
feature: inference profiles for bedrock #1118


Merged
merged 6 commits into Portkey-AI:main on Jun 11, 2025

Conversation


@narengogi narengogi commented May 30, 2025

More on Bedrock inference profiles:
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles.html

Testing done:

  • Tested with application inference profiles, e.g. arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06 (both URI-encoded and as a regular string); verified that cost calculation works as intended
  • Tested with regular models like anthropic.claude-3-haiku-20240307-v1:0
  • Verified with cache

Guide

  1. Create an inference profile:
  • Fetch the foundation model's details (including its ARN) if needed:
    aws bedrock get-foundation-model --model-identifier anthropic.claude-v2:1
  • Use the following CLI command to create an application inference profile:
    aws bedrock create-inference-profile \
        --inference-profile-name inference-profile-test \
        --model-source copyFrom=arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1

Note

Inference profiles are immutable.

  2. Use the inference profile:
  • Send the generated inference profile ARN in your model parameter, e.g. arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06
  • The resolved foundation model is cached for up to a day
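Once the profile is fetched, the gateway only needs the foundation model id from the profile's first model ARN. The following is a minimal TypeScript sketch of that extraction; the type and function names here are illustrative, not the gateway's actual API:

```typescript
// Sketch only: extract the foundation model id from a Bedrock
// GetInferenceProfile response. The response shape (models[0].modelArn)
// follows the comment in the PR diff; names are illustrative.
interface BedrockInferenceProfile {
  models?: { modelArn?: string }[];
}

function extractFoundationModel(
  profile: BedrockInferenceProfile
): string | null {
  // modelArn looks like:
  // arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const arn = profile.models?.[0]?.modelArn;
  return arn?.split('/').pop() ?? null;
}
```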

Snippets for testing

curl --location 'http://localhost:8787/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'x-portkey-virtual-key: ' \
--header 'x-portkey-api-key: ' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "How are you doing sir?"
        },
        {
            "role": "assistant",
            "content": [
                    {
                        "type": "text",
                        "text": "\n\nThank you for asking! I'\''m just a program, so I don'\''t have feelings, but I'\''m here and ready to help with whatever you need. How can I assist you today? 😊"
                    }
                ]
        },
        {
            "role": "user",
            "content": "good good, you seem cheery"
        }
    ],
    "model": "arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06",
    "max_tokens": 3000,
    "stream": false
}'

@narengogi narengogi requested review from sk-portkey and VisargD May 30, 2025 14:14

matter-code-review bot commented May 30, 2025

Code Quality new feature

Description

Summary by MatterAI

🔄 What Changed

This pull request introduces support for AWS Bedrock inference profiles. The core change involves modifying the getBaseURL function within the Bedrock API configuration to intelligently resolve the underlying foundation model when an AWS ARN for an inference profile is provided as the model parameter. This resolution process includes a caching mechanism to improve performance for subsequent requests. Additionally, the model resolution logic in BedrockConfig has been updated to prioritize the newly resolved foundationModel.

🔍 Impact of the Change

This feature allows users to leverage AWS Bedrock's inference profiles, providing a more flexible and potentially managed way to access foundation models without directly specifying the model ID. By caching the inference profile lookup for up to a day, the change significantly reduces latency and API calls to AWS Bedrock for repeated requests using the same inference profile. This enhances the overall performance and usability for Bedrock users.
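The resolution-plus-caching flow described above amounts to a TTL cache keyed by the profile identifier. Everything below is illustrative: the gateway's real cache helpers (getFromCacheByKey / putInCacheWithValue) are injected from context, so this Map-based stand-in only demonstrates the key and TTL convention:

```typescript
// Illustrative TTL cache for resolved inference profiles: store the
// profile -> foundation-model lookup for up to a day, as described above.
const DAY_SECONDS = 86_400;

class TtlCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, now = Date.now()): string | null {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= now) return null;
    return entry.value;
  }

  put(key: string, value: string, ttlSeconds = DAY_SECONDS, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }
}

// Cache-key convention from the diff: a fixed prefix plus the identifier.
const cacheKey = (arn: string) => `bedrock-inference-profile-${arn}`;
```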

📁 Total Files Changed

8 files were changed in this pull request.

🧪 Test Added

  • Application Inference Profile Testing: The functionality was tested using an example application inference profile ARN (arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06). This test specifically verified that the cost calculation mechanism works correctly when an inference profile is used.
  • Regular Model Testing: The changes were also validated with standard Bedrock models (e.g., anthropic.claude-3-haiku-20240307-v1:0) to ensure that existing functionality remains intact and unaffected by the new inference profile logic.
  • Cache Verification: The caching mechanism implemented for inference profile lookups was explicitly verified to ensure that it correctly stores and retrieves the resolved foundation models, reducing redundant API calls to AWS Bedrock.

🔒Security Vulnerabilities

No security vulnerabilities were detected in the provided code patch.

Motivation

This feature was motivated by the need to support AWS Bedrock's inference profiles, which provide a more flexible and managed way to interact with foundation models. This allows users to specify an inference profile ARN instead of a direct model identifier, enhancing integration with AWS Bedrock's advanced features and potentially simplifying model management for users.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

.

Tip

Quality Recommendations

  1. Ensure consistent error handling in getFoundationModelFromInferenceProfile. Currently, getInferenceProfile throws an error, but getFoundationModelFromInferenceProfile catches it and returns null. It might be better to re-throw a more specific GatewayError or log the error more verbosely before returning null to aid debugging.

  2. Consider adding a timeout to the fetch call within getInferenceProfile to prevent potential long-running requests to the AWS Bedrock API, which could impact performance or lead to hung connections.

  3. Add more detailed logging within getFoundationModelFromInferenceProfile to indicate cache hits/misses and when an external API call to Bedrock is made for an inference profile. This can be valuable for monitoring and debugging performance.
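Recommendation 2 above can be sketched with a generic timeout wrapper. This is a suggestion, not code from the PR; Promise.race keeps it runtime-agnostic, and an AbortController could additionally cancel the underlying fetch:

```typescript
// Sketch: bound a promise (e.g. the inference-profile lookup) with a
// timeout so a slow Bedrock response cannot hang the request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage: withTimeout(getInferenceProfile(arn, region, ...), 5000)
```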

Sequence Diagram

sequenceDiagram
    participant Client
    participant HandlerUtils
    participant BedrockAPI
    participant BedrockUtils
    participant Cache
    participant AWSBedrockAPI

    Client->>HandlerUtils: POST /v1/chat/completions (model: inferenceProfileARN)
    Note over HandlerUtils: tryPost(fn, c, gatewayRequestURL, params)
    HandlerUtils->>BedrockAPI: BedrockAPIConfig.getBaseURL({c, providerOptions, fn, gatewayRequestURL, params})
    Note over BedrockAPI: getBaseURL is now async
    BedrockAPI->>BedrockAPI: Decode model from params (e.g., model = 'arn:aws:...')
    alt model is an ARN and includes 'arn:aws'
        BedrockAPI->>BedrockUtils: getFoundationModelFromInferenceProfile(c, model, providerOptions)
        BedrockUtils->>Cache: getFromCacheByKey(env(c), cacheKey)
        alt Cache Hit
            Cache-->>BedrockUtils: cachedFoundationModel
            BedrockUtils-->>BedrockAPI: cachedFoundationModel
        else Cache Miss
            BedrockUtils->>BedrockUtils: getInferenceProfile(inferenceProfileIdentifier, awsRegion, awsAccessKeyId, ...)
            BedrockUtils->>BedrockUtils: generateAWSHeaders(..., url, 'GET', 'bedrock', ...)
            BedrockUtils->>AWSBedrockAPI: GET /inference-profiles/{identifier} (with AWS headers)
            AWSBedrockAPI-->>BedrockUtils: BedrockInferenceProfile JSON
            Note over BedrockUtils: Extract foundationModel from inferenceProfile.models[0].modelArn
            BedrockUtils->>Cache: putInCacheWithValue(env(c), cacheKey, foundationModel, 86400)
            BedrockUtils-->>BedrockAPI: foundationModel
        end
        Note over BedrockAPI: Set params.foundationModel = foundationModel
    end
    BedrockAPI-->>HandlerUtils: baseURL
    HandlerUtils->>BedrockAPI: Continue with API call using baseURL
    BedrockAPI->>BedrockUtils: BedrockConfig (uses params.foundationModel if available)

    Note over BedrockAPI,BedrockUtils: Other handlers (getBatchOutput, retrieveFileContent) also await getBaseURL


@matter-code-review matter-code-review bot left a comment


This PR adds support for AWS Bedrock inference profiles, allowing the system to work with inference profile ARNs by extracting the underlying foundation model. The implementation is generally good with proper caching and error handling, but I've identified a few improvements that could enhance the code quality and reliability.

Comment on lines 453 to 482
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }

  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );

  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  return null;
}


🛠️ Code Refactor

Issue: The error handling in getFoundationModelFromInferenceProfile silently returns null for any error, which could hide important issues.
Fix: Add more specific error handling and logging to help with debugging.
Impact: Improves troubleshooting and error visibility when inference profile resolution fails.

Suggested change
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }
  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );
  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  return null;
}
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }
  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );
  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  if (!foundationModel) {
    console.warn(`No foundation model found in inference profile: ${inferenceProfileIdentifier}`);
    return null;
  }
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  console.error(`Error resolving foundation model from inference profile ${inferenceProfileIdentifier}:`, error);
  return null;
}

Comment on lines +108 to +117
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel) {
  params.foundationModel = foundationModel;
}


🛠️ Code Refactor

Issue: The code doesn't handle the case where foundationModel extraction fails but still attempts to use it.
Fix: Add a check to ensure foundationModel is defined before setting it in params.
Impact: Prevents potential undefined values from being used in the model parameter.

Suggested change
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel) {
  params.foundationModel = foundationModel;
}
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel && foundationModel.length > 0) {
  params.foundationModel = foundationModel;
}

Comment on lines +408 to +415
export const getInferenceProfile = async (
inferenceProfileIdentifier: string,
awsRegion: string,
awsAccessKeyId: string,
awsSecretAccessKey: string,
awsSessionToken?: string
) => {
const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;


🔒 Security Issue Fix

Issue: The getInferenceProfile function doesn't validate the inferenceProfileIdentifier before using it in the URL, which could potentially lead to URL manipulation issues.
Fix: Add validation to ensure the inferenceProfileIdentifier is a valid ARN format before using it.
Impact: Prevents potential security issues related to URL manipulation.

Suggested change
export const getInferenceProfile = async (
  inferenceProfileIdentifier: string,
  awsRegion: string,
  awsAccessKeyId: string,
  awsSecretAccessKey: string,
  awsSessionToken?: string
) => {
  const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;
export const getInferenceProfile = async (
  inferenceProfileIdentifier: string,
  awsRegion: string,
  awsAccessKeyId: string,
  awsSecretAccessKey: string,
  awsSessionToken?: string
) => {
  if (!inferenceProfileIdentifier || !inferenceProfileIdentifier.startsWith('arn:aws')) {
    throw new Error('Invalid inference profile identifier format');
  }
  const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;


Important

PR Review Skipped

PR review skipped as per the configuration setting. Run a manual review by commenting /matter review

💡Tips to use Matter AI

Command List

  • /matter summary: Generate AI Summary for the PR
  • /matter review: Generate AI Reviews for the latest commit in the PR
  • /matter review-full: Generate AI Reviews for the complete PR
  • /matter release-notes: Generate AI release-notes for the PR
  • /matter : Chat with your PR with Matter AI Agent
  • /matter remember : Generate AI memories for the PR
  • /matter explain: Get an explanation of the PR
  • /matter help: Show the list of available commands and documentation
  • Need help? Join our Discord server: https://discord.gg/fJU5DvanU3

@narengogi narengogi changed the title application inference profiles support for bedrock feature: application inference profiles support for bedrock Jun 2, 2025
@narengogi narengogi changed the title feature: application inference profiles support for bedrock feature: inference profiles for bedrock Jun 2, 2025
remove redundant cache write and check if cache function is available before invoking it



@VisargD VisargD merged commit a6fe2d9 into Portkey-AI:main Jun 11, 2025
2 checks passed