
feature: inference profiles for bedrock #1118


Merged
merged 6 commits into Portkey-AI:main on Jun 11, 2025

Conversation


@narengogi narengogi commented May 30, 2025

More on Bedrock inference profiles:
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles.html

Testing done:

  • Tested with application inference profiles, e.g. arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06 (both URI-encoded and as a regular string); verified that cost calculation works as intended
  • Tested with regular models like anthropic.claude-3-haiku-20240307-v1:0
  • Verified with cache

Guide

  1. Create an inference profile:
  • Fetch the foundation model's details (including its ARN) if needed:
    aws bedrock get-foundation-model --model-identifier anthropic.claude-v2:1
  • Use the following CLI command to create an application inference profile:
    aws bedrock create-inference-profile \
        --inference-profile-name inference-profile-test \
        --model-source copyFrom=arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1

Note

Inference profiles are immutable.

  2. Use the inference profile:
  • Send the generated inference profile ARN in your model parameter, e.g. arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06
  • The resolved foundation model is cached for up to a day
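Once the profile is fetched, the gateway only needs the foundation model id from the profile's first model ARN. The following is a minimal TypeScript sketch of that extraction; the type and function names here are illustrative, not the gateway's actual API:

```typescript
// Sketch only: extract the foundation model id from a Bedrock
// GetInferenceProfile response. The response shape (models[0].modelArn)
// follows the comment in the PR diff; names are illustrative.
interface BedrockInferenceProfile {
  models?: { modelArn?: string }[];
}

function extractFoundationModel(
  profile: BedrockInferenceProfile
): string | null {
  // modelArn looks like:
  // arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const arn = profile.models?.[0]?.modelArn;
  return arn?.split('/').pop() ?? null;
}
```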

Snippets for testing

curl --location 'http://localhost:8787/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'x-portkey-virtual-key: ' \
--header 'x-portkey-api-key: ' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "How are you doing sir?"
        },
        {
            "role": "assistant",
            "content": [
                    {
                        "type": "text",
                        "text": "\n\nThank you for asking! I'\''m just a program, so I don'\''t have feelings, but I'\''m here and ready to help with whatever you need. How can I assist you today? 😊"
                    }
                ]
        },
        {
            "role": "user",
            "content": "good good, you seem cheery"
        }
    ],
    "model": "arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06",
    "max_tokens": 3000,
    "stream": false
}'

@narengogi narengogi requested review from sk-portkey and VisargD May 30, 2025 14:14

matter-code-review bot commented May 30, 2025

Code Quality new feature

Description

Summary by MatterAI

🔄 What Changed

This pull request introduces support for AWS Bedrock inference profiles. The core change involves modifying the getBaseURL function within the Bedrock API configuration to intelligently resolve the underlying foundation model when an AWS ARN for an inference profile is provided as the model parameter. This resolution process includes a caching mechanism to improve performance for subsequent requests. Additionally, the model resolution logic in BedrockConfig has been updated to prioritize the newly resolved foundationModel.

🔍 Impact of the Change

This feature allows users to leverage AWS Bedrock's inference profiles, providing a more flexible and potentially managed way to access foundation models without directly specifying the model ID. By caching the inference profile lookup for up to a day, the change significantly reduces latency and API calls to AWS Bedrock for repeated requests using the same inference profile. This enhances the overall performance and usability for Bedrock users.
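The resolution-plus-caching flow described above amounts to a TTL cache keyed by the profile identifier. Everything below is illustrative: the gateway's real cache helpers (getFromCacheByKey / putInCacheWithValue) are injected from context, so this Map-based stand-in only demonstrates the key and TTL convention:

```typescript
// Illustrative TTL cache for resolved inference profiles: store the
// profile -> foundation-model lookup for up to a day, as described above.
const DAY_SECONDS = 86_400;

class TtlCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, now = Date.now()): string | null {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= now) return null;
    return entry.value;
  }

  put(key: string, value: string, ttlSeconds = DAY_SECONDS, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }
}

// Cache-key convention from the diff: a fixed prefix plus the identifier.
const cacheKey = (arn: string) => `bedrock-inference-profile-${arn}`;
```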

📁 Total Files Changed

8 files were changed in this pull request.

🧪 Test Added

  • Application Inference Profile Testing: The functionality was tested using an example application inference profile ARN (arn:aws:bedrock:us-east-1:517194595696:application-inference-profile/s529qz7ddy06). This test specifically verified that the cost calculation mechanism works correctly when an inference profile is used.
  • Regular Model Testing: The changes were also validated with standard Bedrock models (e.g., anthropic.claude-3-haiku-20240307-v1:0) to ensure that existing functionality remains intact and unaffected by the new inference profile logic.
  • Cache Verification: The caching mechanism implemented for inference profile lookups was explicitly verified to ensure that it correctly stores and retrieves the resolved foundation models, reducing redundant API calls to AWS Bedrock.

🔒Security Vulnerabilities

No security vulnerabilities were detected in the provided code patch.

Motivation

This feature was motivated by the need to support AWS Bedrock's inference profiles, which provide a more flexible and managed way to interact with foundation models. This allows users to specify an inference profile ARN instead of a direct model identifier, enhancing integration with AWS Bedrock's advanced features and potentially simplifying model management for users.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Testing

Screenshots (if applicable)

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Related Issues

.

Tip

Quality Recommendations

  1. Ensure consistent error handling in getFoundationModelFromInferenceProfile. Currently, getInferenceProfile throws an error, but getFoundationModelFromInferenceProfile catches it and returns null. It might be better to re-throw a more specific GatewayError or log the error more verbosely before returning null to aid debugging.

  2. Consider adding a timeout to the fetch call within getInferenceProfile to prevent potential long-running requests to the AWS Bedrock API, which could impact performance or lead to hung connections.

  3. Add more detailed logging within getFoundationModelFromInferenceProfile to indicate cache hits/misses and when an external API call to Bedrock is made for an inference profile. This can be valuable for monitoring and debugging performance.
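Recommendation 2 above can be sketched with a generic timeout wrapper. This is a suggestion, not code from the PR; Promise.race keeps it runtime-agnostic, and an AbortController could additionally cancel the underlying fetch:

```typescript
// Sketch: bound a promise (e.g. the inference-profile lookup) with a
// timeout so a slow Bedrock response cannot hang the request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage: withTimeout(getInferenceProfile(arn, region, ...), 5000)
```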

Sequence Diagram

sequenceDiagram
    participant Client
    participant HandlerUtils
    participant BedrockAPI
    participant BedrockUtils
    participant Cache
    participant AWSBedrockAPI

    Client->>HandlerUtils: POST /v1/chat/completions (model: inferenceProfileARN)
    Note over HandlerUtils: tryPost(fn, c, gatewayRequestURL, params)
    HandlerUtils->>BedrockAPI: BedrockAPIConfig.getBaseURL({c, providerOptions, fn, gatewayRequestURL, params})
    Note over BedrockAPI: getBaseURL is now async
    BedrockAPI->>BedrockAPI: Decode model from params (e.g., model = 'arn:aws:...')
    alt model is an ARN and includes 'arn:aws'
        BedrockAPI->>BedrockUtils: getFoundationModelFromInferenceProfile(c, model, providerOptions)
        BedrockUtils->>Cache: getFromCacheByKey(env(c), cacheKey)
        alt Cache Hit
            Cache-->>BedrockUtils: cachedFoundationModel
            BedrockUtils-->>BedrockAPI: cachedFoundationModel
        else Cache Miss
            BedrockUtils->>BedrockUtils: getInferenceProfile(inferenceProfileIdentifier, awsRegion, awsAccessKeyId, ...)
            BedrockUtils->>BedrockUtils: generateAWSHeaders(..., url, 'GET', 'bedrock', ...)
            BedrockUtils->>AWSBedrockAPI: GET /inference-profiles/{identifier} (with AWS headers)
            AWSBedrockAPI-->>BedrockUtils: BedrockInferenceProfile JSON
            Note over BedrockUtils: Extract foundationModel from inferenceProfile.models[0].modelArn
            BedrockUtils->>Cache: putInCacheWithValue(env(c), cacheKey, foundationModel, 86400)
            BedrockUtils-->>BedrockAPI: foundationModel
        end
        Note over BedrockAPI: Set params.foundationModel = foundationModel
    end
    BedrockAPI-->>HandlerUtils: baseURL
    HandlerUtils->>BedrockAPI: Continue with API call using baseURL
    BedrockAPI->>BedrockUtils: BedrockConfig (uses params.foundationModel if available)

    Note over BedrockAPI,BedrockUtils: Other handlers (getBatchOutput, retrieveFileContent) also await getBaseURL


@matter-code-review matter-code-review bot left a comment


This PR adds support for AWS Bedrock inference profiles, allowing the system to work with inference profile ARNs by extracting the underlying foundation model. The implementation is generally good with proper caching and error handling, but I've identified a few improvements that could enhance the code quality and reliability.

Comment on lines 453 to 482
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }

  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );

  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  return null;
}


🛠️ Code Refactor

Issue: The error handling in getFoundationModelFromInferenceProfile silently returns null for any error, which could hide important issues.
Fix: Add more specific error handling and logging to help with debugging.
Impact: Improves troubleshooting and error visibility when inference profile resolution fails.

Suggested change
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }
  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );
  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  return null;
}
try {
  const getFromCacheByKey = c.get('getFromCacheByKey');
  const putInCacheWithValue = c.get('putInCacheWithValue');
  const cacheKey = `bedrock-inference-profile-${inferenceProfileIdentifier}`;
  const cachedFoundationModel = getFromCacheByKey
    ? await getFromCacheByKey(env(c), cacheKey)
    : null;
  if (cachedFoundationModel) {
    // update ttl, don't await the result
    putInCacheWithValue(env(c), cacheKey, cachedFoundationModel, 56400);
    return cachedFoundationModel;
  }
  const inferenceProfile = await getInferenceProfile(
    inferenceProfileIdentifier || '',
    providerOptions.awsRegion || '',
    providerOptions.awsAccessKeyId || '',
    providerOptions.awsSecretAccessKey || '',
    providerOptions.awsSessionToken || ''
  );
  // modelArn is always like arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1
  const foundationModel = inferenceProfile?.models?.[0]?.modelArn
    ?.split('/')
    ?.pop();
  if (!foundationModel) {
    console.warn(`No foundation model found in inference profile: ${inferenceProfileIdentifier}`);
    return null;
  }
  putInCacheWithValue(env(c), cacheKey, foundationModel, 56400);
  return foundationModel;
} catch (error) {
  console.error(`Error resolving foundation model from inference profile ${inferenceProfileIdentifier}:`, error);
  return null;
}

Comment on lines +108 to +117
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel) {
  params.foundationModel = foundationModel;
}


🛠️ Code Refactor

Issue: The code doesn't handle the case where foundationModel extraction fails but still attempts to use it.
Fix: Add a check to ensure foundationModel is defined before setting it in params.
Impact: Prevents potential undefined values from being used in the model parameter.

Suggested change
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel) {
  params.foundationModel = foundationModel;
}
const foundationModel = model.includes('foundation-model/')
  ? model.split('/').pop()
  : await getFoundationModelFromInferenceProfile(
      c,
      model,
      providerOptions
    );
if (foundationModel && foundationModel.length > 0) {
  params.foundationModel = foundationModel;
}

Comment on lines +408 to +415
export const getInferenceProfile = async (
inferenceProfileIdentifier: string,
awsRegion: string,
awsAccessKeyId: string,
awsSecretAccessKey: string,
awsSessionToken?: string
) => {
const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;


🔒 Security Issue Fix

Issue: The getInferenceProfile function doesn't validate the inferenceProfileIdentifier before using it in the URL, which could potentially lead to URL manipulation issues.
Fix: Add validation to ensure the inferenceProfileIdentifier is a valid ARN format before using it.
Impact: Prevents potential security issues related to URL manipulation.

Suggested change
export const getInferenceProfile = async (
  inferenceProfileIdentifier: string,
  awsRegion: string,
  awsAccessKeyId: string,
  awsSecretAccessKey: string,
  awsSessionToken?: string
) => {
  const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;
export const getInferenceProfile = async (
  inferenceProfileIdentifier: string,
  awsRegion: string,
  awsAccessKeyId: string,
  awsSecretAccessKey: string,
  awsSessionToken?: string
) => {
  if (!inferenceProfileIdentifier || !inferenceProfileIdentifier.startsWith('arn:aws')) {
    throw new Error('Invalid inference profile identifier format');
  }
  const url = `https://bedrock.${awsRegion}.amazonaws.com/inference-profiles/${encodeURIComponent(decodeURIComponent(inferenceProfileIdentifier))}`;


Important

PR Review Skipped

PR review skipped as per the configuration setting. Run a manual review by commenting /matter review

💡Tips to use Matter AI

Command List

  • /matter summary: Generate AI Summary for the PR
  • /matter review: Generate AI Reviews for the latest commit in the PR
  • /matter review-full: Generate AI Reviews for the complete PR
  • /matter release-notes: Generate AI release-notes for the PR
  • /matter : Chat with your PR with Matter AI Agent
  • /matter remember : Generate AI memories for the PR
  • /matter explain: Get an explanation of the PR
  • /matter help: Show the list of available commands and documentation
  • Need help? Join our Discord server: https://discord.gg/fJU5DvanU3

@narengogi narengogi changed the title application inference profiles support for bedrock feature: application inference profiles support for bedrock Jun 2, 2025
@narengogi narengogi changed the title feature: application inference profiles support for bedrock feature: inference profiles for bedrock Jun 2, 2025
remove redundant cache write and check if cache function is available before invoking it



@VisargD VisargD merged commit a6fe2d9 into Portkey-AI:main Jun 11, 2025
2 checks passed