Add Remote LLM Support for Perturbation-Based Attribution via RemoteLLMAttribution and VLLMProvider #1544
This PR introduces support for applying Captum's perturbation-based attribution algorithms to remotely hosted large language models (LLMs). It enables users to perform interpretability analyses on models served via APIs, such as those using vLLM, without requiring access to model internals.
Motivation:
Captum’s current LLM attribution framework requires access to local models, limiting its usability in production and hosted environments. With the rise of scalable remote inference backends and OpenAI-compatible APIs, this PR allows Captum to be used for black-box interpretability with hosted models, as long as they return token-level log probabilities.
This integration also aligns with ongoing efforts like llama-stack, which aims to provide a unified API layer for inference (and also for RAG, Agents, Tools, Safety, Evals, and Telemetry) across multiple backends—further expanding Captum’s reach for model explainability.
Key Additions:
- `RemoteLLMProvider` interface: A generic interface for fetching log probabilities from remote LLMs, making it easy to plug in various inference backends.
- `VLLMProvider` implementation: A concrete subclass of `RemoteLLMProvider` tailored to models served with vLLM; it handles the specifics of communicating with vLLM endpoints to retrieve the data needed for attribution.
- `RemoteLLMAttribution` class: A subclass of `LLMAttribution` that overrides internal methods to work with remote providers. It enables all perturbation-based algorithms (e.g., Feature Ablation, Shapley Values, KernelSHAP) using only the output log probabilities returned by a remote LLM.
- Uses the `openai` client under the hood for querying remote models, since many LLM serving solutions now support the OpenAI-compatible API format (e.g., the vLLM OpenAI server and projects like llama-stack; see here for ongoing work related to this). A usage sketch follows this list.
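To make the intended workflow concrete, here is a minimal end-to-end sketch. Only the class names `RemoteLLMProvider`, `VLLMProvider`, and `RemoteLLMAttribution` come from this PR's description; the import paths, constructor parameters (e.g., `api_url`), and the `placeholder_model` wiring are illustrative assumptions, not the final API.

```python
# Hedged usage sketch, not the PR's exact API. Import paths, constructor
# signatures, and parameter names below are assumptions for illustration.
from transformers import AutoTokenizer

from captum.attr import FeatureAblation, TextTokenInput
from captum.attr import RemoteLLMAttribution, VLLMProvider  # assumed import path

# Provider that talks to an OpenAI-compatible vLLM endpoint and returns
# token-level log probabilities (the api_url parameter name is illustrative).
provider = VLLMProvider(api_url="http://localhost:8000/v1")

# A local tokenizer is still required to segment the prompt into
# interpretable features, even though inference happens remotely.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Wrap any perturbation-based algorithm. Since there is no local model,
# `placeholder_model` is a hypothetical stand-in for the forward function;
# the remote provider supplies the actual log probabilities.
attr_method = FeatureAblation(RemoteLLMAttribution.placeholder_model)
llm_attr = RemoteLLMAttribution(attr_method, tokenizer, provider)

# Attribute the model's generated answer back to the prompt tokens.
inp = TextTokenInput("The capital of France is", tokenizer)
result = llm_attr.attribute(inp, target="Paris")
print(result.seq_attr)  # one attribution score per input token
```

Because the attribution only consumes token-level log probabilities from an OpenAI-compatible endpoint, the same pattern should carry over to other backends (e.g., a llama-stack deployment) by writing another small `RemoteLLMProvider` subclass.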
Issue(s) related to this: