Add litellm inference #385

Merged · 50 commits · Jan 2, 2025
Commits (50)
17783b2
Added inference using litellm.
JoelNiklaus Nov 7, 2024
9e92150
Add Udmurt (udm) translation literals (#381)
codemurt Nov 8, 2024
30a624c
This PR adds translation literals for Belarusian language. (#382)
Kryuski Nov 8, 2024
6e6fed6
fix: cache directory variable (#378)
NazimHAli Nov 8, 2024
d1d4c69
greedy_until() fix (#344)
vsabolcec Nov 8, 2024
f69811f
Fixed some params in completion call to enable more model providers.
JoelNiklaus Nov 11, 2024
dabb4a7
Added diskcache.
JoelNiklaus Nov 13, 2024
65f759c
Merge branch 'main' into add_litellm_inference
JoelNiklaus Nov 20, 2024
f74afd4
Merge branch 'main' into add_litellm_inference
JoelNiklaus Nov 22, 2024
88a9838
Fix issue for openai evaluation.
JoelNiklaus Nov 25, 2024
02ed461
Added support for stop sequences and generation size.
JoelNiklaus Nov 26, 2024
34596c2
Merge branch 'main' into add_litellm_inference
JoelNiklaus Nov 26, 2024
190738f
Fixed issue with too many concurrent calls to APIs.
JoelNiklaus Nov 27, 2024
2bb1917
Merge branch 'main' into add_litellm_inference
clefourrier Nov 28, 2024
81e4404
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 4, 2024
ebdd900
Merge branch 'main' into add_litellm_inference
NathanHB Dec 5, 2024
251e181
few fixes
NathanHB Dec 6, 2024
47b1888
Fixed issues with stop_sequence, max_completion_tokens and system_pro…
JoelNiklaus Dec 9, 2024
20a1191
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 9, 2024
ade8f0c
Revert weird change to __main__.py.
JoelNiklaus Dec 9, 2024
a2587d6
Made configuration simpler.
JoelNiklaus Dec 9, 2024
7c0856e
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 12, 2024
932fd2c
Fixed import issues.
JoelNiklaus Dec 12, 2024
8fc9b13
Merge branch 'main' into add_litellm_inference
NathanHB Dec 16, 2024
45d6d1d
fix import location
NathanHB Dec 16, 2024
2a23836
Merge branch 'add_litellm_inference' of github.com:JoelNiklaus/lighte…
NathanHB Dec 16, 2024
cca1446
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 16, 2024
1a10351
Enabled passing through system prompt to the models in the requests.
JoelNiklaus Dec 16, 2024
ff6d5de
Fixed some bugs.
JoelNiklaus Dec 17, 2024
8d831b8
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 17, 2024
5115403
Made litellm inference robust to content management errors.
JoelNiklaus Dec 17, 2024
78789c1
allow better message management for litellm
NathanHB Dec 17, 2024
3ebff6c
Merge branch 'main' into add_litellm_inference
NathanHB Dec 17, 2024
be77b15
allow system prompt to be passed to litellm models
NathanHB Dec 17, 2024
21d6112
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 17, 2024
d045d92
use system prompt from the request and use litellm encode function as…
NathanHB Dec 18, 2024
f1ed682
fixes from review
NathanHB Dec 18, 2024
ec306fd
Merge branch 'add_litellm_inference' of github.com:JoelNiklaus/lighte…
NathanHB Dec 18, 2024
bae4506
fix tests
NathanHB Dec 18, 2024
6b0cb60
fix tests
NathanHB Dec 18, 2024
c826b0e
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 18, 2024
a6747f4
remove unnecessary doc
NathanHB Dec 19, 2024
5554787
Merge branch 'add_litellm_inference' of github.com:JoelNiklaus/lighte…
NathanHB Dec 19, 2024
5b2b72d
Update src/lighteval/models/litellm_model.py
NathanHB Dec 19, 2024
0265a74
Update src/lighteval/models/litellm_model.py
NathanHB Dec 19, 2024
4fa8311
Merge branch 'main' into add_litellm_inference
NathanHB Dec 19, 2024
86dd849
Support retrying of empty cached model responses.
JoelNiklaus Dec 21, 2024
db983e3
Merge branch 'main' into add_litellm_inference
JoelNiklaus Dec 22, 2024
221d5d5
Fixed error when stop sequence is None.
JoelNiklaus Dec 22, 2024
81f02ca
Added support for litellm as judge backend.
JoelNiklaus Dec 22, 2024
2 changes: 1 addition & 1 deletion .github/workflows/tests.yaml
@@ -25,7 +25,7 @@ jobs:
cache: 'pip'
- name: Install lighteval in editable mode
run: |
pip install -e .[dev,extended_tasks,multilingual]
pip install -e .[dev,extended_tasks,multilingual,litellm]
- name: Get cached files
uses: actions/cache@v4
id: get-cache
1 change: 1 addition & 0 deletions pyproject.toml
@@ -82,6 +82,7 @@ dependencies = [
]

[project.optional-dependencies]
litellm = ["litellm", "diskcache"]
tgi = ["text-generation==0.6.0"]
optimum = ["optimum==1.12.0"]
quantization = ["bitsandbytes>=0.41.0", "auto-gptq>=0.4.2"]
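For context, the new `litellm` extra is gated at runtime by `is_litellm_available()`, which later hunks in this PR import from `lighteval.utils.imports`. Below is a minimal sketch of how such a guard typically works, assuming it only needs to check that the extra's packages are importable; the actual helper in lighteval may differ.

```python
# Sketch of an optional-dependency guard for the new `litellm` extra.
# Assumption: the guard only checks that the extra's packages are importable;
# the real is_litellm_available() in lighteval.utils.imports may differ.
import importlib.util


def is_litellm_available() -> bool:
    # Both packages declared in the `litellm` extra must be installed.
    return all(
        importlib.util.find_spec(package) is not None
        for package in ("litellm", "diskcache")
    )


if __name__ == "__main__":
    print("litellm extra installed:", is_litellm_available())
```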
109 changes: 109 additions & 0 deletions src/lighteval/main_endpoint.py
@@ -367,3 +367,112 @@ def tgi(
pipeline.save_and_push_results()

return results


@app.command(rich_help_panel="Evaluation Backends")
def litellm(
# === general ===
model_name: Annotated[
str, Argument(help="The model name to evaluate (has to be available through the litellm API).")
],
tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
# === Common parameters ===
use_chat_template: Annotated[
bool, Option(help="Use chat template for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
] = False,
system_prompt: Annotated[
Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
] = None,
dataset_loading_processes: Annotated[
int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
] = 1,
custom_tasks: Annotated[
Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
] = None,
cache_dir: Annotated[
str, Option(help="Cache directory for datasets and models.", rich_help_panel=HELP_PANEL_NAME_1)
] = CACHE_DIR,
num_fewshot_seeds: Annotated[
int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
] = 1,
# === saving ===
output_dir: Annotated[
str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
] = "results",
push_to_hub: Annotated[
bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
push_to_tensorboard: Annotated[
bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
public_run: Annotated[
bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
results_org: Annotated[
Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
] = None,
save_details: Annotated[
bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
# === debug ===
max_samples: Annotated[
Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
] = None,
override_batch_size: Annotated[
int, Option(help="Override batch size for evaluation.", rich_help_panel=HELP_PANEL_NAME_3)
] = -1,
job_id: Annotated[
int, Option(help="Optional job id for future reference.", rich_help_panel=HELP_PANEL_NAME_3)
] = 0,
):
"""
Evaluate models using LiteLLM as backend.
"""

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.litellm_model import LiteLLMModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

env_config = EnvConfig(token=TOKEN, cache_dir=cache_dir)
evaluation_tracker = EvaluationTracker(
output_dir=output_dir,
save_details=save_details,
push_to_hub=push_to_hub,
push_to_tensorboard=push_to_tensorboard,
public=public_run,
hub_results_org=results_org,
)

# TODO (nathan): better handling of model_args
parallelism_manager = ParallelismManager.NONE

model_config = LiteLLMModelConfig(model=model_name)

pipeline_params = PipelineParameters(
launcher_type=parallelism_manager,
env_config=env_config,
job_id=job_id,
dataset_loading_processes=dataset_loading_processes,
custom_tasks_directory=custom_tasks,
override_batch_size=override_batch_size,
num_fewshot_seeds=num_fewshot_seeds,
max_samples=max_samples,
use_chat_template=use_chat_template,
system_prompt=system_prompt,
)
pipeline = Pipeline(
tasks=tasks,
pipeline_parameters=pipeline_params,
evaluation_tracker=evaluation_tracker,
model_config=model_config,
)

pipeline.evaluate()

pipeline.show_results()

results = pipeline.get_results()

pipeline.save_and_push_results()

return results
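For readers who prefer the Python API over the CLI, here is a minimal sketch of what the command above wires together. Only the imports and the `LiteLLMModelConfig(model=...)` call mirror the code shown above; the model name, task string, and keyword values are illustrative assumptions, relying on default values for the remaining parameters.

```python
# Minimal programmatic sketch of the new litellm entry point (illustrative).
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.litellm_model import LiteLLMModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

evaluation_tracker = EvaluationTracker(output_dir="results", save_details=True)

pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.NONE,  # remote API calls, no local parallelism
    env_config=EnvConfig(cache_dir="~/.cache/huggingface"),  # assumed default-style config
    use_chat_template=True,
    max_samples=8,  # keep a smoke test cheap
)

pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",  # illustrative task string
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=LiteLLMModelConfig(model="gpt-4o-mini"),  # any litellm-routed model
)

pipeline.evaluate()
pipeline.show_results()
results = pipeline.get_results()
```

In practice the CLI command added here, registered under the same "Evaluation Backends" group as `tgi` and `openai`, is the intended entry point; the sketch only makes the data flow explicit.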
39 changes: 36 additions & 3 deletions src/lighteval/metrics/llm_as_judge.py
@@ -28,7 +28,7 @@

from tqdm import tqdm

from lighteval.utils.imports import is_openai_available, is_vllm_available
from lighteval.utils.imports import is_litellm_available, is_openai_available, is_vllm_available


logging.getLogger("openai").setLevel(logging.ERROR)
@@ -73,7 +73,7 @@ def __init__(
model: str,
templates: Callable,
process_judge_response: Callable,
judge_backend: Literal["openai", "transformers", "tgi", "vllm"],
judge_backend: Literal["litellm", "openai", "transformers", "tgi", "vllm"],
url: str | None = None,
api_key: str | None = None,
):
@@ -93,7 +93,7 @@

def __lazy_load_client(self):
match self.backend:
# Wether we use openai or TGI models, we go trhough the openai API
# Wether we use openai or TGI models, we go through the openai API
# to route to the endpoint
case "openai" | "tgi" if is_openai_available():
if self.client is None:
@@ -104,6 +104,8 @@ def __lazy_load_client(self):
else:
self.client = OpenAI(base_url=self.url, api_key=self.api_key)
return self.__call_api_parallel
case "litellm" if is_litellm_available():
return self.__call_litellm
case "vllm" if is_vllm_available():
if self.pipe is None:
from vllm import LLM, SamplingParams
@@ -187,6 +189,37 @@ def __call_vllm(self, prompt):
outputs = [output.outputs[0].text for output in output]
return outputs

def __call_litellm(self, prompts):
import litellm

def __call_api(prompt):
for _ in range(self.API_MAX_RETRY):
try:
response = litellm.completion(
model=self.model,
messages=prompt,
response_format={"type": "text"},
max_tokens=512,
n=1,
caching=True,
)
text = response.choices[0].message.content
return text
except Exception as e:
logger.warning(f"{type(e), e}")
time.sleep(self.API_RETRY_SLEEP)
raise Exception("Failed to get response from the API")

results = []
with ThreadPoolExecutor(100) as executor:
for entry in tqdm(executor.map(__call_api, prompts), total=len(prompts)):
results.append(entry)

if None in results:
raise ValueError("Some entries are not annotated due to errors in annotate_p, please inspect and retry.")

return results

def __call_api_parallel(self, prompts):
results = []
with ThreadPoolExecutor(100) as executor:
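The judge path above fans prompts out over a thread pool and retries each `litellm.completion` call a few times before giving up. A standalone sketch of that pattern follows; the retry constants, model name, and prompts are illustrative assumptions, not values taken from the PR.

```python
# Standalone sketch of the retry + thread-pool pattern used by __call_litellm.
# Retry constants, model name, and prompts are illustrative assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import litellm

API_MAX_RETRY = 3
API_RETRY_SLEEP = 10  # seconds


def call_one(messages):
    for _ in range(API_MAX_RETRY):
        try:
            response = litellm.completion(
                model="gpt-4o-mini",  # any judge model litellm can route to
                messages=messages,
                max_tokens=512,
                caching=True,  # response caching; the PR adds diskcache for this
            )
            return response.choices[0].message.content
        except Exception as error:  # provider errors vary widely
            print(f"litellm call failed ({type(error).__name__}), retrying")
            time.sleep(API_RETRY_SLEEP)
    raise RuntimeError("Failed to get a response from the API")


prompts = [
    [{"role": "user", "content": f"Rate answer {i} on a scale of 1 to 10."}]
    for i in range(4)
]
with ThreadPoolExecutor(100) as executor:
    results = list(executor.map(call_one, prompts))
```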
5 changes: 4 additions & 1 deletion src/lighteval/metrics/metrics_sample.py
@@ -858,7 +858,7 @@ def __init__(
judge_model_name: str,
template: Callable,
process_judge_response: Callable,
judge_backend: Literal["openai", "transformers", "vllm", "tgi"],
judge_backend: Literal["litellm", "openai", "transformers", "vllm", "tgi"],
short_judge_name: str | None = None,
) -> None:
match judge_backend:
@@ -871,6 +871,9 @@ def __init__(
case "tgi":
api_key = os.getenv("HF_TOKEN")
url = "https://api-inference.huggingface.co/v1/"
case "litellm":
api_key = None
url = None
case "transformers" | "vllm":
api = HfApi()
models = api.list_models(model_name=judge_model_name)
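Unlike the `openai` and `tgi` cases above, the `litellm` branch leaves both `url` and `api_key` as `None`: litellm resolves provider credentials itself, typically from environment variables. A small sketch of that assumption with an OpenAI-routed judge; the model name and environment variable are illustrative.

```python
# Sketch: the litellm judge backend needs no explicit url/api_key because
# litellm reads provider credentials from the environment.
# Model name and environment variable are illustrative (OpenAI-routed judge).
import os

import litellm

assert "OPENAI_API_KEY" in os.environ, "export your provider key before running"

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    max_tokens=5,
)
print(response.choices[0].message.content)
```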
1 change: 0 additions & 1 deletion src/lighteval/models/endpoints/openai_model.py
@@ -145,7 +145,6 @@ def greedy_until(

Args:
requests (list[Request]): list of requests containing the context and ending conditions.
disable_tqdm (bool, optional): Whether to disable the progress bar. Defaults to False.
override_bs (int, optional): Override the batch size for generation. Defaults to None.

Returns: