
Commit 876f721

Supports extended tasks (#101)
* init - now gives the path with an arg, maybe will remove
* allows several custom task modules to be loaded
* fix quality

---------

Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>
1 parent 165ebc9 commit 876f721

File tree

12 files changed: +88 -30 lines


README.md

Lines changed: 10 additions & 3 deletions

@@ -210,9 +210,13 @@ However, we are very grateful to the Harness and HELM teams for their continued
 If your new task or metric has requirements, add a specific `requirements.txt` file with your evaluation.
 
 ### Adding a new task
-To add a new task, first either open an issue, to determine whether it will be integrated in the core evaluations of lighteval, or in the community tasks, and **add its dataset** on the hub.
-Note: Core evaluations are evals we will add to our test suite to ensure non regression through time, and which already see a high usage in the community.
-A popular community evaluation can move to become a core evaluation through time.
+To add a new task, first open an issue to determine whether it should be integrated into the core evaluations of lighteval, the extended tasks, or the community tasks, and **add its dataset** on the hub.
+
+- Core evaluations are evaluations which only require standard logic in their metrics and processing, and which we will add to our test suite to ensure non-regression over time. They already see high usage in the community.
+- Extended evaluations are evaluations which require custom logic in their metrics (complex normalisation, an LLM as a judge, ...), and which we added to make users' lives easier. They already see high usage in the community.
+- Community evaluations are new tasks submitted by the community.
+
+A popular community evaluation can become an extended or core evaluation over time.
 
 #### Core evaluations
 Prompt function: **find a suitable prompt function** in `src.lighteval.tasks.task_prompt_formatting.py`, or code your own. This function must output a `Doc` object, which should contain `query`, your prompt, and either `gold`, the gold output, or `choices` and `gold_index`, the list of choices and index or indices of correct answers. If your query contains an instruction which should not be repeated in a few-shot setup, add it to an `instruction` field.
@@ -241,6 +245,9 @@ Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/t
 
 Make sure you can launch your model with your new task using `--tasks lighteval|yournewtask|2|0`.
 
+#### Extended evaluations
+Proceed as for community evaluations, but in the `extended_tasks` folder.
+
 #### Community evaluations
 Copy the `community_tasks/_template.yml` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.
tasks_examples/custom_tasks_with_custom_metrics/ifeval/instructions.py renamed to extended_tasks/ifeval/instructions.py

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@
 
 import langdetect
 
-import tasks_examples.custom_tasks_with_custom_metrics.ifeval.instructions_utils as instructions_util
+import extended_tasks.ifeval.instructions_utils as instructions_util
 
 
 logger = logging.getLogger(__name__)
tasks_examples/custom_tasks_with_custom_metrics/ifeval/instructions_registry.py renamed to extended_tasks/ifeval/instructions_registry.py

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@
 # limitations under the License.
 
 """Registry of all instructions."""
-import tasks_examples.custom_tasks_with_custom_metrics.ifeval.instructions as instructions
+import extended_tasks.ifeval.instructions as instructions
 
 
 _KEYWORD = "keywords:"

tasks_examples/custom_tasks_with_custom_metrics/ifeval/ifeval.py renamed to extended_tasks/ifeval/main.py

Lines changed: 2 additions & 2 deletions

@@ -23,7 +23,7 @@
 import numpy as np
 from aenum import extend_enum
 
-import tasks_examples.custom_tasks_with_custom_metrics.ifeval.instructions_registry as instructions_registry
+import extended_tasks.ifeval.instructions_registry as instructions_registry
 from lighteval.metrics import Metrics
 from lighteval.metrics.utils import (
     MetricCategory,
@@ -38,7 +38,7 @@
 ifeval = LightevalTaskConfig(
     name="ifeval",
     prompt_function="ifeval_prompt",
-    suite=["custom"],
+    suite=["extended"],
     hf_repo="wis-k/instruction-following-eval",
     hf_subset="default",
     metric=["ifeval_metric"],

pyproject.toml

Lines changed: 3 additions & 2 deletions

@@ -78,7 +78,6 @@ dependencies = [
 accelerate = ["accelerate"]
 tgi = ["text-generation==0.6.0"]
 optimum = ["optimum==1.12.0"]
-# Quantization and adapter weights
 quantization = ["bitsandbytes>=0.41.0", "auto-gptq>=0.4.2"]
 adapters = ["peft==0.3.0"]
 nanotron = [
@@ -88,7 +87,9 @@ nanotron = [
 quality = ["ruff==v0.2.2","pre-commit"]
 tests = ["pytest==7.4.0"]
 dev = ["lighteval[accelerate,quality,tests]"]
-
+extended_tasks = [
+    "langdetect", # ifeval
+]
 
 [project.urls]
 Homepage = "https://github.com/huggingface/lighteval"
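
With this extra declared, the dependencies that extended tasks need (here `langdetect`, used by ifeval) can be installed via `pip install -e .[extended_tasks]`, which is the exact command the warning added to `src/lighteval/tasks/registry.py` below points users to.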

run_evals_accelerate.py

Lines changed: 6 additions & 0 deletions

@@ -103,6 +103,12 @@ def get_parser():
         default=None,
         help="Path to a file with custom tasks (a TASK list of dict and potentially prompt formatting functions)",
     )
+    parser.add_argument(
+        "--extended_tasks",
+        type=str,
+        default=None,
+        help="Path to the folder which contains all extended tasks",
+    )
     group.add_argument(
         "--tasks",
         type=str,
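
With the new flag wired through to the registry (see `src/lighteval/main_accelerate.py` below), an extended task can presumably be launched with something like `python run_evals_accelerate.py --tasks "extended|ifeval|0|0" --extended_tasks extended_tasks --model_args "pretrained=<model>" --output_dir <output_dir>`. The `extended|task|fewshot|truncate` spec follows the README's `--tasks lighteval|yournewtask|2|0` example; `--model_args` and `--output_dir` are assumed from the existing parser and are not shown in this diff.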

src/lighteval/main_accelerate.py

Lines changed: 1 addition & 1 deletion

@@ -81,7 +81,7 @@ def main(args):
     with accelerator.main_process_first() if accelerator is not None else nullcontext():
         task_names_list, few_shots_dict = taskinfo_selector(args.tasks)
         task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
-            task_names_list, custom_tasks=args.custom_tasks
+            task_names_list, custom_tasks=args.custom_tasks, extended_tasks=args.extended_tasks
         )
         LightevalTask.load_datasets(task_dict.values(), args.dataset_loading_processes)
 

src/lighteval/main_nanotron.py

Lines changed: 2 additions & 1 deletion

@@ -135,7 +135,8 @@ def main(
 
     task_names_list, few_shots_dict = taskinfo_selector(tasks_selection)
     task_dict = Registry(cache_dir=cache_dir).get_task_dict(
-        task_names_list, custom_tasks=lighteval_config.tasks.custom_tasks
+        task_names_list,
+        custom_tasks=lighteval_config.tasks.custom_tasks,
     )
     # Loading all the dataset in a distributed manner
     LightevalTask.load_datasets(task_dict.values(), lighteval_config.tasks.dataset_loading_processes)

src/lighteval/tasks/lighteval_task.py

Lines changed: 22 additions & 10 deletions

@@ -145,7 +145,9 @@ def __post_init__(self):
 
 
 class LightevalTask:
-    def __init__(self, name: str, cfg: LightevalTaskConfig, cache_dir: Optional[str] = None, custom_tasks_module=None):
+    def __init__(  # noqa: C901
+        self, name: str, cfg: LightevalTaskConfig, cache_dir: Optional[str] = None, custom_tasks_module: Optional[list] = None
+    ):
         """
         Initialize a LightEval task.
 
@@ -202,16 +204,26 @@ def __init__(self, name: str, cfg: LightevalTaskConfig, cache_dir: Optional[str]
         # to use once prompt formatting is managed as a module
         if custom_tasks_module is None:
             self.formatter = getattr(tasks_prompt_formatting, cfg.prompt_function)
-        elif hasattr(custom_tasks_module, cfg.prompt_function):
-            # If we have a prompt in both the custom_tasks_module and our tasks_prompt_formatting
-            # We take the prompt from the custom_tasks_module
-            if hasattr(tasks_prompt_formatting, cfg.prompt_function):
-                hlog_warn(
-                    f"Be careful you are using custom prompt function {cfg.prompt_function} and not the default one."
-                )
-            self.formatter = getattr(custom_tasks_module, cfg.prompt_function)
         else:
-            self.formatter = getattr(tasks_prompt_formatting, cfg.prompt_function)
+            formatter = []
+            for module in custom_tasks_module:
+                if hasattr(module, cfg.prompt_function):
+                    formatter.append(getattr(module, cfg.prompt_function))
+
+            if len(formatter) == 0:  # default version
+                self.formatter = getattr(tasks_prompt_formatting, cfg.prompt_function)
+            elif len(formatter) == 1:
+                # If the prompt is in both a custom module and tasks_prompt_formatting,
+                # we take the prompt from the custom module
+                if hasattr(tasks_prompt_formatting, cfg.prompt_function):
+                    hlog_warn(
+                        f"Be careful, you are using the custom prompt function {cfg.prompt_function} and not the default one."
+                    )
+                self.formatter = formatter[0]  # the single matching definition
+            else:
+                raise Exception(
+                    f"You defined the prompt function {cfg.prompt_function} several times in the different custom modules you are loading."
+                )
         self.generation_size = cfg.generation_size
         self.stop_sequence = cfg.stop_sequence
         self.output_regex = cfg.output_regex
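
The resolution order implemented above: if no custom module defines `cfg.prompt_function`, the default in `tasks_prompt_formatting` is used; if exactly one module defines it, that definition wins (with a warning when it shadows a default of the same name); if several modules define it, an exception is raised, since silently picking one of the duplicate definitions would be ambiguous.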

src/lighteval/tasks/registry.py

Lines changed: 40 additions & 8 deletions

@@ -39,8 +39,19 @@
 # Original follows the original implementation as closely as possible
 # Leaderboard are the evaluations we fixed on the open llm leaderboard - you should get similar results
 # Community are for community added evaluations
+# Extended are for evaluations with custom logic
 # Custom is for all the experiments you might want to do!
-DEFAULT_SUITES = ["helm", "bigbench", "harness", "leaderboard", "lighteval", "original", "custom", "community"]
+DEFAULT_SUITES = [
+    "helm",
+    "bigbench",
+    "harness",
+    "leaderboard",
+    "lighteval",
+    "original",
+    "extended",
+    "custom",
+    "community",
+]
 
 TRUNCATE_FEW_SHOTS_DEFAULTS = True
 
@@ -97,14 +108,18 @@ def get_task_class(
         )
 
     def get_task_dict(
-        self, task_name_list: List[str], custom_tasks: Optional[Union[str, ModuleType]] = None
+        self,
+        task_name_list: List[str],
+        custom_tasks: Optional[Union[str, ModuleType]] = None,
+        extended_tasks: Optional[str] = None,
     ) -> Dict[str, LightevalTask]:
         """
         Get a dictionary of tasks based on the task name list.
 
         Args:
             task_name_list (List[str]): A list of task names.
             custom_tasks (Optional[Union[str, ModuleType]]): Path to the custom tasks file, or name of a module to import containing custom tasks, or the module itself
+            extended_tasks (Optional[str]): Path to the folder grouping the extended task submodules
 
         Returns:
             Dict[str, LightevalTask]: A dictionary containing the tasks.
@@ -115,13 +130,20 @@ def get_task_dict(
         """
         # Import custom tasks provided by the user
         custom_tasks_registry = None
-        custom_tasks_module = None
+        custom_tasks_module = []
+        TASKS_TABLE = []
         if custom_tasks is not None:
-            custom_tasks_module = create_custom_tasks_module(custom_tasks=custom_tasks)
-        if custom_tasks_module is not None:
-            custom_tasks_registry = create_config_tasks(
-                meta_table=custom_tasks_module.TASKS_TABLE, cache_dir=self.cache_dir
-            )
+            custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
+        if extended_tasks is not None:
+            hlog_warn(
+                "You are using extended_tasks. Make sure you installed their dependencies using `pip install -e .[extended_tasks]`."
+            )
+            custom_tasks_module.extend(load_extended_tasks_modules(extended_tasks_path=extended_tasks))
+        for module in custom_tasks_module:
+            TASKS_TABLE.extend(module.TASKS_TABLE)
+
+        if len(TASKS_TABLE) > 0:
+            custom_tasks_registry = create_config_tasks(meta_table=TASKS_TABLE, cache_dir=self.cache_dir)
             hlog(custom_tasks_registry)
 
         # Select relevant tasks given the subset asked for by the user
@@ -133,6 +155,16 @@ def get_task_dict(
         return tasks_dict
 
 
+def load_extended_tasks_modules(extended_tasks_path: str):
+    all_modules = []
+    for folder in os.listdir(extended_tasks_path):
+        cur_module = create_custom_tasks_module(os.path.join(extended_tasks_path, folder, "main.py"))
+        hlog(f"Successfully loaded extended task: {folder}.")
+        all_modules.append(cur_module)
+
+    return all_modules
+
+
 def create_custom_tasks_module(custom_tasks: Union[str, ModuleType]) -> ModuleType:
     """Creates a custom task module to load tasks defined by the user in their own file.
 
@@ -153,7 +185,7 @@ def create_custom_tasks_module(custom_tasks: Union[str, ModuleType]) -> ModuleType:
 
 
 def get_custom_tasks(custom_tasks: Union[str, ModuleType]) -> Tuple[ModuleType, str]:
-    """Get custom tasks from the given custom tasks file or module.
+    """Get all the custom tasks available from the given custom tasks file or module.
 
     Args:
         custom_tasks (Optional[Union[str, ModuleType]]): Path to the custom tasks file, or name of a module to import containing custom tasks, or the module itself
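
Putting `load_extended_tasks_modules` together with the ifeval rename above, each extended task appears to live in its own sub-folder and must expose a `main.py` with a module-level `TASKS_TABLE`. A minimal sketch, assuming placeholder names for the task, dataset, and metric (only the `main.py`/`TASKS_TABLE` convention and `suite=["extended"]` come from this commit):

```python
# Hypothetical extended_tasks/yourtask/main.py; load_extended_tasks_modules
# above imports this file and get_task_dict reads its TASKS_TABLE.
from lighteval.tasks.lighteval_task import LightevalTaskConfig

yourtask = LightevalTaskConfig(
    name="yourtask",                    # placeholder task name
    prompt_function="yourtask_prompt",  # resolved against this module, then the defaults
    suite=["extended"],                 # suite registered in DEFAULT_SUITES above
    hf_repo="your-org/your-dataset",    # placeholder dataset on the hub
    hf_subset="default",
    metric=["yourtask_metric"],         # placeholder custom metric
)

TASKS_TABLE = [yourtask]
```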

tasks_examples/custom_tasks_with_custom_metrics/ifeval/requirements.txt

Lines changed: 0 additions & 1 deletion
This file was deleted.
