
Commit b99d8bd

Simplified system for extended tasks (#123)

Co-authored-by: Nathan Habib <[email protected]>

1 parent 8dd8323 commit b99d8bd


12 files changed (+80 −43 lines)


.github/workflows/tests.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -26,7 +26,7 @@ jobs:
           cache: 'pip'
       - name: Install lighteval in editable mode
         run: |
-          pip install -e .[dev]
+          pip install -e .[dev,extended_tasks]
       - name: Get cached files
         uses: actions/cache@v2
         id: get-cache
```

README.md

Lines changed: 17 additions & 10 deletions
````diff
@@ -167,28 +167,38 @@ python run_evals_accelerate.py \
 
 Independently of the default tasks provided in `lighteval` that you will find in the `tasks_table.jsonl` file, you can use `lighteval` to evaluate models on tasks that require special processing (or have been added by the community). These tasks have their own evaluation suites and are defined as follows:
 
-* `extended`: tasks which have complex pre- or post-processing and are added by the `lighteval` maintainers. See the [`extended_tasks`](./extended_tasks) folder for examples.
+* `extended`: tasks which have complex pre- or post-processing and are added by the `lighteval` maintainers. See the [`extended_tasks`](./src/lighteval/tasks/extended_tasks) folder for examples.
 * `community`: tasks which have been added by the community. See the [`community_tasks`](./community_tasks) folder for examples.
 * `custom`: tasks which are defined locally and not present in the core library. Use this suite if you want to experiment with designing a special metric or task.
 
-For example, to run an extended task you can run:
+For example, to run an extended task like ifeval, you can run:
+```shell
+python run_evals_accelerate.py \
+    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
+    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
+    --tasks "extended|ifeval|0|0" \
+    --output_dir "./evals"
+```
+
+To run a community or custom task, you can use (note the custom_tasks flag):
 
 ```shell
 python run_evals_accelerate.py \
     --model_args="pretrained=<path to model on the hub>"\
     --tasks <task parameters> \
-    --extended_tasks "extended_tasks" \
+    --custom_tasks <path to your custom or community task> \
     --output_dir output_dir
 ```
 
-For example, to launch `lighteval` on `ifeval` for `HuggingFaceH4/zephyr-7b-beta`, run:
+For example, to launch `lighteval` on `arabic_mmlu:abstract_algebra` for `HuggingFaceH4/zephyr-7b-beta`, run:
 
 ```shell
 python run_evals_accelerate.py \
     --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
     --use_chat_template \ # optional, if you want to run the evaluation with the chat template
-    --tasks "extended|ifeval|0|0" \
-    --extended_tasks "extended_tasks" \
+    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
+    --custom_tasks "community_tasks/arabic_evals" \
     --output_dir "./evals"
 ```
 
@@ -209,7 +219,7 @@ However, we are very grateful to the Harness and HELM teams for their continued
 
 - [logging](https://github.com/huggingface/lighteval/tree/main/src/lighteval/logging): Our loggers, to display experiment information and push it to the hub after a run
 - [metrics](https://github.com/huggingface/lighteval/tree/main/src/lighteval/metrics): All the available metrics you can use. They are described in metrics, and divided between sample metrics (applied at the sample level, such as a prediction accuracy) and corpus metrics (applied over the whole corpus). You'll also find available normalisation functions.
 - [models](https://github.com/huggingface/lighteval/tree/main/src/lighteval/models): Possible models to use. We cover transformers (base_model), with adapter or delta weights, as well as TGI models locally deployed (it's likely the code here is out of date though), and brrr/nanotron models.
-- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `tasks_table.jsonl`, and you'll find all the prompts in `tasks_prompt_formatting.py`.
+- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `tasks_table.jsonl`, and you'll find all the prompts in `tasks_prompt_formatting.py`. Popular tasks requiring custom logic are exceptionally added in the [extended tasks](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended).
 - [tasks_examples](https://github.com/huggingface/lighteval/tree/main/tasks_examples) contains a list of available tasks you can launch. We advise using tasks in the `recommended_set`, as it's possible that some of the other tasks need double checking.
 - [tests](https://github.com/huggingface/lighteval/tree/main/tests) contains our test suite, that we run at each PR to prevent regressions in metrics/prompts/tasks, for a subset of important tasks.
 
@@ -252,9 +262,6 @@ Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/t
 
 Make sure you can launch your model with your new task using `--tasks lighteval|yournewtask|2|0`.
 
-### Extended evaluations
-Proceed as for community evaluations, but in the `extended_tasks` folder.
-
 #### Community evaluations
 Copy the `community_tasks/_template.yml` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.
````
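The `--tasks` values in the README examples above follow a four-field `suite|task|few_shot|truncation-flag` layout, comma-separated when several tasks are run at once. The parser below is an illustrative sketch of that string format, not lighteval's implementation; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    suite: str              # e.g. "extended", "community", "custom", "lighteval"
    task: str               # e.g. "ifeval" or "arabic_mmlu:abstract_algebra"
    few_shot: int           # number of few-shot examples
    truncate_few_shots: bool  # last field: 0 or 1 in the CLI string


def parse_task_spec(spec: str) -> TaskSpec:
    """Parse one 'suite|task|few_shot|truncate' entry."""
    suite, task, few_shot, truncate = spec.split("|")
    return TaskSpec(suite, task, int(few_shot), bool(int(truncate)))


def parse_tasks(arg: str) -> list[TaskSpec]:
    """A --tasks argument is a comma-separated list of specs."""
    return [parse_task_spec(s) for s in arg.split(",")]
```

For instance, `parse_tasks("extended|ifeval|0|0,community|arabic_mmlu:abstract_algebra|5|1")` yields a zero-shot ifeval spec and a 5-shot arabic_mmlu spec.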

run_evals_accelerate.py

Lines changed: 0 additions & 6 deletions
```diff
@@ -104,12 +104,6 @@ def get_parser():
         default=None,
         help="Path to a file with custom tasks (a TASK list of dict and potentially prompt formating functions)",
     )
-    parser.add_argument(
-        "--extended_tasks",
-        type=str,
-        default=None,
-        help="Path to the folder which contains all extended tasks",
-    )
     group.add_argument(
         "--tasks",
         type=str,
```

src/lighteval/main_accelerate.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -81,7 +81,7 @@ def main(args):
     with accelerator.main_process_first() if accelerator is not None else nullcontext():
         task_names_list, few_shots_dict = taskinfo_selector(args.tasks)
         task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
-            task_names_list, custom_tasks=args.custom_tasks, extended_tasks=args.extended_tasks
+            task_names_list, custom_tasks=args.custom_tasks
         )
         LightevalTask.load_datasets(task_dict.values(), args.dataset_loading_processes)
```

src/lighteval/tasks/extended/__init__.py (new file)

Lines changed: 33 additions & 0 deletions

```diff
@@ -0,0 +1,33 @@
+# MIT License
+
+# Copyright (c) 2024 The HuggingFace Team
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from lighteval.utils import can_load_extended_tasks
+
+
+if can_load_extended_tasks():
+    import lighteval.tasks.extended.ifeval.main as ifeval
+    import lighteval.tasks.extended.tiny_benchmarks.main as tiny_benchmarks
+
+    AVAILABLE_EXTENDED_TASKS_MODULES = [ifeval, tiny_benchmarks]
+
+else:
+    AVAILABLE_EXTENDED_TASKS_MODULES = []
```
extended_tasks/ifeval/instructions.py renamed to src/lighteval/tasks/extended/ifeval/instructions.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
 
 import langdetect
 
-import extended_tasks.ifeval.instructions_utils as instructions_util
+import lighteval.tasks.extended.ifeval.instructions_utils as instructions_util
 
 
 logger = logging.getLogger(__name__)
```

extended_tasks/ifeval/instructions_registry.py renamed to src/lighteval/tasks/extended/ifeval/instructions_registry.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 """Registry of all instructions."""
-import extended_tasks.ifeval.instructions as instructions
+import lighteval.tasks.extended.ifeval.instructions as instructions
 
 
 _KEYWORD = "keywords:"
```

extended_tasks/ifeval/main.py renamed to src/lighteval/tasks/extended/ifeval/main.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
 import numpy as np
 from aenum import extend_enum
 
-import extended_tasks.ifeval.instructions_registry as instructions_registry
+import lighteval.tasks.extended.ifeval.instructions_registry as instructions_registry
 from lighteval.metrics import Metrics
 from lighteval.metrics.utils import (
     MetricCategory,
```

extended_tasks/tiny_benchmarks/main.py renamed to src/lighteval/tasks/extended/tiny_benchmarks/main.py

Lines changed: 5 additions & 3 deletions
```diff
@@ -27,6 +27,7 @@
 Test with `python run_evals_accelerate.py --model_args "pretrained=EleutherAI/pythia-70m" --tasks "extended|tiny:winogrande|0|0,extended|tiny:gsm8k|0|0,extended|tiny:hellaswag|0|0,extended|tiny:arc|0|0,extended|tiny:truthfulqa|0|0" --extended_tasks extended_tasks --output_dir "./evals"`
 """
 import os
+import pathlib
 import pickle
 
 import numpy as np
@@ -40,7 +41,6 @@
 from lighteval.metrics.normalizations import gsm8k_normalizer
 from lighteval.metrics.utils import MetricCategory, MetricUseCase
 from lighteval.tasks.lighteval_task import LightevalTaskConfig
-from lighteval.tasks.requests import Doc
 
 
 # Utility functions
@@ -89,13 +89,15 @@ def __init__(self, task: str):
         self.num_samples = 0
 
     def download(self):
+        # Likely to crash in // processes if we don't include the pkl
+        path_dld = os.path.join(pathlib.Path(__file__).parent.resolve(), "tinyBenchmarks.pkl")
         # Downloading files
-        if not os.path.isfile("extended_tasks/tiny_benchmarks/tinyBenchmarks.pkl"):
+        if not os.path.isfile(path_dld):
             url = "https://raw.githubusercontent.com/felipemaiapolo/tinyBenchmarks/main/tinyBenchmarks/tinyBenchmarks.pkl"
             response = requests.get(url)
             if response.status_code == 200:
                 # Write the content to a file
-                with open("extended_tasks/tiny_benchmarks/tinyBenchmarks.pkl", "wb") as file:
+                with open(path_dld, "wb") as file:
                     file.write(response.content)
 
     def compute(self, **args):
```
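The `path_dld` change above anchors the cached pickle at the module's own directory instead of the current working directory, so every process resolves the same file no matter where it was launched from. Below is a small, self-contained sketch of that pattern; the function names are illustrative, and in real code you would pass `__file__` as the anchor.

```python
import os
import pathlib


def module_relative_path(anchor_file: str, filename: str) -> str:
    """Resolve `filename` next to the module `anchor_file` belongs to,
    independent of os.getcwd(). In practice, pass __file__ as anchor_file."""
    return os.path.join(pathlib.Path(anchor_file).parent.resolve(), filename)


def ensure_cached(anchor_file: str, filename: str, fetch) -> str:
    """Call `fetch()` for the bytes only when the file is not already cached."""
    path = module_relative_path(anchor_file, filename)
    if not os.path.isfile(path):
        with open(path, "wb") as f:
            f.write(fetch())
    return path
```

The key point is that `pathlib.Path(anchor_file).parent` is stable across processes, whereas a bare relative path silently depends on each process's working directory.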

src/lighteval/tasks/registry.py

Lines changed: 9 additions & 19 deletions
```diff
@@ -32,7 +32,9 @@
 from datasets.load import dataset_module_factory
 
 from lighteval.logging.hierarchical_logger import hlog, hlog_warn
+from lighteval.tasks.extended import AVAILABLE_EXTENDED_TASKS_MODULES
 from lighteval.tasks.lighteval_task import LightevalTask, LightevalTaskConfig
+from lighteval.utils import CANNOT_USE_EXTENDED_TASKS_MSG, can_load_extended_tasks
 
 
 # Helm, Bigbench, Harness are implementations following an evaluation suite setup
@@ -108,10 +110,7 @@ def get_task_class(
         )
 
     def get_task_dict(
-        self,
-        task_name_list: List[str],
-        custom_tasks: Optional[Union[str, ModuleType]] = None,
-        extended_tasks: str = None,
+        self, task_name_list: List[str], custom_tasks: Optional[Union[str, ModuleType]] = None
     ) -> Dict[str, LightevalTask]:
         """
         Get a dictionary of tasks based on the task name list.
@@ -134,11 +133,12 @@ def get_task_dict(
         TASKS_TABLE = []
         if custom_tasks is not None:
             custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
-        if extended_tasks is not None:
-            hlog_warn(
-                "You are using extended_tasks. Make sure you installed their dependencies using `pip install -e .[extended_tasks]`."
-            )
-            custom_tasks_module.extend(load_extended_tasks_modules(extended_tasks_path=extended_tasks))
+        if can_load_extended_tasks():
+            for extended_task_module in AVAILABLE_EXTENDED_TASKS_MODULES:
+                custom_tasks_module.append(extended_task_module)
+        else:
+            hlog_warn(CANNOT_USE_EXTENDED_TASKS_MSG)
+
         for module in custom_tasks_module:
             TASKS_TABLE.extend(module.TASKS_TABLE)
 
@@ -155,16 +155,6 @@ def get_task_dict(
         return tasks_dict
 
 
-def load_extended_tasks_modules(extended_tasks_path: str):
-    all_modules = []
-    for folder in os.listdir(extended_tasks_path):
-        cur_module = create_custom_tasks_module(os.path.join(extended_tasks_path, folder, "main.py"))
-        hlog(f"Successfully loaded extended task: {folder}.")
-        all_modules.append(cur_module)
-
-    return all_modules
-
-
 def create_custom_tasks_module(custom_tasks: Union[str, ModuleType]) -> ModuleType:
     """Creates a custom task module to load tasks defined by the user in their own file.
```
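With this registry change, extended tasks travel the same path as custom task modules: every module exposes a `TASKS_TABLE` attribute and `get_task_dict` simply concatenates them, with no filesystem scanning. A minimal sketch of that aggregation, using stand-in namespace objects in place of real task modules:

```python
import types


def collect_tasks(modules) -> list:
    """Concatenate the TASKS_TABLE attribute published by each task module."""
    table = []
    for module in modules:
        table.extend(module.TASKS_TABLE)
    return table


# Stand-ins for task modules; real extended/community modules expose
# TASKS_TABLE in exactly the same duck-typed way.
ifeval_like = types.SimpleNamespace(TASKS_TABLE=["ifeval"])
tiny_like = types.SimpleNamespace(TASKS_TABLE=["tiny:gsm8k", "tiny:arc"])
```

Because the registry only reads `TASKS_TABLE`, any object carrying that attribute can be registered, which is what lets the conditionally-imported extended modules and user-supplied custom modules share one code path.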
src/lighteval/utils.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -189,3 +189,14 @@ def is_peft_available() -> bool:
 
 
 NO_PEFT_ERROR_MSG = "You are trying to use adapter weights models, for which you need `peft`, which is not available in your environment. Please install it using pip."
+
+
+def can_load_extended_tasks() -> bool:
+    imports = []
+    for package in ["langdetect"]:
+        imports.append(importlib.util.find_spec(package))
+
+    return all(cur_import is not None for cur_import in imports)
+
+
+CANNOT_USE_EXTENDED_TASKS_MSG = "If you want to use extended_tasks, make sure you installed their dependencies using `pip install -e .[extended_tasks]`."
```
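`can_load_extended_tasks` builds on `importlib.util.find_spec`, which reports whether a top-level package is importable without actually importing it, so probing a heavyweight optional dependency costs essentially nothing. A generic version of the same guard (the package names in the usage note are examples, not lighteval's dependency list):

```python
import importlib.util


def deps_available(packages) -> bool:
    """True only if every named package can be imported in this environment."""
    return all(importlib.util.find_spec(pkg) is not None for pkg in packages)
```

For instance, `deps_available(["json", "os"])` holds in any standard interpreter, while a missing or misspelled package makes the check return `False`, letting the caller log a hint like `CANNOT_USE_EXTENDED_TASKS_MSG` instead of crashing at import time.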
