Commit ab476c7 (parent fabc395)

Eagle Speculative Sampling examples (#11104)

* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770

10 files changed: +1396 -0 lines

# Eagle - Speculative Sampling using IPEX-LLM on Intel CPUs

In this directory, you will find examples of how IPEX-LLM accelerates inference with EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative sampling method that improves text generation speed, on Intel CPUs. See [here](https://arxiv.org/abs/2401.15077) for the paper and [here](https://github.com/SafeAILab/EAGLE) for more information on the EAGLE code.

## Requirements

To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

## Example - EAGLE Speculative Sampling with IPEX-LLM on MT-bench

In this example, we run inference for a Llama2 model to showcase the speed of EAGLE with IPEX-LLM on MT-bench data on Intel CPUs.
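As background, the draft-and-verify acceptance rule at the core of speculative sampling (the mechanism EAGLE builds on) can be sketched in a few lines of plain Python. This is a toy illustration with made-up three-token distributions, not the EAGLE or IPEX-LLM implementation:

```python
import random

def speculative_sample(p, q, rng):
    # One draft-and-verify step: the cheap draft model proposes a token
    # from q, and it is accepted with probability min(1, p[x] / q[x]).
    # On rejection, resample from the residual max(0, p - q)
    # (random.choices normalizes the weights internally). The returned
    # token is distributed exactly according to the target distribution p.
    vocab = list(range(len(p)))
    x = rng.choices(vocab, weights=q)[0]       # draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):   # verification
        return x
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0]

rng = random.Random(0)
p = [0.6, 0.3, 0.1]  # target model's next-token distribution (toy)
q = [0.2, 0.5, 0.3]  # draft model's next-token distribution (toy)
counts = [0, 0, 0]
for _ in range(20000):
    counts[speculative_sample(p, q, rng)] += 1
# The empirical frequencies approach p, even though most proposals came from q.
print([round(c / 20000, 2) for c in counts])
```

In EAGLE, the draft distribution comes from a lightweight autoregressive head that extrapolates the target model's hidden features, and several draft tokens are verified per target forward pass; the toy above verifies a single token.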
### 1. Install
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).

After installing conda, create a Python environment for IPEX-LLM:

```bash
conda create -n llm python=3.11 # recommend to use Python 3.11
conda activate llm

pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install intel_extension_for_pytorch==2.1.0
pip install -r requirements.txt
pip install eagle-llm
```

### 2. Configure IPEX-LLM environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

```bash
# set IPEX-LLM env variables
source ipex-llm-init
```

### 3. Running Example

You can test the speed of EAGLE speculative sampling with IPEX-LLM on MT-bench using the following command.

```bash
python -m evaluation.gen_ea_answer_llama2chat \
    --ea-model-path [path of EAGLE weight] \
    --base-model-path [path of the original model] \
    --enable-ipex-llm
```

Please refer to [here](https://github.com/SafeAILab/EAGLE#eagle-weights) for the complete list of available EAGLE weights.

The above command will generate a `.jsonl` file that records the generation results and wall time. Then, you can use `evaluation/speed.py` to calculate the speed.

```bash
python -m evaluation.speed \
    --base-model-path [path of the original model] \
    --jsonl-file [pathname of the .jsonl file]
```
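For reference, the speed metric reduces to total generated tokens divided by total wall time. Below is a minimal sketch of that aggregation, assuming each line of the `.jsonl` file is a JSON object with per-record token and timing fields; the field names `new_tokens` and `wall_time` are placeholders, so check them against the actual output of `gen_ea_answer_llama2chat`:

```python
import json

def throughput(jsonl_path):
    # Tokens per second aggregated over all records in the file.
    # Assumes each line holds a JSON object with hypothetical fields
    # 'new_tokens' (int) and 'wall_time' (seconds); adjust the names
    # to match the file your run actually produces.
    total_tokens = 0
    total_time = 0.0
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            total_tokens += rec["new_tokens"]
            total_time += rec["wall_time"]
    return total_tokens / total_time
```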
