# EAGLE - Speculative Sampling using IPEX-LLM on Intel CPUs
In this directory, you will find examples showing how IPEX-LLM accelerates inference on Intel CPUs with EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative sampling method that improves text generation speed. See [here](https://arxiv.org/abs/2401.15077) for the paper and [here](https://github.com/SafeAILab/EAGLE) for more information on the EAGLE code.

## Requirements
To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

## Example - EAGLE Speculative Sampling with IPEX-LLM on MT-bench
In this example, we run inference for a Llama2 model to showcase the speed of EAGLE with IPEX-LLM on MT-bench data on Intel CPUs.

### 1. Install
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).

After installing conda, create a Python environment for IPEX-LLM:
```bash
conda create -n llm python=3.11 # Python 3.11 is recommended
conda activate llm

pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install intel_extension_for_pytorch==2.1.0
pip install -r requirements.txt
pip install eagle-llm
```
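
As a quick sanity check, you can confirm that the key packages import cleanly. The module names below are the usual ones for these packages; `eagle` as the module name for `eagle-llm` is an assumption, so adjust it if your install differs:
```bash
# Optional sanity check: confirm the key packages import cleanly.
# `eagle` as the module name for the eagle-llm package is an assumption.
python -c "import ipex_llm; print('ipex-llm OK')"
python -c "import intel_extension_for_pytorch as ipex; print('IPEX', ipex.__version__)"
python -c "import eagle; print('eagle-llm OK')"
```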

### 2. Configure IPEX-LLM environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.
```bash
# set IPEX-LLM env variables
source ipex-llm-init
```
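
If you want to confirm what `ipex-llm-init` changed, you can inspect the environment afterwards. Which variables it sets (for example, threading and memory-allocator settings) depends on your machine and IPEX-LLM version, so treat the names below as a guess:
```bash
# Optional: list variables ipex-llm-init commonly touches (names are a guess).
env | grep -E 'OMP_NUM_THREADS|LD_PRELOAD|MALLOC' || echo "no matching variables set"
```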

### 3. Running the Example
You can test the speed of EAGLE speculative sampling with IPEX-LLM on MT-bench using the following command.
```bash
python -m evaluation.gen_ea_answer_llama2chat \
    --ea-model-path [path of EAGLE weights] \
    --base-model-path [path of the original model] \
    --enable-ipex-llm
```
Please refer to [here](https://github.com/SafeAILab/EAGLE#eagle-weights) for the complete list of available EAGLE weights.
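
For illustration, a filled-in invocation might look like the following; both paths are hypothetical local directories (using the `EAGLE-llama2-chat-7B` weights from the list above as an example):
```bash
# Hypothetical paths -- substitute your own local model directories.
python -m evaluation.gen_ea_answer_llama2chat \
    --ea-model-path ./models/EAGLE-llama2-chat-7B \
    --base-model-path ./models/Llama-2-7b-chat-hf \
    --enable-ipex-llm
```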

The above command will generate a `.jsonl` file that records the generation results and wall time. Then, you can use `evaluation/speed.py` to calculate the speed.
```bash
python -m evaluation.speed \
    --base-model-path [path of the original model] \
    --jsonl-file [path of the .jsonl file]
```
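
For example, with the same hypothetical paths as above:
```bash
# Hypothetical paths -- substitute the actual output of the previous step.
python -m evaluation.speed \
    --base-model-path ./models/Llama-2-7b-chat-hf \
    --jsonl-file ./outputs/llama2chat_mt_bench.jsonl
```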