EAGLE support and examples #10925
Conversation
I suggest you put the examples under ipex-llm/python/llm/example/CPU/PyTorch-Models and ipex-llm/python/llm/example/GPU/PyTorch-Models.
python/llm/example/CPU/EAGLE/evaluation/gen_ea_answer_llama2chat.py
python/llm/example/GPU/EAGLE/evaluation/gen_baseline_answer_llama2chat.py
@jason-dai For the EAGLE example placement, which directory is best: ipex-llm/python/llm/example/CPU/PyTorch-Models/EAGLE or ipex-llm/python/llm/example/CPU/Speculative-Decoding/EAGLE? If we put the EAGLE example under ipex-llm/python/llm/example/CPU/Speculative-Decoding, we may need to create a new subdirectory, e.g. ipex-llm/python/llm/example/CPU/Speculative-Decoding/ipex-llm, and move the current speculative examples there.
Maybe
Description
ipex-llm and EAGLE integration plus EAGLE example scripts
1. Why the change?
EAGLE provides a significant additional speedup on top of the ipex-llm optimizations. Measured results:

Llama 7B, temperature=1.0, Intel CPU

| Configuration | Throughput (TPS) | Speedup vs. baseline |
|---|---|---|
| EAGLE + ipex-llm | 27.45 | 2.67x |
| EAGLE only | 20.13 | 1.96x |
| ipex-llm only | 14.55 | 1.42x |
| Baseline (not optimized) | 10.28 | 1.00x |

Llama 7B, temperature=1.0, Intel GPU

| Configuration | Throughput (TPS) | Speedup vs. baseline |
|---|---|---|
| EAGLE + ipex-llm | 60.69 | 3.74x |
| ipex-llm only | 41.41 | 2.55x |
| EAGLE only | 31.48 | 1.94x |
| Baseline (not optimized) | 16.22 | 1.00x |
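The speedup ratios quoted above follow directly from the raw TPS numbers; a quick self-contained check (TPS values copied from the GPU run above):

```python
# Compute speedup ratios relative to the unoptimized baseline.
def speedup_ratios(tps_by_config, baseline_key="baseline"):
    base = tps_by_config[baseline_key]
    return {name: round(tps / base, 2) for name, tps in tps_by_config.items()}

gpu_tps = {
    "eagle+ipex-llm": 60.688,
    "ipex-llm only": 41.413,
    "eagle only": 31.481,
    "baseline": 16.220,
}
print(speedup_ratios(gpu_tps))
# -> {'eagle+ipex-llm': 3.74, 'ipex-llm only': 2.55, 'eagle only': 1.94, 'baseline': 1.0}
```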
2. User API changes
N/A
3. Summary of the change
Integrate EAGLE (https://github.com/SafeAILab/EAGLE) speculative decoding with ipex-llm and provide example scripts for CPU and GPU.
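For context, EAGLE is a speculative-decoding method: a lightweight draft head proposes a block of tokens, the target model verifies them in one forward pass, and the longest agreeing prefix is kept. The toy sketch below illustrates only the greedy verify-and-accept step, with illustrative token IDs; it is not EAGLE's actual API, which lives in the linked repository:

```python
# Toy sketch of the speculative-decoding accept loop that EAGLE builds on:
# the draft model proposes a block of tokens, the target model decodes the
# same positions, and we keep the longest prefix where both agree (greedy
# case). On the first mismatch, the target's token overrides the draft's.
def accept_draft(draft_tokens, target_tokens):
    """Return the accepted prefix plus the target's first correction."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target overrides the first mismatch
            break
    return accepted

# Example: draft guesses 4 tokens, target disagrees at position 2,
# so 2 draft tokens are accepted and the target's correction is appended.
print(accept_draft([5, 9, 7, 3], [5, 9, 2, 3]))  # -> [5, 9, 2]
```

The speedup comes from amortizing one target-model pass over several accepted tokens instead of decoding one token per pass.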
4. How to test?
Please follow the setup instructions and example commands in the README.