[BUG] Run 8-bit and 16-bit gemma3-4b #13099
Hi @bibekyess, you may install our latest v0.6.2 ipex-llm ollama from https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly, which could support this.
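For anyone trying this suggestion, here is a minimal sketch of fetching and starting the portable build on Windows. The zip asset name and the `start-ollama.bat` entry point are assumptions based on how earlier ipex-llm ollama portable releases were laid out; check the release page for the actual file names.

```
:: Download the Windows portable zip from the v2.3.0-nightly release page,
:: then extract it (the asset name below is hypothetical):
tar -xf ollama-ipex-llm-2.3.0-nightly-win.zip
cd ollama-ipex-llm-2.3.0-nightly-win

:: Launch the bundled ollama server, then pull and run a model
start-ollama.bat
ollama run gemma3:4b
```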
Hi @sgwhat, I made changes to Ollama's source code so that it supports oneAPI. However, the inference speed is slower than the portable version released here: on the same device and model, mine reaches 30 tokens/s, while v2.3.0-nightly or v2.2.0 can achieve 48 tokens/s. What did I do wrong? My code is at https://github.com/chnxq/ollama/tree/chnxq/add-oneapi
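One way to make this comparison reproducible is ollama's `--verbose` flag, which prints decode throughput after each run; running the same prompt against both builds isolates the build as the only variable. The model tag and the numbers below are illustrative, not taken from the report:

```
# Run the same prompt on each build; "eval rate" is the decode speed
ollama run gemma3:4b "Explain KV caching in two sentences." --verbose
# ...
# eval count:    128 token(s)       <- illustrative numbers
# eval rate:     48.2 tokens/s      <- compare this figure across builds
```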
Hi @sgwhat!
While for
Thank you!
Hi @bibekyess, I have fixed the
Hi @sgwhat
Thank you
Hello!
I am experimenting with gemma-3-4b and noticed that the 4-bit version works smoothly.
The same model repository also has an 8-bit version, but when I execute the following commands, it gives an error.
I would also like to run the 16-bit version. Is that possible in the current state?
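The exact commands from the report are not shown above, but invocations of this shape reproduce the question; the quantization tags are assumptions following the usual ollama library naming for gemma3:

```
# 4-bit (reported working); the default tag resolves to a 4-bit quantization
ollama run gemma3:4b
# 8-bit (reported failing on the ipex-llm build)
ollama run gemma3:4b-it-q8_0
# 16-bit (the variant the reporter wants to run)
ollama run gemma3:4b-it-fp16
```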
IpexLLM-Ollama Version
ollama version is 0.5.4-ipexllm-20250318. I also tried with the latest shared bare-metal executable (tag v2.2.0); it gives the same issue.
Device Details
Device Name: LG Gram Pro Laptop (MFD 2024/05)
Operating System: Windows 11
Processor: Intel(R) Core(TM) Ultra 7 155H @ 3.80 GHz
RAM: 32.0 GB (31.5 GB usable)
Graphics and NPU VRAM: 16.0 GB usable (Intel(R) Arc(TM) Graphics and Intel(R) AI Boost)
System Type: 64-bit operating system, x64-based processor
GPU driver version: 32.0.101.6734
The full error message with 8-bit gemma3-4b is attached below: