
Support for gemma3 from google #12963


Open
bigtan opened this issue Mar 12, 2025 · 47 comments

@bigtan

bigtan commented Mar 12, 2025

Please update Ollama; gemma3 is already supported.

Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade

@puffer-duck

You need to upgrade to Ollama v0.6; that should add support for gemma3.

@bigtan
Author

bigtan commented Mar 12, 2025

@puffer-duck But v0.6 does not support Intel GPU acceleration, am I right?

@ForstJean

"error": {
"message": "llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade"

@tombii

tombii commented Mar 12, 2025

Yes, we need this. Can Intel publish patches to Ollama so that we can compile it ourselves? Or set up automatic nightly builds that follow the latest version of Ollama?
I just got my A770 but can still return it; it seems LLM/Ollama is still not straightforward with Intel :(

@digitalextremist

digitalextremist commented Mar 12, 2025

Note this was brought up here: #12950

That issue is generally about the version disparity rather than just gemma3 (but it now mentions that model as another reason).

@yizhangliu

Please!

@cyita
Contributor

cyita commented Mar 14, 2025

Hi All,

Gemma3 is now supported in ipex-llm llamacpp! (Ollama support is in progress—we'll provide updates once it's ready.)

Important Notes:

The 27B Gemma3 q4_k_m model requires more than 16 GB of VRAM.

  • In text mode, you can use -c 128 to reduce memory usage.
  • In vision mode (llama-gemma3-cli), you may need two Arc GPUs or a single GPU with more memory.

Get Started:

Please follow these steps to try it out:

1. Download the latest ipex-llm llamacpp portable zip:

https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md

2. Get mmproj.gguf & gemma3 gguf model files

Please download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913

(You must download both the text model and the mmproj file)

Note: Vision capability is available on these model sizes: 4b, 12b and 27b

3. Run gemma3

3.1 Linux

ngl=99
thread=8

3.1.1 Text only

./llama-cli -m $model_path --no-context-shift -n 128 --prompt "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve something, it's not evolution anymore - it's the equivalent of intelligent design, the fable invented by creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that's not something that rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to create, but with what I did not. I didn't want to create an 'intelligent design' simulator and wrongly call it evolution. This is a problem, of course, every other contestant also had to face. And judging by the entries submitted, not many managed to work around it. I'd say the only real solution was through the use of artificial selection, somehow. So far, I haven't seen any entry using this at its core gameplay. Alas, this is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed myself to pick whatever I thought would work out. My initial idea was to create something where humanity tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space Odyssey of course) but I couldn't think of compelling (read: serious) mechanics for that. Borgs were my next inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey (who also created the 'Mechanical Underdogs' signature logo for my intros) further matured the concept, as it involved into the idea of having individual pieces of pasta flying around and trying to evolve until they became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: you are sitting a table. You have your own plate, with is your 'base'. There are 5 other guests at the table, each with their own plate. 
Your plate can spawn little pieces of pasta. You do so by 'ordering' them through a menu. Some pastas are better than others; some are faster, some are stronger. They have varying 'costs', which are debited from your credits (you start with a number of credits). Once spawned, your pastas start flying around. Their instinct is to fly to other plates, in order to conquer them (the objective of the game is having your pasta conquer all the plates on the table). But they are really autonomous, so after being spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn't like other people's pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other pastas your own pasta kill. Once a pasta is in vicinity of a plate" -t $thread -e -ngl $ngl --color -c 2048 --temp 0

3.1.2 Single turn (Vision)

./llama-gemma3-cli -m $model_path --mmproj $mmproj_path -ngl $ngl -t $thread -p "What is in this image?" --image $image_path

3.1.3 Chat mode (Vision)

./llama-gemma3-cli -m $model_path --mmproj $mmproj_path -ngl $ngl -t $thread

3.2 Windows

3.2.1 Text only

llama-cli.exe -m %MODEL_PATH% -n 128 --prompt "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve something, it's not evolution anymore - it's the equivalent of intelligent design, the fable invented by creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that's not something that rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to create, but with what I did not. I didn't want to create an 'intelligent design' simulator and wrongly call it evolution. This is a problem, of course, every other contestant also had to face. And judging by the entries submitted, not many managed to work around it. I'd say the only real solution was through the use of artificial selection, somehow. So far, I haven't seen any entry using this at its core gameplay. Alas, this is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed myself to pick whatever I thought would work out. My initial idea was to create something where humanity tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space Odyssey of course) but I couldn't think of compelling (read: serious) mechanics for that. Borgs were my next inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey (who also created the 'Mechanical Underdogs' signature logo for my intros) further matured the concept, as it involved into the idea of having individual pieces of pasta flying around and trying to evolve until they became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: you are sitting a table. You have your own plate, with is your 'base'. There are 5 other guests at the table, each with their own plate. 
Your plate can spawn little pieces of pasta. You do so by 'ordering' them through a menu. Some pastas are better than others; some are faster, some are stronger. They have varying 'costs', which are debited from your credits (you start with a number of credits). Once spawned, your pastas start flying around. Their instinct is to fly to other plates, in order to conquer them (the objective of the game is having your pasta conquer all the plates on the table). But they are really autonomous, so after being spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn't like other people's pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other pastas your own pasta kill. Once a pasta is in vicinity of a plate" -t 8 -e -ngl 99 --color --ctx-size 1200 --no-mmap

3.2.2 Single turn (Vision)

llama-gemma3-cli.exe -m %MODEL_PATH% --mmproj %MMPROJ_PATH% -ngl 99 -t 8 -p "What is in this image?" --image %IMAGE_PATH%

3.2.3 Chat mode (Vision)

llama-gemma3-cli.exe -m %MODEL_PATH% --mmproj %MMPROJ_PATH% -ngl 99 -t 8 
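For convenience, the Linux text-only invocation in 3.1.1 can be wrapped in a small script. This is only a sketch: the model path and prompt below are placeholders to replace with your own, and it assumes the portable zip has been extracted and the GPU environment set up as described in the quickstart linked in step 1.

    #!/bin/bash
    # Sketch: point model_path at your downloaded gemma3 GGUF file (placeholder path below)
    model_path=/path/to/gemma-3-27b-it-Q4_K_M.gguf
    ngl=99      # layers to offload to the GPU
    thread=8    # CPU threads
    ./llama-cli -m "$model_path" --no-context-shift -n 128 \
        --prompt "Why is the sky blue?" \
        -t $thread -e -ngl $ngl --color -c 2048 --temp 0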

@ForstJean
(quoting @cyita's Gemma3 instructions above)

Thank you for your efforts. I see that the Ollama portable zip in the pre-release has been updated to 20250313, but it seems that it still cannot run gemma3 properly.

@heylobc

heylobc commented Mar 17, 2025

will it be possible to run on server or python? many thanks

@cyita
Contributor

cyita commented Mar 17, 2025

will it be possible to run on server or python? many thanks

We will release the Ollama portable zip with gemma3 support soon.
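In the meantime, for the "server or python" question: once any of these builds is serving (the pip-installed ipex-llm ollama, or the portable zip once released), gemma3 can be queried over Ollama's standard HTTP API from any language. A rough sketch with curl, assuming the default port 11434 and that a gemma3 model has already been pulled or created:

    # terminal 1: start the server (binary/script name depends on how you installed it)
    ./ollama serve
    # terminal 2: send a generation request; a Python client or any HTTP library works the same way
    curl http://localhost:11434/api/generate -d '{
      "model": "gemma3",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'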

@sgwhat
Contributor

sgwhat commented Mar 18, 2025

Hi All, you may install our latest version of ipex-llm ollama via pip install --pre --upgrade ipex-llm[cpp] to run gemma3 as below:

  1. Run Ollama with GGUF Model on ModelScope

    ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M
    
  2. Run Ollama with GGUF Model on HuggingFace Hub

  • You may pull a GGUF model from HuggingFace Hub and then create an Ollama model with a Modelfile.
    # In the Modelfile
    FROM /path/to/gemma-3-4b-it-Q4_K_M.gguf
    
    # Create an Ollama model
    ollama create gemma3-gguf -f Modelfile
    

You may see ipex-llm ollama quickstart for more details.
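Putting the HuggingFace route together end to end might look like the sketch below. The repository is the lmstudio-community GGUF mentioned elsewhere in this thread; the exact GGUF filename is an assumption, so check the repo, and huggingface-cli comes from pip install huggingface_hub.

    # Download the pre-quantized GGUF from HuggingFace (check the repo for the exact filename)
    huggingface-cli download lmstudio-community/gemma-3-4b-it-GGUF \
        gemma-3-4b-it-Q4_K_M.gguf --local-dir ./models
    # Write a Modelfile that points at the local GGUF
    echo "FROM ./models/gemma-3-4b-it-Q4_K_M.gguf" > Modelfile
    # Create and run the Ollama model
    ollama create gemma3-gguf -f Modelfile
    ollama run gemma3-gguf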

@yizhangliu

@sgwhat No Ollama Portable Zip ?

@wjr1985

wjr1985 commented Mar 18, 2025

I've tried the official gemma3 models in 4b and 12b, as well as the q4_K_M versions from ollama, and then also lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M from huggingface and none of them seem to work after setting up a new conda environment from the quickstart link and running pip install --pre --upgrade ipex-llm[cpp]. I've attached the logs from running ./ollama run https://huggingface.co/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M. thank you!

ollama-gemma.txt

@ExplodingDragon

emm ... Any new progress?

Image

@sgwhat
Contributor

sgwhat commented Mar 19, 2025

@sgwhat No Ollama Portable Zip ?

@ExplodingDragon @yizhangliu Releasing. You may see #12963 (comment) to run it first.

@ExplodingDragon

@ExplodingDragon @yizhangliu Releasing. You may see #12963 (comment) to run it first.

@sgwhat It looks good. Are there any plans to submit the Ollama patch upstream?

OneAPI already offers out-of-the-box support on certain systems like ArchLinux. Could you consider providing a statically linked Ollama or similar package?

@yizhangliu

Thanks. But it's not easy to do "pip install --pre --upgrade ipex-llm[cpp]".

@cyita
Contributor

cyita commented Mar 19, 2025

Hi All,

The Ollama portable zip is now available! Please follow the instructions link to download.

Note 1: For Gemma3, for now you need to either use ModelScope as the model source (see details here: link) or run a local GGUF model downloaded from HuggingFace (see details here: link).

Note 2: Text input support for Gemma3 is ready, while image input support in Ollama is still a work in progress.
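In practice, the two options from Note 1 look roughly like this (a sketch; IPEX_LLM_MODEL_SOURCE is the environment variable used later in this thread to switch the model source, and the Modelfile route is the one shown in the earlier comment):

    # Option A: use ModelScope as the model source
    export IPEX_LLM_MODEL_SOURCE=modelscope
    ./ollama run gemma3
    # (or pull a specific GGUF directly via its ModelScope path)
    ./ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M

    # Option B: run a local GGUF downloaded from HuggingFace via a Modelfile
    ./ollama create gemma3-gguf -f Modelfile   # Modelfile contains: FROM /path/to/gemma-3-4b-it-Q4_K_M.gguf
    ./ollama run gemma3-gguf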

@cunkai

cunkai commented Mar 19, 2025

After deployment, I asked a few questions about pictures, but the answers were incorrect.
I used LM-Studio for deployment and there was no problem with answering picture questions.

@sgwhat
Contributor

sgwhat commented Mar 19, 2025

After deployment, I asked a few questions about pictures, but the answers were incorrect. I used LM-Studio for deployment and there was no problem with answering picture questions.

Hi @cunkai, currently ipex-llm ollama Gemma3 does not have good support for the image part; we have only fully supported the text part. We will add full support in a future ipex-llm ollama 0.6.x release.

@jason-dai
Contributor

I've tried the official gemma3 models in 4b and 12b, as well as the q4_K_M versions from ollama, and then also lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M from huggingface and none of them seem to work after setting up a new conda environment from the quickstart link and running pip install --pre --upgrade ipex-llm[cpp]. I've attached the logs from running ./ollama run https://huggingface.co/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M. thank you!

ollama-gemma.txt

For now, you may also run a local GGUF model downloaded from HF; see #12963 (comment)

@tombii

tombii commented Mar 19, 2025

Edit: It works with a model file from HF.

Unfortunately I can't get it to work:
tom@computer: /usr/share/ollama $ export IPEX_LLM_MODEL_SOURCE=modelscope
tom@computer: /usr/share/ollama $ ollama run gemma3:12b
tom@computer: /usr/share/ollama $ ollama run modelscope.cn/lmstudio-community/gemma-3-12b-it-GGUF:Q4_K_M --verbose
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2
tom@computer: /usr/share/ollama $ ollama --version
ggml_sycl_init: found 1 SYCL devices:
ollama version is 0.5.4-ipexllm-20250318
tom@computer: /usr/share/ollama $ journalctl -u ollama.service

Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: format           = GGUF V3 (latest)
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: arch             = gemma3
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: vocab type       = SPM
[...]
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model type       = 12B
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model ftype      = Q4_K - Medium
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model params     = 11.77 B
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model size       = 6.79 GiB (4.96 BPW)
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: general.name     = Gemma 3 12b It
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: BOS token        = 2 '<bos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOS token        = 1 '<eos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOT token        = 106 '<end_of_turn>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: UNK token        = 3 '<unk>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: PAD token        = 0 '<pad>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: LF token         = 248 '<0x0A>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOG token        = 1 '<eos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOG token        = 106 '<end_of_turn>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: max token length = 48
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloading 48 repeating layers to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloading output layer to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloaded 49/49 layers to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors:        SYCL0 model buffer size =  6956.18 MiB
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors:          CPU model buffer size =   787.50 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_seq_max     = 1
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx         = 16384
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx_per_seq = 16384
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_batch       = 512
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ubatch      = 512
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: flash_attn    = 0
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: freq_base     = 1000000.0
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: freq_scale    = 0.125
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will >
Mar 19 10:09:53 proxmox1 ollama[2002914]: [SYCL] call ggml_check_sycl
Mar 19 10:09:53 proxmox1 ollama[2002914]: ggml_check_sycl: GGML_SYCL_DEBUG: 0
Mar 19 10:09:53 proxmox1 ollama[2002914]: ggml_check_sycl: GGML_SYCL_F16: no
Mar 19 10:09:53 proxmox1 ollama[2002914]: Found 1 SYCL devices:
Mar 19 10:09:53 proxmox1 ollama[2002914]: |  |                   |                                       |       |Max    |        |Max  |Global |           >
Mar 19 10:09:53 proxmox1 ollama[2002914]: |  |                   |                                       |       |compute|Max work|sub  |mem    |           >
Mar 19 10:09:53 proxmox1 ollama[2002914]: |ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driv>
Mar 19 10:09:53 proxmox1 ollama[2002914]: |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|----------->
Mar 19 10:09:53 proxmox1 ollama[2002914]: | 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|  12.55|    512|    1024|   32| 16225M|         1.>
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_kv_cache_init:      SYCL0 KV buffer size =  6144.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: KV self size  = 6144.00 MiB, K (f16): 3072.00 MiB, V (f16): 3072.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:  SYCL_Host  output buffer size =     1.01 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:      SYCL0 compute buffer size =   671.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:  SYCL_Host compute buffer size =    71.51 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: graph nodes  = 1975
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: graph splits = 2
Mar 19 10:09:53 proxmox1 ollama[2002914]: key general.file_type not found in file
Mar 19 10:09:53 proxmox1 ollama[2002914]: terminate called after throwing an instance of 'std::runtime_error'
Mar 19 10:09:53 proxmox1 ollama[2002914]:   what():  Missing required key: general.file_type
Mar 19 10:09:53 proxmox1 ollama[2002914]: SIGABRT: abort
Mar 19 10:09:53 proxmox1 ollama[2002914]: PC=0x7e1fb84a9eec m=9 sigcode=18446744073709551610
Mar 19 10:09:53 proxmox1 ollama[2002914]: signal arrived during cgo execution

I'll try to download from HF instead.

@deepskyblue86

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

@jason-dai
Contributor

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

See #12963 (comment)

@savvadesogle

savvadesogle commented Mar 19, 2025

while the image input support is still WIP

Yesterday I tried to run it directly through llama-cpp, and it worked (on 2-3x A770).

On the model gemma3 27B Q8 (modelscope.cn/lmstudio-community/gemma-3-27b-it-GGUF:Q8_0)

this command:

c:\llm\llama-cpp>llama-gemma3-cli -m C:\Users\uuk\.ollama\models\blobs\sha256-e1ef8587b2bdcbf4c2f888f3f618626dcee42096d0e38b63b26cbef4a1a56da8 --mmproj C:\llm\models\mmproj-model-f16.gguf -ngl 999 -t 8 -p "What is in this image?" --image C:\llm\models\1.jpg

Image

@jason-dai
Contributor

while the image input support is still WIP

Yesterday I tried to run it directly through llama-cpp, and it worked (on 2-3x A770).

On the model gemma3 27B Q8 (modelscope.cn/lmstudio-community/gemma-3-27b-it-GGUF:Q8_0)

this command:

c:\llm\llama-cpp>llama-gemma3-cli -m C:\Users\uuk\.ollama\models\blobs\sha256-e1ef8587b2bdcbf4c2f888f3f618626dcee42096d0e38b63b26cbef4a1a56da8 --mmproj C:\llm\models\mmproj-model-f16.gguf -ngl 999 -t 8 -p "What is in this image?" --image C:\llm\models\1.jpg

Image

Yes, the support in llama.cpp is complete (see #12963 (comment)); the image support in Ollama is still in progress.

@yizhangliu

It's OK.
But, the output results are somewhat verbose.

@wjr1985

wjr1985 commented Mar 19, 2025

Using the GGUF version and the instructions from #12963 (comment) along with the portable version made it work. I'm getting some strange results from the GGUF version, but I'm seeing those strange results on my AMD-based machine too, so that seems unrelated. Thanks for the help y'all!

@deepskyblue86

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

See #12963 (comment)

@jason-dai no luck with that either

❯ ./ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF
Error: llama runner process has terminated: exit status 2

on the serve side:

key general.file_type not found in file
terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing required key: general.file_type
SIGABRT: abort

@tombii

tombii commented Mar 19, 2025

(quoting @deepskyblue86's report above)

I got the same error, got it working by downloading models from HF instead.

@MehediHasanSazzad

Same here, I am facing the same issue just like everyone else:

(llm) C:\Users\Ghoul>ollama.exe run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2

@deepskyblue86

Today I tried again (ollama-ipex-llm-2.2.0b20250318-ubuntu) and it worked!

export IPEX_LLM_MODEL_SOURCE=modelscope
./ollama run gemma3

@sgwhat
Contributor

sgwhat commented Mar 26, 2025

Hi all, we are working on upgrading the ipex-llm ollama version to re-support gemma3. Until then, you can run gemma3:1b. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.
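For reference, running the 1B model with the Linux portable zip looks roughly like this (a sketch; the startup script name follows the linked quickstart and may differ between releases):

    # terminal 1: start the bundled Ollama server
    ./start-ollama.sh
    # terminal 2: pull and run the 1B gemma3 model
    ./ollama run gemma3:1b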

@DocMAX

DocMAX commented Mar 31, 2025

root@f5f88c17a043:/llm/ollama# ./ollama run gemma3
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2

currently not working...

@tristan-k

Hi all, we are working on upgrading the ipex-llm ollama version to re-support gemma3. Until then, you can run gemma3:1b. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.

Has this been fixed in the latest IPEX-LLM v2.2.0 release?

@sgwhat
Contributor

sgwhat commented Apr 9, 2025

Has this been fixed in the latest IPEX-LLM v2.2.0 release?

Not yet; we should be able to support 0.6.2 within this week.

@rafasaurus

I can't pull from docker image from "intelanalytics/ipex-llm-inference-cpp-xpu:latest".

@zimoai

zimoai commented Apr 10, 2025

Ollama just updated to 0.6.5; can this fix the gemma3 GGUF model file problem? With:
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

@sgwhat
Contributor

sgwhat commented Apr 11, 2025

Hi @rafasaurus @zimoai @DocMAX, I will release an initial version to support gemma3, maybe next Monday or Tuesday.

@shameez-struggles-to-commit

Hi @rafasaurus @zimoai @DocMAX, I will release an initial version to support gemma3, maybe next Monday or Tuesday.

Really looking forward to the latest IPEX Ollama for Gemma 3! The interleaved sliding window attention will be extremely useful for long context windows!

Thanks again for all of the work you're doing here!!!!

@tristan-k

Hi @rafasaurus @zimoai @DocMAX, I will release an initial version to support gemma3, maybe next Monday or Tuesday.

Does the 2.3.0-nightly build add support for Gemma3?

@rafasaurus

rafasaurus commented Apr 22, 2025

Not sure. I am running the docker version intelanalytics/ipex-llm-inference-cpp-xpu; when I run gemma3, it crashes, throwing Error: POST predict: Post "http://127.0.0.1:44987/completion": EOF

@tristan-k

Same here:

./ollama run gemma3:12b
>>> Why is the sky blue?
Error: POST predict: Post "http://127.0.0.1:45089/completion": EOF
time=2025-04-22T11:38:33.569Z level=INFO source=server.go:106 msg="system memory" total="62.1 GiB" free="59.2 GiB" free_swap="8.0 GiB"
time=2025-04-22T11:38:33.570Z level=INFO source=server.go:139 msg=offload library=cpu layers.requested=-1 layers.model=49 layers.offload=0 layers.split="" memory.available="[59.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="12.3 GiB" memory.required.partial="0 B" memory.required.kv="3.0 GiB" memory.required.allocations="[12.3 GiB]" memory.weights.total="6.0 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-04-22T11:38:33.640Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-22T11:38:33.643Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-22T11:38:33.644Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-22T11:38:33.649Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-22T11:38:33.649Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-22T11:38:33.649Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-22T11:38:33.649Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-22T11:38:33.649Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-22T11:38:33.649Z level=INFO source=server.go:414 msg="starting llama server" cmd="/ollama-bin runner --ollama-engine --model /root/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 4 --no-mmap --parallel 4 --port 45089"
time=2025-04-22T11:38:33.649Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-22T11:38:33.649Z level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-04-22T11:38:33.649Z level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server error"
time=2025-04-22T11:38:33.674Z level=INFO source=runner.go:757 msg="starting ollama engine"
time=2025-04-22T11:38:33.674Z level=INFO source=runner.go:817 msg="Server listening on 127.0.0.1:45089"
time=2025-04-22T11:38:33.750Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-04-22T11:38:33.750Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-04-22T11:38:33.750Z level=INFO source=ggml.go:68 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
load_backend: loaded SYCL backend from /libggml-sycl.so
load_backend: loaded CPU backend from /libggml-cpu-alderlake.so
time=2025-04-22T11:38:33.774Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
time=2025-04-22T11:38:33.901Z level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model"
time=2025-04-22T11:38:34.865Z level=INFO source=ggml.go:296 msg="Number of model weight buffers" count=2
time=2025-04-22T11:38:34.865Z level=INFO source=ggml.go:299 msg="model weights" buffer=SYCL0 size="7.6 GiB"
time=2025-04-22T11:38:34.865Z level=INFO source=ggml.go:299 msg="model weights" buffer=CPU size="787.5 MiB"
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|  12.71|    112|    1024|   32| 62228M|     1.6.32961.700000|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
time=2025-04-22T11:38:36.346Z level=INFO source=ggml.go:369 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-04-22T11:38:36.346Z level=INFO source=ggml.go:369 msg="compute graph" backend=CPU buffer_type=SYCL_Host
time=2025-04-22T11:38:36.346Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-22T11:38:36.348Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-22T11:38:36.349Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-22T11:38:36.352Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-22T11:38:36.352Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-22T11:38:36.352Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-22T11:38:36.352Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-22T11:38:36.352Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-22T11:38:36.415Z level=INFO source=server.go:628 msg="llama runner started in 2.77 seconds"
[GIN] 2025/04/22 - 11:38:36 | 200 |  2.932672867s |       127.0.0.1 | POST     "/api/generate"

panic: failed to sample token: no tokens to sample from

goroutine 24 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000035d40, {0x157ca60, 0xc000329090})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:335 +0x65
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:794 +0xa9c
[GIN] 2025/04/22 - 11:39:10 | 200 |  2.434284202s |       127.0.0.1 | POST     "/api/chat"

@sgwhat
Contributor

sgwhat commented Apr 22, 2025

Does the 2.3.0-nightly build add support for Gemma3?

Currently we support gemma3:fp16.

@tristan-k

When will there be support for Q4?

@savvadesogle

Q4

Use LM Studio with Vulkan. Performance is close to ipex-llm.

Image

Image

Image

@sgwhat
Contributor

sgwhat commented Apr 23, 2025

When will there be support for Q4?

Maybe this week.
