
Model Wishlist #156


Open
3 of 11 tasks
EricLBuehler opened this issue Apr 16, 2024 · 130 comments
Labels
models Additions to model or architectures

Comments

@EricLBuehler
Owner

EricLBuehler commented Apr 16, 2024

Please let us know what model architectures you would like to see added!

Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be merged!

Language models

  • snowflake-arctic-instruct: Snowflake/snowflake-arctic-instruct
  • WizardLM-2: alpindale/WizardLM-2-8x22B
  • Command R: CohereForAI/c4ai-command-r-v01
  • Command R+: CohereForAI/c4ai-command-r-plus

Multimodal models

Embedding models

  • T5: google-t5/t5-base
  • nomic-text-embed: nomic-ai/nomic-embed-text-v1
@EricLBuehler EricLBuehler added the models Additions to model or architectures label Apr 16, 2024
@EricLBuehler EricLBuehler pinned this issue Apr 16, 2024
@NiuBlibing

Qwen1.5-72B-Chat

@NiuBlibing

Llama 3

@EricLBuehler
Owner Author

@NiuBlibing, we have Llama 3 support ready; the README has a few examples. I will add Qwen support shortly.
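
For example, something along these lines (an illustrative command based on the CLI layout used elsewhere in this thread; the model ID is the standard HF repo and is my assumption):

# illustrative invocation, not copied from the README
./mistralrs-server -i plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama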

@EricLBuehler
Owner Author

@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.

@cargecla1

Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@francis2tm

Hello!
Any plans for adding multimodal (e.g. LLaVA) and embedding models?

@EricLBuehler
Owner Author

Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@cargecla1, yes! It will be a great use case for ISQ.

@EricLBuehler
Owner Author

Hello!
Any plans for adding multimodal (e.g. LLaVA) and embedding models?

@francis2tm, yes. I plan on supporting LLaVA and embedding models this week.

@EricLBuehler
Owner Author

@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.
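
For example (an illustrative command; Q4K is just one possible ISQ level, and Qwen1.5 uses the qwen2 architecture):

# illustrative: swap in whichever Qwen model and ISQ level you need
./mistralrs-server -i --isq Q4K plain -m Qwen/Qwen1.5-72B-Chat -a qwen2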

@kir-gadjello

Would be nice to support at least one strong vision-language model, e.g. https://huggingface.co/openbmb/MiniCPM-V-2 or https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5, with an option to compute the visual frontend model on the CPU. You might find it easier to ship the visual transformer part via ONNX.

@chelbos

chelbos commented Apr 29, 2024

Would love to see DeepSeek-VL; this model is better than LLaVA and supports multiple images per prompt:
https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

@chelbos

chelbos commented Apr 29, 2024

Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

@jxtt01

jxtt01 commented Apr 29, 2024

Could you add support for GGUF-quantized Phi-3-Mini to the wishlist? Currently, this fails (built from master):

Running `./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf`
2024-04-29T03:08:35.180939Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: false
2024-04-29T03:08:35.180975Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-29T03:08:35.180982Z  INFO mistralrs_server: Loading model `microsoft/Phi-3-mini-128k-instruct` on Cpu...
2024-04-29T03:08:35.180989Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-29T03:08:35.181017Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181048Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
2024-04-29T03:08:35.181122Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181133Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
Error: Unknown GGUF architecture `phi3`

@rodion-m

It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

@W4G1

W4G1 commented Apr 29, 2024

Command-R and Command-R+ from Cohere would be amazing 🙏

@yongkangzhao

T5
LLaVA

@EricLBuehler
Owner Author

@kir-gadjello

Would be nice to support at least one strong vision-language model, e.g. https://huggingface.co/openbmb/MiniCPM-V-2 or https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5, with an option to compute the visual frontend model on the CPU. You might find it easier to ship the visual transformer part via ONNX.

Supporting a vision+language or multimodal model is very high priority right now.


@chelbos

Would love to see DeepSeek-VL; this model is better than LLaVA and supports multiple images per prompt:
https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

I'll add this one too.

Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

I will look into it!


@jett06

Could you add support for GGUF-quantized Phi-3-Mini to the wishlist?

Yes, absolutely, I think it should be easy. In the meantime, you can use ISQ to get the same speed.
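
For example, something like this (illustrative flags, assuming the phi3 plain architecture; pick an ISQ level that fits your hardware):

# illustrative: assumes plain Phi-3 support with -a phi3
./mistralrs-server -i --isq Q4K plain -m microsoft/Phi-3-mini-128k-instruct -a phi3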


@rodion-m

It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

Thanks! I think suzume is just a finetuned Llama, so it can be used already. I'll add WizardLM.


@W4G1

Command-R and Command-R+ from Cohere would be amazing 🙏

Yes, I'll add those.


@yongkangzhao

T5 and LLaVA

Yes, I'll add those. T5 will be a nice smaller model.

@jxtt01

jxtt01 commented Apr 29, 2024

@EricLBuehler Thanks for your reply, for adding my suggestion to the model wishlist, and for developing such an awesome project! It's very appreciated :)

@ldt

ldt commented Apr 30, 2024

Congrats on your great work!
+1 for vision models; Idefics2-8B or better would be awesome.

@maximus2600

It would be nice to add some embedding models like nomic-text-embed.

@progressionnetwork

Hello, first of all, I want to express my appreciation for the excellent work your team has accomplished on the mistral.rs engine. It's a great project.

I am currently developing a personal AI assistant using Rust, and I believe integrating additional features into your engine could significantly enhance its utility and appeal. Specifically, adding support for Whisper and incorporating text-to-speech (TTS) functionality, such as StyleTTS or similar technologies, would be incredibly beneficial. This would enable the engine to handle LLM inference, speech-to-text, and text-to-speech in a unified system, very fast (near real-time).

Implementing these features could transform the engine into a more versatile tool for developers like myself, who are keen on building more integrated and efficient AI applications.

@EricLBuehler
Owner Author

@jett06, I just added quantized GGUF Phi-3 support in #276! That is without LongRoPE support currently, but you can use a plain model with ISQ.
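
Your earlier command should now load the model, e.g.:

./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf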

@jxtt01

jxtt01 commented May 9, 2024

@EricLBuehler Woah, thank you so much! This will be lovely for us folks with less powerful computers or size constraints, you're awesome :)

@EricLBuehler
Owner Author

@jett06, my pleasure! I just fixed a small bug (in case you saw the strange behavior), so it should be all ready to go now!

@NeroHin

NeroHin commented May 10, 2024

IBM's Granite series of code models.

Granite Code Models

@LLukas22
Contributor

@NeroHin

IBM's Granite series of code models.

Granite Code Models

The 3b and 8b variants should already be supported as they are just based on the llama architecture.

The 20b and 34b variants are based on the GPTBigCode architecture which currently isn't implemented in mistral.rs.

@chenwanqq
Contributor

Hello! Any plans for adding multimodal (e.g. LLaVA) and embedding models?

I'm working on it now: chenwanqq/candle-llava.
It's not easy, dude; there are tons of image preprocessing and tensor concatenation steps.

@youcefs21

@test3211234 If you just need text, why not process frame by frame with something like https://huggingface.co/stepfun-ai/GOT-OCR2_0?

It should be relatively fast on a decent GPU, around 1-2 seconds per frame. Skipping every other frame, you can probably do 45 minutes of video text extraction per day with a naive approach, and you can probably optimize that further with some sort of parallelism.

@test3211234

test3211234 commented Jan 10, 2025

That doesn't support bounding boxes, RIP

@EricLBuehler Can you add Mistral NeMo 12B? I don't know how it all works, but I think I need a high quant; I have 8 GB of VRAM.

@andreclaudino

Would be great to support LoRAs for Flux. Any tips or a guide on how I can contribute that?

@test3211234

@EricLBuehler You have to add the DeepSeek R1 models: 7B, 8B, etc.

@EricLBuehler
Owner Author

EricLBuehler commented Jan 22, 2025

@test3211234 you can run the distill models without any changes as they are simple Qwen/Llama models. I merged initial support for the DeepSeek R1 model in #1077 for those with large enough systems and will support loading from the FP8 weights shortly.
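
For example, one of the distills can be run as a plain Qwen2 model (an illustrative command; adjust the build features and ISQ level for your hardware):

# illustrative: the distills load with the existing qwen2/llama plain architectures
cargo run --release --features cuda -- -i --isq Q4K plain -m deepseek-ai/DeepSeek-R1-Distill-Qwen-7B -a qwen2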

@mush42

mush42 commented Jan 25, 2025

@EricLBuehler Please add support for SmolVLM:
https://huggingface.co/blog/smolervlm

@EricLBuehler
Owner Author

@mush42 you can already run all the SmolVLM models: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/IDEFICS3.md

They share an architecture with Idefics 3, so all you need to do is change the model ID.
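
For example (an illustrative command following the flags in the linked doc, with the model ID swapped to the standard SmolVLM repo):

# illustrative: same invocation as the Idefics 3 doc, different model ID
cargo run --release -- -i vision-plain -m HuggingFaceTB/SmolVLM-Instruct -a idefics3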

@like-a-freedom

Vote for moondream2: https://huggingface.co/vikhyatk/moondream2

@francis2tm

https://huggingface.co/deepseek-ai/Janus-Pro-7B including image gen?

@Remember20240719

Hello, Mistral Small Instruct 2501 please. I've filed issue #1118

@Aveline67

Qwen/QwQ-32B

@rozgo

rozgo commented Mar 11, 2025

Qwen/QwQ-32B

Try:

cargo run --release --features metal -- -i --log test.log --isq Q4K plain -m Qwen/QwQ-32B -a qwen2

or use cuda instead of metal.

@green-s

green-s commented Mar 17, 2025

Ovis2

https://github.com/AIDC-AI/Ovis
https://huggingface.co/AIDC-AI/Ovis2-34B

It's based on Qwen-2.5 but uses Apple's AIMv2 for vision, supposedly resulting in better performance than Qwen2.5-VL-72B, and importantly it has a 34B checkpoint while Qwen2.5-VL does not.

@bet0x

bet0x commented Mar 19, 2025

@EricLBuehler Can we get support for https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503? It's a day old, and multimodal.

@EricLBuehler
Owner Author

@bet0x absolutely, working on it now - expect it to be added in the next few days!

@EricLBuehler
Owner Author

@green-s that looks interesting, will check it out & possibly add it after mistral3!

@bet0x

bet0x commented Mar 20, 2025

@bet0x absolutely, working on it now - expect it to be added in the next few days!

That would be absolutely amazing!

@EricLBuehler
Owner Author

@bet0x PR is up in #1221 - the model is mostly implemented already, faster than I expected :)

@bet0x

bet0x commented Mar 20, 2025

@bet0x PR is up in #1221 - the model is mostly implemented already, faster than I expected :)

Is quantization supported, or is it as-is using safetensors?

@brrr
Contributor

brrr commented Mar 20, 2025

@bet0x PR is up in #1221 - the model is mostly implemented already, faster than I expected :)

Awesome! Much faster than I expected too; I was going to ask you @EricLBuehler if you had time to port Mistral 3.

@EricLBuehler
Owner Author

@bet0x @brrr the Mistral 3 PR #1221 should be runnable now - ISQ is supported!

@EricLBuehler
Owner Author

@bet0x @brrr Mistral 3 support has been merged w/ ISQ support too!
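
For example (an illustrative command; I'm assuming the vision-plain subcommand takes a mistral3 architecture flag here, analogous to the other vision models):

# illustrative: assumes -a mistral3 for the vision loader
cargo run --release --features cuda -- -i --isq Q4K vision-plain -m mistralai/Mistral-Small-3.1-24B-Instruct-2503 -a mistral3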

@bet0x

bet0x commented Mar 21, 2025

@EricLBuehler I will test it today! Thank you for your excellent work!

@di-osc

di-osc commented Mar 26, 2025

Hello @EricLBuehler, thank you for open-sourcing mistral.rs; it is a fantastic project. Can you please support the Qwen2.5-Omni model?

@Atharva-Phatak

@EricLBuehler I would love to contribute GOT-OCR2.0. I don't know where to start, but I would love to do it myself; looking for some guidance.

@Korolen

Korolen commented Apr 24, 2025

GLM-4-32B-0414 / GLM-Z1-32B-0414 / GLM-Z1-Rumination-32B-0414 seem overpowered. Would love to see them supported in mistral.rs! Big cheers and thanks from me for this amazing work / inference setup. <3 !! 🍻 ✊

@sammcj
Contributor

sammcj commented May 1, 2025

Qwen3 would be nice!

mistralrs-server gguf --quantized-model-id unsloth/Qwen3-0.6B-GGUF --quantized-filename Qwen3-0.6B-UD-Q6_K_XL.gguf
2025-05-01T01:39:30.652604Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2025-05-01T01:39:30.652623Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2025-05-01T01:39:30.652635Z  INFO mistralrs_server: Model kind is: gguf quantized from gguf (no adapters)
Qwen3-0.6B-UD-Q6_K_XL.gguf [00:00:23] [███████████████████████████████████████████████████████████████████████████████████████████████████████████████] 549.76 MiB/549.76 MiB 25.31 MiB/s (0s)
2025-05-01T01:39:56.534520Z  INFO mistralrs_core::pipeline::gguf: Prompt chunk size is 1024.

thread 'main' panicked at mistralrs-core/src/gguf/content.rs:94:22:
called `Result::unwrap()` on an `Err` value: Unknown GGUF architecture `qwen3`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@EricLBuehler
Owner Author

@sammcj absolutely, will add gguf support!
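
In the meantime, a possible workaround is to load the unquantized weights with ISQ (this assumes plain safetensors Qwen3 support with -a qwen3 has landed; if not, it will fail the same way):

# assumption: plain Qwen3 is supported via -a qwen3
mistralrs-server -i --isq Q6K plain -m Qwen/Qwen3-0.6B -a qwen3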

@anidotnet

anidotnet commented May 3, 2025

@EricLBuehler I also want Qwen3 support for GGUF. And while you're at it, could you please take a look at whether it's possible to add all the GGML types listed here: https://github.com/ggml-org/ggml/blob/17733de6a7854b9696be7a563711c9aa4a34b2d3/include/ggml.h#L351, so that I can also use the 1-bit Qwen3 GGUF model from unsloth/Qwen3-0.6B-GGUF?

@Jumaron

Jumaron commented May 21, 2025

Gemma-3n, 4B and 2B versions
