Model Wishlist #156
qwen1.5-72B-Chat |
llama3 |
@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly. |
@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days. |
Hello! |
@cargecla1, yes! It will be a great use case for ISQ. |
@francis2tm, yes. I plan on supporting Llava and embedding models this week. |
@NiuBlibing, you can run Qwen now with ISQ, which will quantize it. |
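For reference, a minimal invocation sketch of running Qwen with in-situ quantization; the subcommand, `--isq` flag, quantization type, and model ID here are assumptions, so check the README for the exact current syntax:

```shell
# Sketch: load Qwen2 unquantized weights and quantize them in place (ISQ)
# to Q4K at load time (flag and value names assumed).
./mistralrs-server --isq Q4K plain -m Qwen/Qwen1.5-72B-Chat
```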
Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute visual frontend model on CPU. You might find it easier to ship visual transformer part via onnx. |
Would love to see some DeepSeek-VL; this model is better than Llava and supports multiple images per prompt. |
Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ... |
Could you add support for GGUF-quantized Phi-3-Mini to the wishlist? Currently, this fails (built from master): Running `./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf`
2024-04-29T03:08:35.180939Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: false
2024-04-29T03:08:35.180975Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-29T03:08:35.180982Z INFO mistralrs_server: Loading model `microsoft/Phi-3-mini-128k-instruct` on Cpu...
2024-04-29T03:08:35.180989Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-29T03:08:35.181017Z INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"
2024-04-29T03:08:35.181048Z INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
2024-04-29T03:08:35.181122Z INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"
2024-04-29T03:08:35.181133Z INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
Error: Unknown GGUF architecture `phi3` |
It'll be great to see WizardLM-2 and suzume. And thanks for a great tool! |
Command-R and Command-R+ from Cohere would be amazing 🙏 |
T5 |
Supporting a vision+language or multimodal model is very high priority right now.
I'll add this one too.
I will look into it!
Yes, absolutely, I think it should be easy. In the meantime, you can use ISQ to get the same speed.
Thanks! I think suzume is just finetuned Llama so that can be used already. I'll add WizardLM.
Yes, I'll add those.
Yes, I'll add those. T5 will be a nice smaller model. |
@EricLBuehler Thanks for your reply, for adding my suggestion to the model wishlist, and for developing such an awesome project! It's very appreciated :) |
Congrats for your great work! |
it would be nice to add some embedding models like nomic-text-embed. |
Hello! First of all, I want to express my appreciation for the excellent work your team has accomplished on the mistral.rs engine; it's a great project. I am currently developing a personal AI assistant in Rust, and I believe integrating additional features into your engine could significantly enhance its utility and appeal. Specifically, adding support for Whisper and for Text-to-Speech (TTS) functionality, such as StyleTTS or similar technologies, would be incredibly beneficial. This would let the engine handle LLM inference, speech-to-text, and text-to-speech in a unified system with very low latency. Implementing these features would make the engine a more versatile tool for developers like me who want to build more integrated and efficient AI applications. |
@EricLBuehler Woah, thank you so much! This will be lovely for us folks with less powerful computers or size constraints, you're awesome :) |
@jett06, my pleasure! I just fixed a small bug (in case you saw the strange behavior), so it should be all ready to go now! |
IBM's Granite series Code Models. |
I'm working on it now: chenwanqq/candle-llava |
@test3211234 if you just need text, why not process frame by frame with something like https://huggingface.co/stepfun-ai/GOT-OCR2_0? It should be relatively fast on a decent GPU, around 1-2 seconds per frame. Skipping every other frame, you can probably extract text from about 45 minutes of video per day with a naive approach, and likely more with some sort of parallelism. |
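A quick back-of-envelope check of the throughput estimate above; the 2 s/frame OCR time and 30 fps source frame rate are assumptions, not measured numbers:

```python
# Back-of-envelope: how much video can frame-by-frame OCR cover per day?
SECONDS_PER_DAY = 24 * 60 * 60
ocr_seconds_per_frame = 2.0   # assumed upper bound from the comment
video_fps = 30                # assumed source frame rate

frames_processed = SECONDS_PER_DAY / ocr_seconds_per_frame
video_frames_covered = frames_processed * 2  # every other frame is skipped
video_minutes = video_frames_covered / video_fps / 60
print(round(video_minutes))  # about 48 minutes of 30 fps video per day
```

With these assumptions the estimate lands close to the ~45 minutes quoted in the comment.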
That doesn't support bounding boxes, RIP. @EricLBuehler Can you add Mistral NeMo 12B? I don't know how it all works, but I think I need a high quant; I have 8 GB of VRAM. |
Would be great to support Loras for Flux. Any tip or guide on how I can contribute for that? |
@EricLBuehler You have to add the DeepSeek R1 models. 7B, 8B, etc. |
@test3211234 you can run the distill models without any changes as they are simple Qwen/Llama models. I merged initial support for the DeepSeek R1 model in #1077 for those with large enough systems and will support loading from the FP8 weights shortly. |
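Since the distill checkpoints are plain Qwen/Llama architectures, a sketch of loading one directly; the `plain` subcommand and flag names are assumed from the CLI's usual shape:

```shell
# Sketch: R1 distill models are standard Qwen/Llama checkpoints, so they
# should load with the ordinary text-model path (names assumed).
./mistralrs-server plain -m deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```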
@EricLBuehler Please add support for |
@mush42 you can already run all the SmolVLM models: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/IDEFICS3.md They share an architecture with Idefics 3, so all you need to do is change the model ID. |
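A sketch of what "change the model ID" looks like in practice; the `vision-plain` subcommand and `-a idefics3` architecture flag are assumptions based on the linked Idefics 3 docs:

```shell
# Sketch: SmolVLM shares the Idefics 3 architecture, so only the model ID
# differs from the Idefics 3 example (subcommand/flag names assumed).
./mistralrs-server vision-plain -m HuggingFaceTB/SmolVLM-Instruct -a idefics3
</imports>
```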
Vote for moondream2: https://huggingface.co/vikhyatk/moondream2 |
https://huggingface.co/deepseek-ai/Janus-Pro-7B including image gen? |
Hello, Mistral Small Instruct 2501 please. I've filed issue #1118 |
Qwen/QwQ32B |
try:
or cuda instead of metal. |
Ovis2 https://github.com/AIDC-AI/Ovis It's based on Qwen-2.5 but uses Apple's AIMv2 for vision, supposedly resulting in better performance than Qwen2.5-VL-72B, and importantly it has a 34B checkpoint while Qwen2.5-VL does not. |
@EricLBuehler Can we get support for https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503? It's a day old and multimodal. |
@bet0x absolutely, working on it now - expect it to be added in the next few days! |
@green-s that looks interesting, will check it out & possibly add it after mistral3! |
That would be absolutely amazing! |
Awesome! Much faster than I expected too; I was going to ask you @EricLBuehler if you had time to port mistral3. |
@EricLBuehler I will test it today! thank you for your excellent work !!! |
Hello @EricLBuehler, thank you for open-sourcing mistral.rs; it is a fantastic project. Could you please support the Qwen2.5-Omni model? |
@EricLBuehler I would love to contribute GOT-OCR 2.0 support. I don't know where to start, but I would love to do it myself; looking for some guidance. |
GLM-4-32B-0414 / GLM-Z1-32B-0414 / GLM-Z1-Rumination-32B-0414 seem overpowered. Would love to see them supported in mistral.rs! Big cheers and thanks from me for this amazing work / inference setup. <3 !! 🍻 ✊ |
Qwen3 would be nice! |
@sammcj absolutely, will add gguf support! |
@EricLBuehler I also want Qwen3 support for GGUF. And while you are at it, could you please take a look at whether it is possible to add all the ggml types listed here - https://github.com/ggml-org/ggml/blob/17733de6a7854b9696be7a563711c9aa4a34b2d3/include/ggml.h#L351 - so that I can also use the 1-bit Qwen3 GGUF model from unsloth/Qwen3-0.6B-GGUF? |
Gemma-3n 4B and 2B version |
Please let us know what model architectures you would like to be added!
Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be merged!
Language models
Multimodal models
Embedding models