[Model] Qwen 2.5 VL compat #1490

Closed

HarrisDePerceptron opened this issue Mar 29, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@HarrisDePerceptron

HarrisDePerceptron commented Mar 29, 2025

Describe the bug

Qwen2.5-VL-32B-Instruct raises an error when I try to quantize it:

Traceback (most recent call last):
  File "/home/aipc/workspace/ai/qwen-vl/gptq_quant.py", line 90, in <module>
    main()
  File "/home/aipc/workspace/ai/qwen-vl/gptq_quant.py", line 42, in main
    model = GPTQModel.load(pretrained_model_id, quantize_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages/gptqmodel/models/auto.py", line 247, in load
    return cls.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages/gptqmodel/models/auto.py", line 275, in from_pretrained
    model_type = check_and_get_model_type(model_id_or_path, trust_remote_code)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages/gptqmodel/models/auto.py", line 184, in check_and_get_model_type
    raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: qwen2_5_vl isn't supported yet.
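
For context, the failure happens before any weights are loaded: GPTQModel reads config.model_type from the Hugging Face config and looks it up in its registry of supported architectures, and "qwen2_5_vl" is simply not a registered key yet. A rough sketch of what that dispatch amounts to (MODEL_MAP and the mapped values are illustrative assumptions, not the actual gptqmodel internals):

from transformers import AutoConfig

# hypothetical registry; gptqmodel keeps a similar mapping of model_type -> model class
MODEL_MAP = {"qwen2_vl": "Qwen2VLGPTQ", "llama": "LlamaGPTQ"}

def check_and_get_model_type(model_id_or_path, trust_remote_code=False):
    config = AutoConfig.from_pretrained(model_id_or_path, trust_remote_code=trust_remote_code)
    if config.model_type not in MODEL_MAP:
        # this is the branch hit here: "qwen2_5_vl" is not registered
        raise TypeError(f"{config.model_type} isn't supported yet.")
    return config.model_type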

GPU Info

Show output of:

nvidia-smi
Sat Mar 29 23:09:47 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8             23W /  350W |     413MiB /  24576MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2438      G   /usr/lib/xorg/Xorg                            100MiB |
|    0   N/A  N/A      2651      G   /usr/bin/gnome-shell                           75MiB |
|    0   N/A  N/A      3210      G   ...irefox/5947/usr/lib/firefox/firefox        198MiB |
+-----------------------------------------------------------------------------------------+

Software Info

Operating System/Version + Python Version

Show output of:

pip show gptqmodel torch transformers accelerate triton

Name: gptqmodel
Version: 2.1.0
Summary: Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: ModelCloud
Author-email: [email protected]
License: Apache 2.0
Location: /home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages
Requires: accelerate, datasets, device-smi, hf_transfer, huggingface_hub, logbar, numpy, packaging, pillow, protobuf, safetensors, threadpoolctl, tokenicer, torch, transformers
Required-by: 
---
Name: torch
Version: 2.6.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, autoawq, compressed-tensors, gptqmodel, outlines, torchaudio, torchvision, vllm, xformers, xgrammar
---
Name: transformers
Version: 4.51.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: autoawq, compressed-tensors, gptqmodel, tokenicer, vllm, xgrammar
---
Name: accelerate
Version: 1.5.2
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: autoawq, gptqmodel
---
Name: triton
Version: 3.2.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/triton-lang/triton/
Author: Philippe Tillet
Author-email: [email protected]
License: 
Location: /home/aipc/anaconda3/envs/autoawq/lib/python3.11/site-packages
Requires: 
Required-by: autoawq, torch

If you are reporting an inference bug of a post-quantized model, please post the content of config.json and quantize_config.json.

To Reproduce

import os

from gptqmodel import GPTQModel, QuantizeConfig, get_best_device
from transformers import AutoTokenizer

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

pretrained_model_id = "Qwen/Qwen2.5-VL-32B-Instruct" # "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quantized_model_id = "Qwen2.5-VL-32B-Instruct-GPTQ"


def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)
    calibration_dataset = [
        tokenizer(
            "gptqmodel is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
        )
    ]

    quantize_config = QuantizeConfig(
        bits=4,  # quantize model to 4-bit
        group_size=128,  # it is recommended to set the value to 128
    )

    # load un-quantized model, by default, the model will always be loaded into CPU memory
    model = GPTQModel.load(pretrained_model_id, quantize_config)

    # quantize model, the calibration_dataset should be list of dict whose keys can only be "input_ids" and "attention_mask"
    model.quantize(calibration_dataset)

    # save quantized model
    model.save(quantized_model_id)

    # push quantized model to Hugging Face Hub.
    # to use use_auth_token=True, log in first via `huggingface-cli login`,
    # or pass an explicit token with: use_auth_token="hf_xxxxxxx"
    # (uncomment the following three lines to enable this feature)
    # repo_id = f"YourUserName/{quantized_model_id}"
    # commit_message = f"GPTQModel model for {pretrained_model_id}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
    # model.push_to_hub(repo_id, commit_message=commit_message, use_auth_token=True)

    # alternatively you can save and push at the same time
    # (uncomment the following three lines to enable this feature)
    # repo_id = f"YourUserName/{quantized_model_id}"
    # commit_message = f"GPTQModel model for {pretrained_model_id}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
    # model.push_to_hub(repo_id, save_dir=quantized_model_id, commit_message=commit_message, use_auth_token=True)

    # save quantized model using safetensors
    model.save(quantized_model_id)

    # load quantized model to the first GPU
    device = get_best_device()
    model = GPTQModel.load(quantized_model_id, device=device)

    # load quantized model to CPU with IPEX kernel linear.
    # model = GPTQModel.from_quantized(quantized_model_id, device="cpu")

    # download quantized model from Hugging Face Hub and load to the first GPU
    # model = GPTQModel.from_quantized(repo_id, device="cuda:0",)

    # inference with model.generate
    print(tokenizer.decode(model.generate(**tokenizer("gptqmodel is", return_tensors="pt").to(model.device))[0]))


if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
        level=logging.INFO,
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    main()

Expected behavior
The 32B VL model quantizes successfully.

Model/Datasets

Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.
It is downloadable

Screenshots

If applicable, add screenshots to help explain your problem.
Included the error output above.
Additional context

Qwen2.5-VL-32B-Instruct is new, but the README mentions Qwen-VL is supported, so I expected it to work.

HarrisDePerceptron added the bug (Something isn't working) label Mar 29, 2025
@Qubitium
Collaborator

Qubitium commented Apr 1, 2025

@ZX-ModelCloud Check if Qwen 2.5 VL is the same as Qwen 2 VL. If so, we just need to update the class for support.
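
If the two architectures do share the same decoder layout, the change is mostly registration: alias the existing Qwen 2 VL model definition under the new model_type key so check_and_get_model_type resolves "qwen2_5_vl". A hypothetical sketch (module path, class names, and the MODEL_MAP registry are illustrative assumptions, not the actual gptqmodel internals):

# hypothetical: reuse the existing Qwen 2 VL definition for the new model_type
from gptqmodel.models.definitions.qwen2_vl import Qwen2VLGPTQ  # assumed module path

class Qwen2_5_VLGPTQ(Qwen2VLGPTQ):
    # if the layer/module layout is identical to Qwen 2 VL, nothing needs overriding
    pass

# register it so the model_type lookup succeeds
# MODEL_MAP["qwen2_5_vl"] = Qwen2_5_VLGPTQ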

Qubitium changed the title [BUG] → [Model] Qwen 2.5 VL compat Apr 1, 2025
@Qubitium
Collaborator

Qubitium commented Apr 2, 2025

@HarrisDePerceptron Qwen 2.5 VL support added. Please pull main and test.

#1493
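
For anyone hitting the same TypeError on the released 2.1.0 wheel, a quick way to verify the fix before the next release is to install GPTQModel from main and re-run the failing load call. A minimal sketch, assuming the canonical ModelCloud/GPTQModel repository on GitHub:

# install from main first, e.g.: pip install -v git+https://github.com/ModelCloud/GPTQModel.git
from gptqmodel import GPTQModel, QuantizeConfig

# same load call as the repro script; with qwen2_5_vl registered it should no longer
# raise "TypeError: qwen2_5_vl isn't supported yet." (note: this downloads the full
# 32B checkpoint and loads it into CPU memory)
model = GPTQModel.load(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    QuantizeConfig(bits=4, group_size=128),
)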

Qubitium closed this as completed Apr 2, 2025