
load_4bit and load_8bit options raise errors #3

Open
ZyqAlwaysCool opened this issue Mar 20, 2024 · 0 comments

Following the instructions in the README, I tested single-GPU and multi-GPU runs and ran into a few problems I'd like to ask the authors about:

1. Single-GPU run

Local environment

  • CUDA == v12.0, nvcc == v11.8 (verified that both nvcc 11.8 and nvcc 12.0 work)
  • GPUs: 6× V100
  • torch and CUDA configuration: (screenshot in the original issue)

Problems encountered

--load-4bit raises an error:

Traceback (most recent call last):
  File "/home/kemove/miniconda3/envs/py39-test/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/kemove/miniconda3/envs/py39-test/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/kemove/zyq/giit/Qilin-Med-VL/llava/serve/cli.py", line 122, in <module>
    main(args)
  File "/home/kemove/zyq/giit/Qilin-Med-VL/llava/serve/cli.py", line 39, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, args.load_8bit, args.load_4bit, device=args.device)
  File "/home/kemove/zyq/giit/Qilin-Med-VL/llava/model/builder.py", line 103, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
  File "/home/kemove/miniconda3/envs/py39-test/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2629, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'load_in_4bit'
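
For context, this TypeError usually means the installed transformers predates built-in 4-bit loading (added around transformers 4.30), so from_pretrained forwards the unrecognized load_in_4bit kwarg into the model's __init__. A minimal sketch of a possible workaround, assuming transformers >= 4.30 and a recent bitsandbytes; the model path is a placeholder:

# Sketch of a possible workaround (untested here): pass an explicit
# BitsAndBytesConfig instead of a bare load_in_4bit kwarg. Assumes
# transformers >= 4.30 and a recent bitsandbytes.
import torch
from transformers import BitsAndBytesConfig
from llava.model import LlavaLlamaForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = LlavaLlamaForCausalLM.from_pretrained(
    "path/to/Qilin-Med-VL",          # placeholder model path
    low_cpu_mem_usage=True,
    quantization_config=bnb_config,
    device_map="auto",
)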

--load-8bit raises: RuntimeError: probability tensor contains either inf, nan or element < 0

After looking into it, I found that other models hit similar problems, caused by load_8bit.
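
As a diagnostic rather than a fix: this error is raised by torch.multinomial inside model.generate() when it samples from corrupted probabilities. Greedy decoding skips sampling entirely, which helps tell a sampling-time problem apart from logits that are already inf/nan. A sketch, assuming input_ids is an already-tokenized prompt:

# Diagnostic sketch: do_sample=False makes generate() use greedy decoding,
# so torch.multinomial is never called. If the output is still garbage,
# the logits themselves are overflowing (e.g. fp16 issues under 8-bit).
output_ids = model.generate(
    input_ids,
    do_sample=False,       # greedy decoding, no multinomial sampling
    max_new_tokens=128,
)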

2. Multi-GPU run

Local environment is 6× V100; I tested the following two strategies:

  1. Without pinning specific GPUs: with device_map='auto', the model loads across all 6 cards by default, and inference fails with RuntimeError: probability tensor contains either inf, nan or element < 0
  2. Pinning specific GPUs: when more than 2 cards are pinned, inference fails with the same error as in (1); with exactly 2 cards pinned it runs normally (see the snippet below)
import os

# Two cards work; more than two fail with:
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
# (note: this must be set before torch initializes CUDA)
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'
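
An alternative way to pin two cards without touching the environment would be accelerate's max_memory argument, which restricts which devices the 'auto' device map may place weights on. A sketch with placeholder memory budgets for 32 GB V100s:

# Sketch: restrict the auto device map to GPUs 0 and 1 via max_memory
# (an accelerate feature exposed through from_pretrained). The 30GiB
# budgets and model path are placeholders.
import torch
from llava.model import LlavaLlamaForCausalLM

model = LlavaLlamaForCausalLM.from_pretrained(
    "path/to/Qilin-Med-VL",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "30GiB", 1: "30GiB"},   # only these two GPUs get weights
)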

3. Summary

Testing the Qilin-Med-VL model, the following configurations run normally:

  1. Single GPU, using roughly 26 GB of VRAM, without the 4bit or 8bit options (peak usage can be checked as sketched below)
  2. Multi-GPU with exactly 2 GPUs pinned
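
For anyone reproducing the ~26 GB figure, peak tensor allocation can also be read from torch directly; a small sketch:

# Sketch: report peak GPU memory allocated by tensors since the last reset.
# Run one full inference pass between the reset and the readout.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run a single inference pass here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")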

Demo from the test run: (screenshot in the original issue)

Questions for the authors:

  1. Why do both load-4bit and load-8bit fail? The script in the README specifies load-4bit; could this be related to the versions of the Python packages in my local environment?
  2. In multi-GPU inference, why does the error appear once more than 2 cards are used? Could this be related to llava?

Thanks!
