Description
LocalAI version:
Docker image: localai/localai:v2.9.0-cublas-cuda12-core
with the autogptq external backend added
Environment, CPU architecture, OS, and Version:
# nvidia-smi
Fri Mar  8 05:21:56 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                          |                      |               MIG M. |
|==========================================+======================+======================|
|   0  NVIDIA A10                      On  | 00000000:F0:00.0 Off |                    0 |
|  0%   29C    P8               15W / 150W |      2MiB / 23028MiB |      0%      Default |
|                                          |                      |                  N/A |
+------------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10                      On  | 00000000:F1:00.0 Off |                    0 |
|  0%   29C    P8               15W / 150W |      2MiB / 23028MiB |      0%      Default |
|                                          |                      |                  N/A |
+------------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|========================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+
Describe the bug
Trying to start the Qwen-VL-Chat-Int4 model, but it fails because autogptq cannot find config.json
in the model folder.
To Reproduce
- Build a Docker image with the following Dockerfile:
FROM localai/localai:v2.9.0-cublas-cuda12-core

# Base build tooling
RUN apt-get update -y && apt-get install -y curl gcc libxml2 libxml2-dev
RUN apt install -y wget git && \
    apt clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Install Miniconda, which the python backend build uses
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir .conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda init bash

# Build the autogptq backend and register it as an external gRPC backend
RUN PATH=$PATH:/opt/conda/bin make -C backend/python/autogptq
ENV EXTERNAL_GRPC_BACKENDS="autogptq:/build/backend/python/autogptq/run.sh"
ENV BUILD_TYPE="cublas"
- Download the model files to a local drive:
huggingface-cli download --resume-download Qwen/Qwen-VL-Chat-Int4 --local-dir qwen-vl-chat-int4 --local-dir-use-symlinks False
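As a quick sanity check on the download (a sketch; the folder name matches the --local-dir argument above), the files any transformers/auto-gptq loader will look for can be verified with:
# Check that the download produced a local model folder with config.json in it;
# a from_pretrained-style loader will refuse the folder without that file.
from pathlib import Path

model_dir = Path("qwen-vl-chat-int4")  # same folder as --local-dir above
print("directory exists:", model_dir.is_dir())
print("config.json present:", (model_dir / "config.json").is_file())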
- Create the qwen-vl.yaml file:
# Model name.
# The model name is used to identify the model in the API calls.
- name: gpt-4-vision-preview
  # Default model parameters.
  # These options can also be specified in the API calls
  parameters:
    model: qwen-vl-chat-int4
    temperature: 0.7
    top_k: 85
    top_p: 0.7
  # Default context size
  context_size: 4096
  # Default number of threads
  threads: 16
  backend: autogptq
  # define chat roles
  roles:
    user: "user:"
    assistant: "assistant:"
    system: "system:"
  template:
    chat: &template |
      Instruct: {{.Input}}
      Output:
    # Modify the prompt template here ^^^ as per your requirements
    completion: *template
  # Enable F16 if backend supports it
  f16: true
  embeddings: false
  # Enable debugging
  debug: true
  # GPU Layers (only used when built with cublas)
  gpu_layers: -1
  # Diffusers/transformers
  cuda: true
- Run the model:
docker run -p 8080:8080 -v $PWD/models:/opt/models -e MODELS_PATH=/opt/models localai:v2.9.0-autogptq --config-file /opt/models/qwen-vl.yaml
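For reference, with the volume mount and MODELS_PATH above, the model files should end up at /opt/models/qwen-vl-chat-int4 inside the container. Assuming the autogptq backend resolves the model: value relative to the models path (an assumption about LocalAI's behaviour, not verified against its source), that is the directory that would have to contain config.json; a hypothetical check from inside the container:
# Hypothetical check run inside the container; the paths follow the docker run
# command above (-v $PWD/models:/opt/models, MODELS_PATH=/opt/models) and the
# model: qwen-vl-chat-int4 entry in qwen-vl.yaml.
import os

resolved = os.path.join("/opt/models", "qwen-vl-chat-int4")
print(resolved, "is a directory:", os.path.isdir(resolved))
print("config.json present:", os.path.isfile(os.path.join(resolved, "config.json")))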
- Call the API:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-4-vision-preview",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
Expected behavior
The API responds with an answer describing the image.
Logs
{
"error": {
"code": 500,
"message": "could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like qwen-vl-chat-int4 is not the path to a directory containing a file named config.json.\\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=<class 'OSError'>",
"type": ""
}
}
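For context, this OSError is the generic message transformers raises when the string it receives is neither an existing local directory nor a reachable Hub repo id. A minimal sketch that reproduces the same error outside LocalAI, assuming the autogptq backend ultimately hands the model: value to a from_pretrained-style loader (AutoConfig is used here purely for illustration):
from transformers import AutoConfig

# "qwen-vl-chat-int4" is the model: value from qwen-vl.yaml. If that string is not
# an existing directory relative to the backend's working directory, transformers
# treats it as a Hugging Face Hub repo id and tries to fetch config.json remotely,
# which fails with the OSError quoted above when the Hub is not reachable.
try:
    AutoConfig.from_pretrained("qwen-vl-chat-int4", trust_remote_code=True)
except OSError as err:
    print(err)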