docs/content/docs/features/GPU-acceleration.md
+49 −29 (49 additions, 29 deletions)
@@ -15,9 +15,45 @@ This section contains instruction on how to use LocalAI with GPU acceleration.
For acceleration on AMD or Metal HW there are no specific container images, see the [build]({{%relref "docs/getting-started/build#Acceleration" %}})

{{% /alert %}}

## Model configuration

Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):

```yaml
name: my-model-name
# Default model parameters
parameters:
  # Relative to the models path
  model: llama.cpp-model.ggmlv3.q5_K_M.bin

context_size: 1024
threads: 1

f16: true # enable with GPU acceleration
gpu_layers: 22 # GPU Layers (only used when built with cublas)
```

For diffusers, it might look like this instead:
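As a rough sketch of such a diffusers configuration (the model name, pipeline type and scheduler below are illustrative assumptions, not taken from the page), GPU usage is typically enabled with `f16: true` and `cuda: true` under the `diffusers` block:

```yaml
name: my-diffusion-model
parameters:
  # Relative to the models path (illustrative filename)
  model: my-diffusion-model.safetensors
backend: diffusers
f16: true # enable fp16 with GPU acceleration
diffusers:
  cuda: true # run the pipeline on the GPU
  pipeline_type: StableDiffusionPipeline # assumed pipeline; adjust to the model
  scheduler_type: euler_a # assumed scheduler
```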
@@ -74,37 +110,21 @@ llama_model_load_internal: total VRAM used: 1598 MB
llama_init_from_file: kv self size = 512.00 MB
```

## Intel acceleration (sycl)

### Requirements

Requirement: [Intel oneAPI Base Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html)

To use SYCL, use the images with the `sycl-f16` or `sycl-f32` tag, for example `{{< version >}}-sycl-f32-core`, `{{< version >}}-sycl-f16-ffmpeg-core`, ...

The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags).

### Notes

In addition to the commands to run LocalAI normally, you need to specify `--device /dev/dri` to docker, for example:

```bash
docker run --rm -ti --device /dev/dri -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg-core
```
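Before running the container, it can help to confirm that the host actually exposes a DRM render device to pass through (a quick check on a typical Linux setup; device names can vary):

```bash
# list the Direct Rendering Manager devices the container needs access to
ls -l /dev/dri
```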
|`GO_TAGS`|`tts stablediffusion`| Go tags. Available: `stablediffusion`, `tts`, `tinydream`|
|`CLBLAST_DIR`|| Specify a CLBlast directory |
|`CUDA_LIBPATH`|| Specify a CUDA library path |
@@ -225,6 +225,17 @@ make BUILD_TYPE=clblas build
To specify a clblast dir set: `CLBLAST_DIR`
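For example, a CLBlast build could be invoked like this (the install path is an illustrative assumption):

```bash
# build LocalAI against a CLBlast installed under /opt/CLBlast (hypothetical path)
make BUILD_TYPE=clblas CLBLAST_DIR=/opt/CLBlast build
```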
#### Intel GPU acceleration

Intel GPU acceleration is supported via SYCL.

Requirements: [Intel oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) (see also the [llama.cpp SYCL setup instructions](https://github.com/ggerganov/llama.cpp/blob/d71ac90985854b0905e1abba778e407e17f9f887/README-sycl.md?plain=1#L56))
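As a sketch of what the build could look like once the oneAPI toolkit is installed (assuming the `sycl_f16`/`sycl_f32` build types and the default oneAPI install location):

```bash
# load the oneAPI environment (default install path; adjust if installed elsewhere)
source /opt/intel/oneapi/setvars.sh

# build LocalAI with SYCL support, in float16 or float32 precision
make BUILD_TYPE=sycl_f16 build   # or: make BUILD_TYPE=sycl_f32 build
```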