
Commit fd13312

docs: add sycl

1 parent 6bb8e36 commit fd13312

File tree

2 files changed: +61 -30 lines changed


docs/content/docs/features/GPU-acceleration.md

Lines changed: 49 additions & 29 deletions
@@ -15,9 +15,45 @@ This section contains instruction on how to use LocalAI with GPU acceleration.
 For accelleration for AMD or Metal HW there are no specific container images, see the [build]({{%relref "docs/getting-started/build#Acceleration" %}})
 {{% /alert %}}
 
-### CUDA(NVIDIA) acceleration
 
-#### Requirements
+## Model configuration
+
+Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
+
+```yaml
+name: my-model-name
+# Default model parameters
+parameters:
+  # Relative to the models path
+  model: llama.cpp-model.ggmlv3.q5_K_M.bin
+
+context_size: 1024
+threads: 1
+
+f16: true # enable with GPU acceleration
+gpu_layers: 22 # GPU Layers (only used when built with cublas)
+
+```
+
+For diffusers instead, it might look like this instead:
+
+```yaml
+name: stablediffusion
+parameters:
+  model: toonyou_beta6.safetensors
+backend: diffusers
+step: 30
+f16: true
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  cuda: true
+  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
+  scheduler_type: "k_dpmpp_sde"
+```
+
+## CUDA(NVIDIA) acceleration
+
+### Requirements
 
 Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
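
For context on how a configuration like the one added above is consumed (editorial note, not part of the diff): once the YAML file is placed in the models directory, LocalAI serves the model under the `name` given in the config through its OpenAI-compatible API. A minimal sketch, assuming a LocalAI instance listening on localhost:8080 and the example model name `my-model-name`:

```bash
# Query the model defined by the YAML config above through the
# OpenAI-compatible chat completions endpoint exposed by LocalAI.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model-name",
        "messages": [{"role": "user", "content": "How many GPU layers am I using?"}]
      }'
```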
@@ -74,37 +110,21 @@ llama_model_load_internal: total VRAM used: 1598 MB
 llama_init_from_file: kv self size = 512.00 MB
 ```
 
-#### Model configuration
+## Intel acceleration (sycl)
 
-Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
+#### Requirements
 
-```yaml
-name: my-model-name
-# Default model parameters
-parameters:
-  # Relative to the models path
-  model: llama.cpp-model.ggmlv3.q5_K_M.bin
+Requirement: [Intel oneAPI Base Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html)
 
-context_size: 1024
-threads: 1
+To use SYCL, use the images with the `sycl-f16` or `sycl-f32` tag, for example `{{< version >}}-sycl-f32-core`, `{{< version >}}-sycl-f16-ffmpeg-core`, ...
 
-f16: true # enable with GPU acceleration
-gpu_layers: 22 # GPU Layers (only used when built with cublas)
+The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags).
 
-```
+### Notes
 
-For diffusers instead, it might look like this instead:
+In addition to the commands to run LocalAI normally, you need to specify `--device /dev/dri` to docker, for example:
+
+```bash
+docker run --rm -ti --device /dev/dri -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:{{< version >}}-sycl-f16-ffmpeg-core
+```
 
-```yaml
-name: stablediffusion
-parameters:
-  model: toonyou_beta6.safetensors
-backend: diffusers
-step: 30
-f16: true
-diffusers:
-  pipeline_type: StableDiffusionPipeline
-  cuda: true
-  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
-  scheduler_type: "k_dpmpp_sde"
-```
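
A quick aside on the `--device /dev/dri` flag shown above (editorial note, not part of the commit): it only helps if the host actually exposes Intel GPU render nodes, so it can be worth checking for them before starting the container. A minimal sketch:

```bash
# List the DRI device nodes that --device /dev/dri passes into the container.
# On a machine with an Intel GPU you would typically see entries such as
# card0 and renderD128 (exact names vary by system).
ls -l /dev/dri
```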

docs/content/docs/getting-started/build.md

Lines changed: 12 additions & 1 deletion
@@ -83,7 +83,7 @@ Here is the list of the variables available that can be used to customize the build
 
 | Variable | Default | Description |
 | ---------------------| ------- | ----------- |
-| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas` |
+| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas`, `sycl_f16`, `sycl_f32` |
 | `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts`, `tinydream` |
 | `CLBLAST_DIR` | | Specify a CLBlast directory |
 | `CUDA_LIBPATH` | | Specify a CUDA library path |
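
To illustrate how the variables in the table above are combined (editorial sketch, not part of the diff), they are passed straight on the `make` command line; a hypothetical build selecting the new SYCL float16 backend with an explicit set of Go tags might look like:

```bash
# Example only: BUILD_TYPE and GO_TAGS come from the variables table above;
# adjust the values to match your hardware and the features you need.
make BUILD_TYPE=sycl_f16 GO_TAGS="stablediffusion tts" build
```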
@@ -225,6 +225,17 @@ make BUILD_TYPE=clblas build
 
 To specify a clblast dir set: `CLBLAST_DIR`
 
+#### Intel GPU acceleration
+
+Intel GPU acceleration is supported via SYCL.
+
+Requirements: [Intel oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) (see also [llama.cpp setup installations instructions](https://github.com/ggerganov/llama.cpp/blob/d71ac90985854b0905e1abba778e407e17f9f887/README-sycl.md?plain=1#L56))
+
+```
+make BUILD_TYPE=sycl_f16 build # for float16
+make BUILD_TYPE=sycl_f32 build # for float32
+```
+
 #### Metal (Apple Silicon)
 
 ```
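
One practical note on the SYCL build commands added above (editorial, not part of the commit): the oneAPI compilers generally need their environment sourced in the current shell before running `make`. Assuming the toolkit was installed to its default prefix, that typically looks like:

```bash
# Load the Intel oneAPI environment (default install prefix assumed),
# then build LocalAI with the SYCL float16 backend.
source /opt/intel/oneapi/setvars.sh
make BUILD_TYPE=sycl_f16 build
```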
