chore: update Image generation docs and examples #4841

Merged: 1 commit, Feb 17, 2025
123 changes: 49 additions & 74 deletions docs/content/docs/features/image-generation.md
@@ -38,98 +38,40 @@ curl http://localhost:8080/v1/images/generations -H "Content-Type: application/j

## Backends

### stablediffusion-cpp
### stablediffusion-ggml

| mode=0 | mode=1 (winograd/sgemm) |
|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| ![test](https://github.com/go-skynet/LocalAI/assets/2420543/7145bdee-4134-45bb-84d4-f11cb08a5638) | ![b643343452981](https://github.com/go-skynet/LocalAI/assets/2420543/abf14de1-4f50-4715-aaa4-411d703a942a) |
| ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c) | ![winograd2](https://github.com/go-skynet/LocalAI/assets/2420543/1935a69a-ecce-4afc-a099-1ac28cb649b3) |
| ![winograd](https://github.com/go-skynet/LocalAI/assets/2420543/1979a8c4-a70d-4602-95ed-642f382f6c6a) | ![winograd3](https://github.com/go-skynet/LocalAI/assets/2420543/e6d184d4-5002-408f-b564-163986e1bdfb) |
This backend is based on [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp). Every model supported by that backend is also supported by LocalAI.

Note: the image generator supports images up to 512x512. You can, however, use other tools to upscale the image, for instance https://github.com/upscayl/upscayl.

#### Setup

Note: In order to use the `images/generation` endpoint with the `stablediffusion` C++ backend, you need to build LocalAI with `GO_TAGS=stablediffusion`. If you are using the container images, it is already enabled.

{{< tabs >}}
{{% tab name="Prepare the model in runtime" %}}

While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):

```bash
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/stablediffusion.yaml"
}'
```

{{% /tab %}}
{{% tab name="Automatically prepare the model before start" %}}

You can set the `PRELOAD_MODELS` environment variable:
There are already several models in the gallery that can be installed and used with this backend. For example, you can run Flux by searching for it in the Model gallery (`flux.1-dev-ggml`) or by starting LocalAI with `run`:

```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
local-ai run flux.1-dev-ggml
```
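Once the model is installed and LocalAI is running, images can be requested through the OpenAI-compatible endpoint. A minimal sketch, assuming LocalAI listens on the default `localhost:8080` and the prompt/size values are placeholders:

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux.1-dev-ggml",
    "prompt": "A cute baby sea otter",
    "size": "256x256"
  }'
```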

or as arg:

```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
```

or in a YAML file:

```bash
local-ai --preload-models-config "/path/to/yaml"
```

YAML:

```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
```

{{% /tab %}}
{{% tab name="Install manually" %}}
To use a custom model, you can follow these steps:

1. Create a model file `stablediffusion.yaml` in the models folder:

```yaml
name: stablediffusion
backend: stablediffusion
backend: stablediffusion-ggml
parameters:
model: stablediffusion_assets
```

2. Create a `stablediffusion_assets` directory inside your `models` directory
3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.

The models directory should look like the following:

```bash
models
├── stablediffusion_assets
│   ├── AutoencoderKL-256-256-fp16-opt.param
│   ├── AutoencoderKL-512-512-fp16-opt.param
│   ├── AutoencoderKL-base-fp16.param
│   ├── AutoencoderKL-encoder-512-512-fp16.bin
│   ├── AutoencoderKL-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.param
│   ├── log_sigmas.bin
│   ├── tmp-AutoencoderKL-encoder-256-256-fp16.param
│   ├── UNetModel-256-256-MHA-fp16-opt.param
│   ├── UNetModel-512-512-MHA-fp16-opt.param
│   ├── UNetModel-base-MHA-fp16.param
│   ├── UNetModel-MHA-fp16.bin
│   └── vocab.txt
└── stablediffusion.yaml
model: gguf_model.gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"
```

{{% /tab %}}
2. Download the required assets to the `models` directory
3. Start LocalAI

{{< /tabs >}}
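After starting LocalAI with a custom configuration, the model is invoked by the `name` set in the YAML file. A hedged sketch, assuming the server runs on the default port and using the `stablediffusion` name from the example above:

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stablediffusion",
    "prompt": "An astronaut riding a horse",
    "size": "512x512"
  }'
```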

### Diffusers

@@ -213,6 +155,9 @@ The following parameters are available in the configuration file:
| `cfg_scale` | Configuration scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
| `lora_adapters` | A list of lora adapters (file names relative to model directory) to apply | None |
| `lora_scales` | A list of lora scales (floats) to apply | None |
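
For example, LoRA adapters could be declared like this (a sketch only: the model and adapter file names are placeholders, and the exact placement of these keys should be checked against the full configuration reference; `lora_scales` entries are matched positionally to `lora_adapters`):

```yaml
name: my-diffusers-model
backend: diffusers
parameters:
  model: stabilityai/stable-diffusion-xl-base-1.0
lora_adapters:
  - "my-style-adapter.safetensors"  # hypothetical file inside the model directory
lora_scales:
  - 0.8
```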


Several types of schedulers are available:

@@ -246,6 +191,36 @@ Pipeline types available:
| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
| `DiffusionPipeline` | Diffusion pipeline |
| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
| `StableVideoDiffusionPipeline` | Stable video diffusion pipeline |
| `AutoPipelineForText2Image` | Automatic detection pipeline for text to image |
| `VideoDiffusionPipeline` | Video diffusion pipeline |
| `StableDiffusion3Pipeline` | Stable diffusion 3 pipeline |
| `FluxPipeline` | Flux pipeline |
| `FluxTransformer2DModel` | Flux transformer 2D model |
| `SanaPipeline` | Sana pipeline |

##### Advanced: Additional parameters

Additional arbitrary parameters can be specified in the `options` field as key/value pairs separated by `:`:

```yaml
name: animagine-xl
# ...
options:
- "cfg_scale:6"
```

**Note**: There is no complete parameter list. Any arbitrary parameter can be specified and is forwarded directly to the pipeline as an argument. Different pipelines/implementations support different parameters.
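Conceptually, each option string is split on the first `:` into a key/value pair that becomes a keyword argument for the pipeline. A rough sketch of that parsing (an illustration, not LocalAI's actual implementation):

```python
def parse_options(options):
    """Split each "key:value" option string on the first colon into kwargs."""
    kwargs = {}
    for opt in options:
        key, _, value = opt.partition(":")
        # Values arrive as strings; coerce numeric-looking ones so that
        # e.g. "cfg_scale:6" becomes cfg_scale=6 rather than cfg_scale="6".
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                pass
        kwargs[key] = value
    return kwargs

print(parse_options(["cfg_scale:6", "sampler:euler"]))
# {'cfg_scale': 6, 'sampler': 'euler'}
```

Note that splitting on the *first* colon keeps values such as file paths (which may themselves contain dots and dashes) intact.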

The example above will result in the following Python code when generating images:

```python
pipe(
prompt="A cute baby sea otter", # Options passed via API
size="256x256", # Options passed via API
cfg_scale=6 # Additional parameter passed via configuration file
)
```
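For reference, an API request that would exercise the configuration above might look like the following sketch (assuming the `animagine-xl` model shown earlier is loaded and LocalAI runs on the default port):

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "animagine-xl",
    "prompt": "A cute baby sea otter",
    "size": "256x256"
  }'
```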

#### Usage

9 changes: 1 addition & 8 deletions docs/content/docs/reference/compatibility-table.md
@@ -17,27 +17,20 @@ LocalAI will attempt to automatically load models which are not explicitly confi
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
| [llama.cpp's ggml model (backward compatibility with old format, before GGUF)](https://github.com/ggerganov/llama.cpp) ([binding](https://github.com/go-skynet/go-llama.cpp)) | LLama, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A |
| [stablediffusion](https://github.com/EdVince/Stable-Diffusion-NCNN) ([binding](https://github.com/mudler/go-stable-diffusion)) | stablediffusion | no | Image | no | no | N/A |
| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
| `bark` | bark | no | Audio generation | no | no | yes |
| `autogptq` | GPTQ | yes | GPT | yes | no | N/A |
| `exllama` | GPTQ | yes | GPT only | no | no | N/A |
| `diffusers` | SD,... | no | Image generation | no | no | N/A |
| `vall-e-x` | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
| `mamba` | Mamba models architecture | yes | GPT | no | no | CPU/CUDA |
| `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
| `transformers-musicgen` | | no | Audio generation | no | no | N/A |
| stablediffusion | stablediffusion | no | Image | no | no | N/A |
| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `openvoice` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `parler-tts` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CPU/CUDA |
| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes* | CPU/CUDA/XPU |
| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU |
| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | yes |
| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
| [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
10 changes: 0 additions & 10 deletions gallery/index.yaml
Expand Up @@ -1551,7 +1551,7 @@
sha256: edc50f6c243e6bd6912599661a15e030de03d2be53409663ac27d3ca48306ee4
uri: huggingface://mudler/LocalAI-functioncall-llama3.2-3b-v0.5-Q4_K_M-GGUF/localai-functioncall-llama3.2-3b-v0.5-q4_k_m.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5

icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
license: apache-2.0
@@ -3512,7 +3512,7 @@
sha256: 0fec82625f74a9a340837de7af287b1d9042e5aeb70cda2621426db99958b0af
uri: huggingface://bartowski/Chuluun-Qwen2.5-72B-v0.08-GGUF/Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
- &smollm
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## SmolLM

name: "smollm-1.7b-instruct"
icon: https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png
tags:
@@ -3831,7 +3831,7 @@
sha256: 7f163e72ead7522bd6774555a932e0a11f212d17cdc9442e2cfd1b017009f832
uri: huggingface://bartowski/ozone-ai_0x-lite-GGUF/ozone-ai_0x-lite-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1

icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
license: llama3.1
@@ -5812,7 +5812,7 @@
sha256: 6aea4e13f03347e03d6989c736a7ccab82582115eb072cacfeb7f0b645a8bec0
uri: huggingface://bartowski/DavidBrowne17_LlamaThink-8B-instruct-GGUF/DavidBrowne17_LlamaThink-8B-instruct-Q4_K_M.gguf
- &deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master" ## Deepseek

name: "deepseek-coder-v2-lite-instruct"
icon: "https://avatars.githubusercontent.com/u/148330874"
license: deepseek
@@ -5877,7 +5877,7 @@
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
- &deepseek-r1
url: "github:mudler/LocalAI/gallery/deepseek-r1.yaml@master" ## Start DeepSeek-R1

name: "deepseek-r1-distill-qwen-1.5b"
icon: "https://avatars.githubusercontent.com/u/148330874"
urls:
@@ -6105,7 +6105,7 @@
sha256: bf51b412360a84792ae9145e2ca322379234c118dbff498ff08e589253b67ded
uri: huggingface://bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF/agentica-org_DeepScaleR-1.5B-Preview-Q4_K_M.gguf
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2

name: "qwen2-7b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
license: apache-2.0
@@ -6506,7 +6506,7 @@
sha256: dbffc989d12d42ef8e4a2994e102d7ec7a02c49ec08ea2e35426372ad07b4cd8
uri: huggingface://bartowski/TAID-LLM-1.5B-GGUF/TAID-LLM-1.5B-Q4_K_M.gguf
- &mistral03
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master" ## START Mistral

name: "mistral-7b-instruct-v0.3"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62dac1c7a8ead43d20e3e17a/wrLf5yaGC6ng4XME70w6Z.png
license: apache-2.0
@@ -7241,7 +7241,7 @@
sha256: 899091671ae483fc7c132512221ee6600984c936cd8c261becee696d00080701
uri: huggingface://bartowski/PygmalionAI_Eleusis-12B-GGUF/PygmalionAI_Eleusis-12B-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models

name: "LocalAI-llama3-8b-function-call-v0.2"
icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/us5JKi9z046p8K-cn_M0w.webp"
license: llama3
@@ -7286,7 +7286,7 @@
sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec
uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
- &parler-tts
url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master" ### START parler-tts

name: parler-tts-mini-v0.1
overrides:
parameters:
@@ -7303,7 +7303,7 @@
- text-to-speech
- python
- &rerankers
url: "github:mudler/LocalAI/gallery/rerankers.yaml@master" ### START rerankers

name: cross-encoder
parameters:
model: cross-encoder
@@ -12340,16 +12340,6 @@
embeddings: true
parameters:
model: llama-3.2-1b-instruct-q4_k_m.gguf
## Stable Diffusion
- url: github:mudler/LocalAI/gallery/stablediffusion.yaml@master
license: "BSD-3"
urls:
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/EdVince/Stable-Diffusion-NCNN/blob/main/LICENSE
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
icon: https://avatars.githubusercontent.com/u/100950301
- &piper
url: github:mudler/LocalAI/gallery/piper.yaml@master ## Piper TTS
name: voice-en-us-kathleen-low
49 changes: 0 additions & 49 deletions gallery/stablediffusion.yaml

This file was deleted.
