chore: update Image generation docs and examples #4841

Merged: 1 commit, Feb 17, 2025
123 changes: 49 additions & 74 deletions docs/content/docs/features/image-generation.md
@@ -38,98 +38,40 @@ curl http://localhost:8080/v1/images/generations -H "Content-Type: application/j

## Backends

### stablediffusion-cpp
### stablediffusion-ggml

| mode=0 | mode=1 (winograd/sgemm) |
|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| ![test](https://github.com/go-skynet/LocalAI/assets/2420543/7145bdee-4134-45bb-84d4-f11cb08a5638) | ![b643343452981](https://github.com/go-skynet/LocalAI/assets/2420543/abf14de1-4f50-4715-aaa4-411d703a942a) |
| ![b6441997879](https://github.com/go-skynet/LocalAI/assets/2420543/d50af51c-51b7-4f39-b6c2-bf04c403894c) | ![winograd2](https://github.com/go-skynet/LocalAI/assets/2420543/1935a69a-ecce-4afc-a099-1ac28cb649b3) |
| ![winograd](https://github.com/go-skynet/LocalAI/assets/2420543/1979a8c4-a70d-4602-95ed-642f382f6c6a) | ![winograd3](https://github.com/go-skynet/LocalAI/assets/2420543/e6d184d4-5002-408f-b564-163986e1bdfb) |
This backend is based on [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp). Every model supported by that backend is also supported by LocalAI.

Note: the image generator supports images up to 512x512. You can, however, use other tools to upscale the image, for instance https://github.com/upscayl/upscayl.

#### Setup

Note: In order to use the `images/generation` endpoint with the `stablediffusion` C++ backend, you need to build LocalAI with `GO_TAGS=stablediffusion`. If you are using the container images, it is already enabled.

{{< tabs >}}
{{% tab name="Prepare the model in runtime" %}}

While the API is running, you can install the model by using the `/models/apply` endpoint and point it to the `stablediffusion` model in the [models-gallery](https://github.com/go-skynet/model-gallery#image-generation-stable-diffusion):

```bash
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/stablediffusion.yaml"
}'
```

{{% /tab %}}
{{% tab name="Automatically prepare the model before start" %}}

You can set the `PRELOAD_MODELS` environment variable:
There are already several models in the gallery that can be installed and used with this backend. For example, you can run Flux by searching for it in the Model gallery (`flux.1-dev-ggml`) or by starting LocalAI with `run`:

```bash
PRELOAD_MODELS=[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]
local-ai run flux.1-dev-ggml
```
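Once the model is installed and LocalAI is running, images can be requested through the OpenAI-compatible endpoint. A minimal sketch, assuming LocalAI listens on the default `localhost:8080` and the prompt/size values are placeholders:

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux.1-dev-ggml",
    "prompt": "A cute baby sea otter",
    "size": "256x256"
  }'
```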

or as arg:

```bash
local-ai --preload-models '[{"url": "github:go-skynet/model-gallery/stablediffusion.yaml"}]'
```

or in a YAML file:

```bash
local-ai --preload-models-config "/path/to/yaml"
```

YAML:

```yaml
- url: github:go-skynet/model-gallery/stablediffusion.yaml
```

{{% /tab %}}
{{% tab name="Install manually" %}}
To use a custom model, you can follow these steps:

1. Create a model file `stablediffusion.yaml` in the models folder:

```yaml
name: stablediffusion
backend: stablediffusion
backend: stablediffusion-ggml
parameters:
model: stablediffusion_assets
```

2. Create a `stablediffusion_assets` directory inside your `models` directory
3. Download the ncnn assets from https://github.com/EdVince/Stable-Diffusion-NCNN#out-of-box and place them in `stablediffusion_assets`.

The models directory should look like the following:

```bash
models
├── stablediffusion_assets
│   ├── AutoencoderKL-256-256-fp16-opt.param
│   ├── AutoencoderKL-512-512-fp16-opt.param
│   ├── AutoencoderKL-base-fp16.param
│   ├── AutoencoderKL-encoder-512-512-fp16.bin
│   ├── AutoencoderKL-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.bin
│   ├── FrozenCLIPEmbedder-fp16.param
│   ├── log_sigmas.bin
│   ├── tmp-AutoencoderKL-encoder-256-256-fp16.param
│   ├── UNetModel-256-256-MHA-fp16-opt.param
│   ├── UNetModel-512-512-MHA-fp16-opt.param
│   ├── UNetModel-base-MHA-fp16.param
│   ├── UNetModel-MHA-fp16.bin
│   └── vocab.txt
└── stablediffusion.yaml
model: gguf_model.gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"
```

{{% /tab %}}
2. Download the required assets to the `models` directory
3. Start LocalAI

{{< /tabs >}}
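After starting LocalAI with a custom configuration, the model is invoked by the `name` set in the YAML file. A hedged sketch, assuming the server runs on the default port and using the `stablediffusion` name from the example above:

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stablediffusion",
    "prompt": "An astronaut riding a horse",
    "size": "512x512"
  }'
```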

### Diffusers

@@ -213,6 +155,9 @@ The following parameters are available in the configuration file:
| `cfg_scale` | Configuration scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
| `lora_adapters` | A list of lora adapters (file names relative to model directory) to apply | None |
| `lora_scales` | A list of lora scales (floats) to apply | None |
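
For example, LoRA adapters could be declared like this (a sketch only: the model and adapter file names are placeholders, and the exact placement of these keys should be checked against the full configuration reference; `lora_scales` entries are matched positionally to `lora_adapters`):

```yaml
name: my-diffusers-model
backend: diffusers
parameters:
  model: stabilityai/stable-diffusion-xl-base-1.0
lora_adapters:
  - "my-style-adapter.safetensors"  # hypothetical file inside the model directory
lora_scales:
  - 0.8
```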


Several types of schedulers are available:

@@ -246,6 +191,36 @@ Pipeline types available:
| `StableDiffusionDepth2ImgPipeline` | Stable diffusion depth to image pipeline |
| `DiffusionPipeline` | Diffusion pipeline |
| `StableDiffusionXLPipeline` | Stable diffusion XL pipeline |
| `StableVideoDiffusionPipeline` | Stable video diffusion pipeline |
| `AutoPipelineForText2Image` | Automatic detection pipeline for text to image |
| `VideoDiffusionPipeline` | Video diffusion pipeline |
| `StableDiffusion3Pipeline` | Stable diffusion 3 pipeline |
| `FluxPipeline` | Flux pipeline |
| `FluxTransformer2DModel` | Flux transformer 2D model |
| `SanaPipeline` | Sana pipeline |

##### Advanced: Additional parameters

Additional arbitrary parameters can be specified in the `options` field as key/value pairs separated by `:`:

```yaml
name: animagine-xl
# ...
options:
- "cfg_scale:6"
```

**Note**: There is no complete parameter list. Any arbitrary parameter can be specified and is forwarded directly to the pipeline as an argument. Different pipelines/implementations support different parameters.
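Conceptually, each option string is split on the first `:` into a key/value pair that becomes a keyword argument for the pipeline. A rough sketch of that parsing (an illustration, not LocalAI's actual implementation):

```python
def parse_options(options):
    """Split each "key:value" option string on the first colon into kwargs."""
    kwargs = {}
    for opt in options:
        key, _, value = opt.partition(":")
        # Values arrive as strings; coerce numeric-looking ones so that
        # e.g. "cfg_scale:6" becomes cfg_scale=6 rather than cfg_scale="6".
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                pass
        kwargs[key] = value
    return kwargs

print(parse_options(["cfg_scale:6", "sampler:euler"]))
# {'cfg_scale': 6, 'sampler': 'euler'}
```

Note that splitting on the *first* colon keeps values such as file paths (which may themselves contain dots and dashes) intact.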

The example above will result in the following Python code when generating images:

```python
pipe(
prompt="A cute baby sea otter", # Options passed via API
size="256x256", # Options passed via API
cfg_scale=6 # Additional parameter passed via configuration file
)
```
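For reference, an API request that would exercise the configuration above might look like the following sketch (assuming the `animagine-xl` model shown earlier is loaded and LocalAI runs on the default port):

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "animagine-xl",
    "prompt": "A cute baby sea otter",
    "size": "256x256"
  }'
```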

#### Usage

9 changes: 1 addition & 8 deletions docs/content/docs/reference/compatibility-table.md
@@ -17,27 +17,20 @@ LocalAI will attempt to automatically load models which are not explicitly confi
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
| [llama.cpp's ggml model (backward compatibility with old format, before GGUF)](https://github.com/ggerganov/llama.cpp) ([binding](https://github.com/go-skynet/go-llama.cpp)) | LLama, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A |
| [stablediffusion](https://github.com/EdVince/Stable-Diffusion-NCNN) ([binding](https://github.com/mudler/go-stable-diffusion)) | stablediffusion | no | Image | no | no | N/A |
| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
| `bark` | bark | no | Audio generation | no | no | yes |
| `autogptq` | GPTQ | yes | GPT | yes | no | N/A |
| `exllama` | GPTQ | yes | GPT only | no | no | N/A |
| `diffusers` | SD,... | no | Image generation | no | no | N/A |
| `vall-e-x` | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
| `mamba` | Mamba models architecture | yes | GPT | no | no | CPU/CUDA |
| `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
| `transformers-musicgen` | | no | Audio generation | no | no | N/A |
| stablediffusion | stablediffusion | no | Image | no | no | N/A |
| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `openvoice` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| `parler-tts` | Open voice | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CPU/CUDA |
| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes* | CPU/CUDA/XPU |
| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU |
| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | yes |
| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
| [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
10 changes: 0 additions & 10 deletions gallery/index.yaml
Expand Up @@ -1551,7 +1551,7 @@
sha256: edc50f6c243e6bd6912599661a15e030de03d2be53409663ac27d3ca48306ee4
uri: huggingface://mudler/LocalAI-functioncall-llama3.2-3b-v0.5-Q4_K_M-GGUF/localai-functioncall-llama3.2-3b-v0.5-q4_k_m.gguf
- &qwen25
name: "qwen2.5-14b-instruct" ## Qwen2.5

icon: https://avatars.githubusercontent.com/u/141221163
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"
license: apache-2.0
@@ -3512,7 +3512,7 @@
sha256: 0fec82625f74a9a340837de7af287b1d9042e5aeb70cda2621426db99958b0af
uri: huggingface://bartowski/Chuluun-Qwen2.5-72B-v0.08-GGUF/Chuluun-Qwen2.5-72B-v0.08-Q4_K_M.gguf
- &smollm
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## SmolLM

name: "smollm-1.7b-instruct"
icon: https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png
tags:
@@ -3831,7 +3831,7 @@
sha256: 7f163e72ead7522bd6774555a932e0a11f212d17cdc9442e2cfd1b017009f832
uri: huggingface://bartowski/ozone-ai_0x-lite-GGUF/ozone-ai_0x-lite-Q4_K_M.gguf
- &llama31
url: "github:mudler/LocalAI/gallery/llama3.1-instruct.yaml@master" ## LLama3.1

icon: https://avatars.githubusercontent.com/u/153379578
name: "meta-llama-3.1-8b-instruct"
license: llama3.1
@@ -5812,7 +5812,7 @@
sha256: 6aea4e13f03347e03d6989c736a7ccab82582115eb072cacfeb7f0b645a8bec0
uri: huggingface://bartowski/DavidBrowne17_LlamaThink-8B-instruct-GGUF/DavidBrowne17_LlamaThink-8B-instruct-Q4_K_M.gguf
- &deepseek
url: "github:mudler/LocalAI/gallery/deepseek.yaml@master" ## Deepseek

name: "deepseek-coder-v2-lite-instruct"
icon: "https://avatars.githubusercontent.com/u/148330874"
license: deepseek
@@ -5877,7 +5877,7 @@
sha256: a47782c55ef2b39b19644213720a599d9849511a73c9ebb0c1de749383c0a0f8
uri: huggingface://RichardErkhov/ContextualAI_-_archangel_sft_pythia2-8b-gguf/archangel_sft_pythia2-8b.Q4_K_M.gguf
- &deepseek-r1
url: "github:mudler/LocalAI/gallery/deepseek-r1.yaml@master" ## Start DeepSeek-R1

name: "deepseek-r1-distill-qwen-1.5b"
icon: "https://avatars.githubusercontent.com/u/148330874"
urls:
@@ -6105,7 +6105,7 @@
sha256: bf51b412360a84792ae9145e2ca322379234c118dbff498ff08e589253b67ded
uri: huggingface://bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF/agentica-org_DeepScaleR-1.5B-Preview-Q4_K_M.gguf
- &qwen2
url: "github:mudler/LocalAI/gallery/chatml.yaml@master" ## Start QWEN2

name: "qwen2-7b-instruct"
icon: https://avatars.githubusercontent.com/u/141221163
license: apache-2.0
@@ -6506,7 +6506,7 @@
sha256: dbffc989d12d42ef8e4a2994e102d7ec7a02c49ec08ea2e35426372ad07b4cd8
uri: huggingface://bartowski/TAID-LLM-1.5B-GGUF/TAID-LLM-1.5B-Q4_K_M.gguf
- &mistral03
url: "github:mudler/LocalAI/gallery/mistral-0.3.yaml@master" ## START Mistral

name: "mistral-7b-instruct-v0.3"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/62dac1c7a8ead43d20e3e17a/wrLf5yaGC6ng4XME70w6Z.png
license: apache-2.0
@@ -7241,7 +7241,7 @@
sha256: 899091671ae483fc7c132512221ee6600984c936cd8c261becee696d00080701
uri: huggingface://bartowski/PygmalionAI_Eleusis-12B-GGUF/PygmalionAI_Eleusis-12B-Q4_K_M.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models

name: "LocalAI-llama3-8b-function-call-v0.2"
icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/us5JKi9z046p8K-cn_M0w.webp"
license: llama3
@@ -7286,7 +7286,7 @@
sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec
uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin
- &parler-tts
url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master" ### START parler-tts

name: parler-tts-mini-v0.1
overrides:
parameters:
@@ -7303,7 +7303,7 @@
- text-to-speech
- python
- &rerankers
url: "github:mudler/LocalAI/gallery/rerankers.yaml@master" ### START rerankers

name: cross-encoder
parameters:
model: cross-encoder
@@ -12340,16 +12340,6 @@
embeddings: true
parameters:
model: llama-3.2-1b-instruct-q4_k_m.gguf
## Stable Diffusion
- url: github:mudler/LocalAI/gallery/stablediffusion.yaml@master
license: "BSD-3"
urls:
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/EdVince/Stable-Diffusion-NCNN/blob/main/LICENSE
description: |
Stable Diffusion in NCNN with c++, supported txt2img and img2img
name: stablediffusion-cpp
icon: https://avatars.githubusercontent.com/u/100950301
- &piper
url: github:mudler/LocalAI/gallery/piper.yaml@master ## Piper TTS
name: voice-en-us-kathleen-low
49 changes: 0 additions & 49 deletions gallery/stablediffusion.yaml

This file was deleted.
