add context-size prop #22924


Open · wants to merge 2 commits into base: main
4 changes: 4 additions & 0 deletions content/manuals/ai/model-runner/_index.md
@@ -385,3 +385,7 @@ The Docker Model CLI currently lacks consistent support for specifying models by
## Share feedback

Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.

## Related pages

- [Use Model Runner with Compose](/manuals/compose/how-tos/model-runner.md)
34 changes: 26 additions & 8 deletions content/manuals/compose/how-tos/model-runner.md
@@ -40,15 +40,33 @@
type: model
options:
model: ai/smollm2
context-size: 1024
runtime-flags: "--no-prefill-assistant"
```
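The hunk above is a fragment of a larger Compose file; a minimal complete sketch of how these options fit together (the `chat` service's build context is an assumption, not part of this diff) could look like:

```yaml
services:
  chat:
    build: .
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 1024
        runtime-flags: "--no-prefill-assistant"
```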

Notice the dedicated `provider` attribute in the `ai_runner` service.
This attribute specifies that the service is a model provider and lets you define options such as the name of the model to be used.

There is also a `depends_on` attribute in the `chat` service.
This attribute specifies that the `chat` service depends on the `ai_runner` service.
This means that the `ai_runner` service will be started before the `chat` service to allow injection of model information to the `chat` service.

Notice the following:

In the `ai_runner` service:

- `provider.type`: Specifies that the service is a `model` provider.
- `provider.options`: Specifies the options of the model:
  - `model`: The model to run, in this example `ai/smollm2`.

  - `context-size`: Sets the context size to `1024` tokens.


> [!NOTE]
> Each model has its own maximum context size. When increasing the context length,
> consider your hardware constraints. In general, try to use the smallest context size
> possible for your use case.
  - `runtime-flags`: Passes the `--no-prefill-assistant` parameter to the llama.cpp server; see [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
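To make the hardware note above concrete, a rough back-of-envelope estimate of how KV-cache memory grows with context size can be sketched as follows. The model shape numbers used here are illustrative assumptions, not SmolLM2's actual configuration:

```python
def kv_cache_bytes(context_size: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: two tensors (K and V) per layer, each holding
    n_kv_heads * head_dim elements per token, at bytes_per_elem precision."""
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem

# Illustrative shape (not a real model config): 24 layers, 8 KV heads,
# 64-dim heads, fp16 cache. Memory scales linearly with context size.
for ctx in (1024, 4096, 32768):
    mib = kv_cache_bytes(ctx, n_layers=24, n_kv_heads=8, head_dim=64) / 2**20
    print(f"context {ctx:>6}: ~{mib:.0f} MiB of KV cache")
```

Because the cache grows linearly with context length, a 32x larger context costs 32x more memory, which is why the smallest workable context size is usually the right choice.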



In the `chat` service:

- `depends_on`: Specifies that the `chat` service depends on the `ai_runner` service. The
  `ai_runner` service starts before the `chat` service, so that the model's connection details can be injected into the `chat` service.


## How it works

During the `docker compose up` process, Docker Model Runner automatically pulls and runs the specified model.
@@ -61,6 +79,6 @@

This lets the `chat` service interact with the model and use it for its own purposes.
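As a sketch of what the consuming side might look like: the environment variable names below (`AI_RUNNER_URL`, `AI_RUNNER_MODEL`) are assumptions derived from the service name and are not confirmed by this page; inspect the environment Compose actually injects into your own `chat` container before relying on them.

```python
import os

# Assumed variable names and default values; verify against the environment
# Compose injects into the dependent service.
base_url = os.environ.get("AI_RUNNER_URL", "http://localhost:12434/engines/v1")
model = os.environ.get("AI_RUNNER_MODEL", "ai/smollm2")

def chat_completions_url(base: str) -> str:
    """Build the OpenAI-compatible chat completions endpoint from a base URL."""
    return base.rstrip("/") + "/chat/completions"

print(model, chat_completions_url(base_url))
```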

## Reference
## Related pages

- [Docker Model Runner documentation](/manuals/ai/model-runner.md)