add context-size prop #22924


Open · wants to merge 2 commits into base: main
4 changes: 4 additions & 0 deletions content/manuals/ai/model-runner/_index.md
@@ -385,3 +385,7 @@ The Docker Model CLI currently lacks consistent support for specifying models by
## Share feedback

Thanks for trying out Docker Model Runner. Give feedback or report any bugs you may find through the **Give feedback** link next to the **Enable Docker Model Runner** setting.

## Related pages

- [Use Model Runner with Compose](/manuals/compose/how-tos/model-runner.md)
34 changes: 26 additions & 8 deletions content/manuals/compose/how-tos/model-runner.md
@@ -40,15 +40,33 @@
type: model
options:
model: ai/smollm2
context-size: 1024
runtime-flags: "--no-prefill-assistant"
```
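The hunk above is a fragment of a larger Compose file; a minimal complete sketch of how these options fit together (the `chat` service's build context is an assumption, not part of this diff) could look like:

```yaml
services:
  chat:
    build: .
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 1024
        runtime-flags: "--no-prefill-assistant"
```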

Notice the dedicated `provider` attribute in the `ai_runner` service.
This attribute specifies that the service is a model provider and lets you define options such as the name of the model to be used.

There is also a `depends_on` attribute in the `chat` service.
This attribute specifies that the `chat` service depends on the `ai_runner` service.
This means that the `ai_runner` service will be started before the `chat` service to allow injection of model information to the `chat` service.

Notice the following:

In the `ai_runner` service:

- `provider.type`: Specifies that the service is a `model` provider.
- `provider.options`: Specifies the options of the model:
  - `model`: The model to run, in this example `ai/smollm2`.

  - `context-size`: Sets the context size to `1024` tokens.


> [!NOTE]
> Each model has its own maximum context size. When increasing the context length,
> consider your hardware constraints. In general, try to use the smallest context size
> possible for your use case.
  - `runtime-flags`: Passes the `--no-prefill-assistant` parameter to the llama.cpp server; see [the available parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md).
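To make the hardware note above concrete, a rough back-of-envelope estimate of how KV-cache memory grows with context size can be sketched as follows. The model shape numbers used here are illustrative assumptions, not SmolLM2's actual configuration:

```python
def kv_cache_bytes(context_size: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: two tensors (K and V) per layer, each holding
    n_kv_heads * head_dim elements per token, at bytes_per_elem precision."""
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem

# Illustrative shape (not a real model config): 24 layers, 8 KV heads,
# 64-dim heads, fp16 cache. Memory scales linearly with context size.
for ctx in (1024, 4096, 32768):
    mib = kv_cache_bytes(ctx, n_layers=24, n_kv_heads=8, head_dim=64) / 2**20
    print(f"context {ctx:>6}: ~{mib:.0f} MiB of KV cache")
```

Because the cache grows linearly with context length, a 32x larger context costs 32x more memory, which is why the smallest workable context size is usually the right choice.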



In the `chat` service:

- `depends_on`: Specifies that the `chat` service depends on the `ai_runner` service. The
  `ai_runner` service starts before the `chat` service, so that the model's connection details can be injected into the `chat` service.


## How it works

During the `docker compose up` process, Docker Model Runner automatically pulls and runs the specified model.
@@ -61,6 +79,6 @@

This lets the `chat` service interact with the model and use it for its own purposes.
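As a sketch of what the consuming side might look like: the environment variable names below (`AI_RUNNER_URL`, `AI_RUNNER_MODEL`) are assumptions derived from the service name and are not confirmed by this page; inspect the environment Compose actually injects into your own `chat` container before relying on them.

```python
import os

# Assumed variable names and default values; verify against the environment
# Compose injects into the dependent service.
base_url = os.environ.get("AI_RUNNER_URL", "http://localhost:12434/engines/v1")
model = os.environ.get("AI_RUNNER_MODEL", "ai/smollm2")

def chat_completions_url(base: str) -> str:
    """Build the OpenAI-compatible chat completions endpoint from a base URL."""
    return base.rstrip("/") + "/chat/completions"

print(model, chat_completions_url(base_url))
```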

## Reference
## Related pages

- [Docker Model Runner documentation](/manuals/ai/model-runner.md)