Add anything and until method in the DSL for reasoning models #1480

Open
@rlouf

Description

This was triggered by trying to do structured generation with DeepSeek, where we want to let the model generate anything (unstructured) between the think tags and start structured generation only after </think> has been generated. It would thus make sense to have an anything construct that corresponds to .*, the unstructured case. Intuitively, we could add an until method so that users could write the following for a classification task:

from outlines.types import anything, either

model = ...

outline = "<think>" + anything.until("</think>") + "</think>" + either("yes", "no")
result = model.generate("Are you a reasoning model?", outline)
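
To make the intended behaviour concrete, here is a minimal sketch of what the outline above is meant to accept, written with Python's re module (which supports lookaheads) rather than with outlines. The names OUTLINE and classify are hypothetical and exist only for this illustration.

```python
import re
from typing import Optional

# Stand-in for the proposed outline, written as an ordinary regex.
# (?:(?!</think>).)* plays the role of anything.until("</think>").
OUTLINE = re.compile(r"<think>(?:(?!</think>).)*</think>(yes|no)", re.DOTALL)


def classify(text: str) -> Optional[str]:
    """Return the final label if text fits the outline, else None."""
    match = OUTLINE.fullmatch(text)
    return match.group(1) if match else None
```

Under this reading, the free-form part stops at the first </think>, and whatever follows must be exactly one of the two labels.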

The implementation is trickier than it looks. anything.until("</think>") is best expressed with a negative lookahead:

((?!<\/think>).)*

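but the regex engine that outlines-core uses does not support lookaheads. However, apart from edge cases such as a partial stop string immediately followed by the real one (for example <</think>), it is equivalent to the following regular expression: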

([^<]|<[^\/]|<\/[^t]|<\/t[^h]|<\/th[^i]|<\/thi[^n]|<\/thin[^k]|<\/think[^>])*

The idea would thus be to have anything.until generate a Regex node with an "expanded lookahead". Computing the expansion shouldn't be too hard.
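
A minimal sketch of how the expansion could be computed for a literal stop string, assuming the prefix-by-prefix scheme shown above. The function name expand_until is hypothetical and not part of outlines; the result is checked here with Python's re module only.

```python
import re


def expand_until(stop: str) -> str:
    """Build a lookahead-free regex for 'any text until stop'.

    For each position i in stop, emit one alternative: the first i
    characters of stop followed by any character other than stop[i].
    """
    if not stop:
        raise ValueError("stop must be a non-empty string")
    alternatives = []
    for i, char in enumerate(stop):
        prefix = re.escape(stop[:i])
        alternatives.append(f"{prefix}[^{re.escape(char)}]")
    return "(" + "|".join(alternatives) + ")*"


pattern = re.compile(expand_until("</think>"))
print(pattern.pattern)
```

One caveat of this naive expansion: an alternative such as <[^/] can consume the first character of a real stop string, so "<</think>" is accepted in full even though it contains </think>. An expansion that is exact in all cases would need to track overlapping prefixes, which this sketch does not attempt.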

Note: we could also decide that "until" keeps the </think> token. We should discuss.

Regexes

It is tempting to also implement until_regex this way. It is straightforward for simple patterns like [0-9]:

([^0-9])*

However, it gets complicated quickly. For instance, anything.until_regex("(abc|def)") would need to be translated to:

([^ad]|a[^b]|ab[^c]|d[^e]|de[^f])*

so we will not implement this in the first iteration.
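
For the restricted case of an alternation of literal strings, the translation can be sketched with a trie: every non-terminal prefix contributes one alternative that forbids the characters which would extend it towards a stop string. This is only an illustration under that assumption; expand_until_any is a hypothetical name, and general regular expressions are out of scope here.

```python
import re
from typing import Dict, List


def expand_until_any(stops: List[str]) -> str:
    """Lookahead-free regex for 'any text until one of stops' (literals only)."""
    if not stops or any(not s for s in stops):
        raise ValueError("stops must be non-empty strings")

    # Build a trie of the stop strings; a None key marks the end of one.
    trie: Dict = {}
    for stop in stops:
        node = trie
        for char in stop:
            node = node.setdefault(char, {})
        node[None] = True

    alternatives: List[str] = []

    def visit(node: Dict, prefix: str) -> None:
        if None in node:
            return  # a complete stop string ends here; do not extend it
        # Any character that does not continue towards a stop string is allowed.
        excluded = "".join(re.escape(c) for c in node)
        alternatives.append(f"{re.escape(prefix)}[^{excluded}]")
        for char, child in node.items():
            visit(child, prefix + char)

    visit(trie, "")
    return "(" + "|".join(alternatives) + ")*"


print(expand_until_any(["abc", "def"]))
```

As with the single-string case, this naive expansion can step over a stop string that begins right after a partial match (for example "abdef" is accepted in full), so it should be read as a starting point rather than an exact complement.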
