
[BUG]: DecodeSpecialTokens is exposed by the StreamingTokenDecoder but is not accessible when using an ILlamaExecutor #1201


Open · jacob-mink-1996 opened this issue Jun 5, 2025 · 4 comments · May be fixed by #1203


jacob-mink-1996 commented Jun 5, 2025

Description

In PR #777, support for DecodeSpecialTokens was added. In some of my testing I thought it would be useful to toggle this feature (right now it appears to default to false!), but I found that, due to the way StreamingTokenDecoder is used (at least in StatelessExecutor), I am unable to toggle it.

This seems to be an implementation bug rather than a feature request, since the feature is there, just not exposed. Additionally, I cannot write my own StatelessExecutor implementation, because it relies on methods that are internal to NativeApi.

The actual effect of this is that models like Devstral, whose GGUF form classifies [TOOL_CALLS] as a special token to be emitted before any tool-call-formatted output, will end up producing this token but never streaming it out of the executor to the caller.
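For illustration, the property is reachable when the decoder is constructed directly, but there is no way to set it through the executor. A sketch of the gap against LLamaSharp 0.24.0 (context is an assumed LLamaContext; the executor internals are paraphrased, not quoted):

// The property exists and works when you own the decoder yourself...
var decoder = new StreamingTokenDecoder(context)
{
    DecodeSpecialTokens = true
};

// ...but StatelessExecutor constructs its own StreamingTokenDecoder internally
// and leaves DecodeSpecialTokens at its default (false), so there is no public
// knob to flip before calling InferAsync.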

Reproduction Steps

Run a Devstral GGUF from LLamaSharp source with an appropriate tool call prompt, then compare to Ollama in raw mode, for example, to see the [TOOL_CALLS] token.
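A minimal repro sketch along those lines (the model path and prompt are placeholders, not from the original report):

using LLama;
using LLama.Common;

// Placeholder path to a Devstral GGUF; any tool-call-capable build should do.
var parameters = new ModelParams("devstral-small.gguf");
using var model = await LLamaWeights.LoadFromFileAsync(parameters);
var executor = new StatelessExecutor(model, parameters);

// Placeholder prompt; use whatever tool-call prompt format Devstral expects.
var prompt = "<tool-call prompt>";

// With 0.24.0, the model emits [TOOL_CALLS] but it never shows up here,
// because the executor's internal decoder skips special tokens.
await foreach (var text in executor.InferAsync(prompt, new InferenceParams()))
    Console.Write(text);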

Environment & Configuration

  • Operating system:
  • .NET runtime version:
  • LLamaSharp version: 0.24.0
  • CUDA version (if you are using cuda backend):
  • CPU & GPU device:

Known Workarounds

No response

martindevans (Member) commented

Would you be interested in putting together a PR to fix this?

I cannot implement my own StatelessExecutor implementation because it relies on methods that are internal to NativeApi.

LLamaSharp should expose all native functionality of the llama.cpp API, that's the core goal of the library! Which bits are missing/inaccessible?

jacob-mink-1996 (Author) commented

Sure - I’ll get one started. In terms of general direction: is simply adding the parameter as a property on any executor that uses the decoder, and passing it through, an appropriate design? Or would it be better to add it to one of the structures passed in to construct the executor?

The problem line came from

NativeApi.llama_kv_self_seq_add(Context.NativeHandle, LLamaSeqId.Zero, tokensKeep + n_discard, n_past, -n_discard);

I was trying to duplicate the behavior (naively) in my own class and project, and this internal API was the problem:

internal static extern void llama_kv_self_seq_add(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta);

Not sure that should be exposed, though.

martindevans (Member) commented

In terms of general direction

It's up to you, but from a quick skim of the relevant code it looks to me like adding it as a property to IInferenceParams might be the best option.
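Hypothetically, that could look something like this (a sketch of the direction only, not the actual change in #1203; the member name is an assumption):

// Sketch: a flag on the inference parameters...
public interface IInferenceParams
{
    // ...existing members elided...

    /// <summary>
    /// If true, special tokens (e.g. [TOOL_CALLS]) are decoded into the output
    /// stream rather than dropped. Hypothetical name; defaults to false to
    /// preserve current behaviour.
    /// </summary>
    bool DecodeSpecialTokens { get; }
}

// ...which each executor forwards when it builds its decoder:
var decoder = new StreamingTokenDecoder(Context)
{
    DecodeSpecialTokens = inferenceParams.DecodeSpecialTokens
};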

llama_kv_self_seq_add

That's exposed through SafeLlamaContextHandle.KvCacheSequenceAdd:

public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
{
    if (!KvCacheCanShift)
        throw new InvalidOperationException("Cannot shift KV cache (KvCacheCanShift=False)");

    NativeApi.llama_kv_self_seq_add(this, seq, p0, p1, delta);
}

It used to be that the entire llama.cpp API was exposed through NativeApi completely unchanged (no checks, accepting raw pointers, etc.). Over time we're slowly replacing those methods with slightly safer wrappers in the SafeHandle classes.
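So the problem line from the earlier comment can be rewritten against the public surface (a sketch, assuming the same variables as that snippet):

// Equivalent to the internal NativeApi call, with the KvCacheCanShift check included:
Context.NativeHandle.KvCacheSequenceAdd(LLamaSeqId.Zero, tokensKeep + n_discard, n_past, -n_discard);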

jacob-mink-1996 (Author) commented

Awesome - I agree with that from a user perspective 😄 I'll get this fired off quickly.

Makes sense that I got confused if the code is mid-migration - since I saw the rm function exposed publicly, I assumed I would also see the add function. That's on me for a naive copy-paste.
