
[BUG]: DecodeSpecialTokens is exposed by the StreamingTokenDecoder but is not accessible when using an ILlamaExecutor #1201


Open · jacob-mink-1996 opened this issue Jun 5, 2025 · 4 comments · May be fixed by #1203


jacob-mink-1996 commented Jun 5, 2025

Description

In PR #777, support for DecodeSpecialTokens was added. In some of my testing I thought it would be useful to toggle this feature (right now it appears to default to false!), but I found that, due to the way StreamingTokenDecoder is used (at least in StatelessExecutor), I am unable to toggle it.

This seems to be an implementation bug rather than a feature request, since the feature is there, just not exposed. Additionally, I cannot write my own StatelessExecutor implementation, because it relies on methods that are internal to NativeApi.

The actual effect of this is that models like Devstral, whose GGUF form classifies [TOOL_CALLS] as a special token to be emitted before any tool-call-formatted output, will end up producing this token but never streaming it out of the executor to the caller.
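For illustration, the property is reachable when the decoder is constructed directly, but there is no way to set it through the executor. A sketch of the gap against LLamaSharp 0.24.0 (context is an assumed LLamaContext; the executor internals are paraphrased, not quoted):

// The property exists and works when you own the decoder yourself...
var decoder = new StreamingTokenDecoder(context)
{
    DecodeSpecialTokens = true
};

// ...but StatelessExecutor constructs its own StreamingTokenDecoder internally
// and leaves DecodeSpecialTokens at its default (false), so there is no public
// knob to flip before calling InferAsync.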

Reproduction Steps

Run a Devstral GGUF from LLamaSharp source with an appropriate tool call prompt, then compare to Ollama in raw mode, for example, to see the [TOOL_CALLS] token.
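A minimal repro sketch along those lines (the model path and prompt are placeholders, not from the original report):

using LLama;
using LLama.Common;

// Placeholder path to a Devstral GGUF; any tool-call-capable build should do.
var parameters = new ModelParams("devstral-small.gguf");
using var model = await LLamaWeights.LoadFromFileAsync(parameters);
var executor = new StatelessExecutor(model, parameters);

// Placeholder prompt; use whatever tool-call prompt format Devstral expects.
var prompt = "<tool-call prompt>";

// With 0.24.0, the model emits [TOOL_CALLS] but it never shows up here,
// because the executor's internal decoder skips special tokens.
await foreach (var text in executor.InferAsync(prompt, new InferenceParams()))
    Console.Write(text);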

Environment & Configuration

  • Operating system:
  • .NET runtime version:
  • LLamaSharp version: 0.24.0
  • CUDA version (if you are using cuda backend):
  • CPU & GPU device:

Known Workarounds

No response

martindevans (Member) commented

Would you be interested in putting together a PR to fix this?

I cannot implement my own StatelessExecutor implementation because it relies on methods that are internal to NativeApi.

LLamaSharp should expose all native functionality of the llama.cpp API, that's the core goal of the library! Which bits are missing/inaccessible?

jacob-mink-1996 (Author) commented

Sure - I’ll get one started. In terms of general direction: is simply adding the parameter as a property on any executor that uses the decoder, and passing it through, an appropriate design? Or would it be better to add it to one of the structures passed in to construct the executor?

The problem line came from

NativeApi.llama_kv_self_seq_add(Context.NativeHandle, LLamaSeqId.Zero, tokensKeep + n_discard, n_past, -n_discard);

I was trying to duplicate the behavior (naively) in my own class and project, and this internal API was the problem:

internal static extern void llama_kv_self_seq_add(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta);

Not sure that should be exposed, though.

martindevans (Member) commented

In terms of general direction

It's up to you, but from a quick skim of the relevant code it looks to me like adding it as a property to IInferenceParams might be the best option.
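Hypothetically, that could look something like this (a sketch of the direction only, not the actual change in #1203; the member name is an assumption):

// Sketch: a flag on the inference parameters...
public interface IInferenceParams
{
    // ...existing members elided...

    /// <summary>
    /// If true, special tokens (e.g. [TOOL_CALLS]) are decoded into the output
    /// stream rather than dropped. Hypothetical name; defaults to false to
    /// preserve current behaviour.
    /// </summary>
    bool DecodeSpecialTokens { get; }
}

// ...which each executor forwards when it builds its decoder:
var decoder = new StreamingTokenDecoder(Context)
{
    DecodeSpecialTokens = inferenceParams.DecodeSpecialTokens
};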

llama_kv_self_seq_add

That's exposed through SafeLlamaContextHandle.KvCacheSequenceAdd:

public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
{
    if (!KvCacheCanShift)
        throw new InvalidOperationException("Cannot shift KV cache (KvCacheCanShift=False)");

    NativeApi.llama_kv_self_seq_add(this, seq, p0, p1, delta);
}

It used to be that the entire llama.cpp API was exposed through NativeApi completely unchanged (no checks, accepting raw pointers, etc.). Over time we're slowly replacing those methods with slightly safer wrappers in the SafeHandle classes.
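So the problem line from the earlier comment can be rewritten against the public surface (a sketch, assuming the same variables as that snippet):

// Equivalent to the internal NativeApi call, with the KvCacheCanShift check included:
Context.NativeHandle.KvCacheSequenceAdd(LLamaSeqId.Zero, tokensKeep + n_discard, n_past, -n_discard);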

jacob-mink-1996 (Author) commented

Awesome - I agree with that from a user perspective 😄 I'll get this fired off quickly.

Makes sense that I got confused if the code is mid-migration - since I saw the rm function exposed publicly, I assumed I would also see the add function. That's on me for a naive copy-paste.
