Add DRY and fix the server to use other new samplers. #504

Ph0rk0z · 2025-06-07T22:32:33Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

So with some vibe coding I added what should be a working dry implementation. Nothing has exploded. Also the server was never modified to use the new samplers so they did nothing unless you were using the main llama.cpp executable without a front end.

I didn't update docs as the other PR seems to do that (but not the server, lol). Let me know if this is any good or not. Much lighter than porting the new sampler arch.

There's also a spot in the header where sampler order array was never updated? Does it have to be?

saood06 · 2025-06-08T05:11:07Z

I see you also used the Z-algorithm and the implementation looks the same as llama.cpp (which was based on the implementation done in kobold.cpp) but you stripped the comments explaining it and where it came from.

Any reason for that choice?

Ph0rk0z · 2025-06-08T10:24:01Z

Nope. If you find a mistake or a place that needs a comment/credit/etc please point it out. All I care is that it works and we have dry.

saood06 · 2025-06-08T10:55:44Z

Nope. If you find a mistake or a place that needs a comment/credit/etc please point it out.

I did point it out though?

The whole code of llama_sample_dry_impl looks like the exact same as llama_sampler_dry_apply from the DRY PR of mainline. Link to the PR (and specific file referenced) here as a reference.

But the difference is it is lacking pretty much all of comments (which contain attributions alongside a lot of helpful info) that are contained in the mainline PR.

Initially I only looked at the Z-algorithm because I was keeping up with the initial DRY PR in mainline and I knew that it was stalled waiting for permission for that specific code allowed into an MIT project (as kobold.cpp is AGPL-3.0 ) but now I see that what I said applies to that entire function not just the Z-algorithm.

All I care is that it works and we have dry.

That may be what you care about, but attribution and credit even when not required (I am not sure it is here, but IANAL) is a nice thing to give, and it looks especially bad considering it really does look like you copy and pasted the code and then removed the attributions and comments.

I am not saying that is what you did (I can't know, so I won't assume), but it definitely does look that way considering the code is identical (but the comments are not) and that is not a good look.

ikawrakow · 2025-06-08T11:28:21Z

I agree with @saood06. Let's not remove the credits and comments.

Ph0rk0z · 2025-06-08T11:46:41Z

It went through a LLM but you're working up some scenario where I actively went through and took them out. I'll put them back best I can.

saood06 · 2025-06-08T11:57:42Z

It went through a LLM but you're working up some scenario where I actively went through and took them out.

I did originally ask more politely "Any reason for that choice?" and you didn't offer an explanation so I wanted to make it clear what it looks like happened, and I even stated "I can't know, so I won't assume" and I was going to even reference you stating you did vibe coding, but the point was that it looks identical to that like that, and that does impact how people perceive it.

Even if you didn't actively take them out (which I believe you when you say you didn't), you did submit a PR where they were stripped out.

Ph0rk0z · 2025-06-08T12:04:21Z

It doesn't match their code 1:1 copy pasted.. putting the comments in sort of reveals that. Parts of it do. Its an amalgamation of the PR which was built from k.cpp which itself is probably based on pew and textgen webui code.

saood06 · 2025-06-08T12:28:52Z

It doesn't match their code 1:1 copy pasted.. putting the comments in sort of reveals that. Parts of it do. Its an amalgamation of the PR which was built from k.cpp which itself is probably based on pew and textgen webui code.

Enough of it did where me looking at both side by side made me feel the need to say something. I never stated it was a copy and paste, but unless you look closely it is hard to tell that it isn't.

Thank you for putting in the work to make this PR I do appreciate it, sorry that didn't come across in my earlier comments, but I still stand by what I said (but maybe I should have included the thank you earlier).

src/llama-sampling.cpp

saood06 · 2025-06-08T12:32:40Z

I haven't built or ran the code yet, don't have time to test it tonight.

Explanation was that it was to show logic of going back one position and then afterwards to the next token ahead.. hmm

Ph0rk0z · 2025-06-08T12:53:16Z

That's fine, I hope that more people test it than just us. Remember that dry removes/breaks up engrams not single word repetition. I'll pull changes from here back in and keep rolling with it. Also another reminder that anyone using XTC or n sigma on server was not having it apply. The parameters weren't there.

Need to figure out if new samplers all belong here in sampling.h too

    std::vector<llama_sampler_type> samplers_sequence = {
        llama_sampler_type::TOP_K,
        llama_sampler_type::TFS_Z,
        llama_sampler_type::TYPICAL_P,
        llama_sampler_type::TOP_P,
        llama_sampler_type::MIN_P,
        llama_sampler_type::TEMPERATURE
    };

Edit: this is default sampler order.. so makes no difference if you want no new samplers within it.

src/llama-sampling.cpp

src/llama-sampling.h

saood06 · 2025-06-08T13:09:57Z

It doesn't match their code 1:1 copy pasted

From my experience porting code from mainline it is usually easier to do that and then fix incompatibilities and any other issues than to do what you did. It also makes reviewing it easier.

Ph0rk0z · 2025-06-08T13:11:22Z

Yea, in this case it is much much too different. I took several cracks at that and failed each time.

saood06 · 2025-06-08T13:20:16Z

I haven't built or ran the code yet, don't have time to test it tonight.

I did leave some more comments from just reading the code, I don't think it is worth testing anyway until they are resolved.

Unused and vector version works.

Ph0rk0z · 2025-06-08T13:57:00Z

It does compile and work while penalizing tokens per debug messages.

why does it show comments as pending in gh?

…dles sequence breakers

saood06 · 2025-06-09T10:31:04Z

why does it show comments as pending in gh?

That is odd.

If you want I can try to port the ring buffer if you say it offers better efficiency, but I am testing it as it is right now.

I'll approve or request changes based on that.

Ph0rk0z · 2025-06-09T10:38:59Z

I tried with the RB and it caused more problems. Unless there's some big slowdowns, its probably not worth it. Another "trick" directly from pew was to set high top_K like (i.e 100) and place it before DRY to speed everything up. I've been doing that on mainline since I heard about it. Here I already did DRY on/off and the t/s was the same. Probably the thing to look out for.

Your Name added 7 commits June 6, 2025 15:54

Skeleton

4920f2f

Non working

1a275ed

Well this is something.. it compiles

ce0b5f0

Debug and fixes

e8a1152

Cordon off dry debug

6d5c0a2

Add new samplers to server, xtc or sigma did nothing before either

bac93da

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

e05d780

Comments and attributions (approx)

4edd4bc

ikawrakow requested a review from saood06 June 8, 2025 12:02

Merge branch 'main' into main

cad53af

saood06 reviewed Jun 8, 2025

View reviewed changes

src/llama-sampling.cpp Outdated Show resolved Hide resolved

saood06 reviewed Jun 8, 2025

View reviewed changes

src/llama-sampling.cpp Show resolved Hide resolved

Ph0rk0z added 2 commits June 8, 2025 12:34

Doubled float_max.

ed1388b

Remove LLM jank.

401b88e

Explanation was that it was to show logic of going back one position and then afterwards to the next token ahead.. hmm

Merge branch 'main' of https://github.com/Ph0rk0z/ik_llama.cpp

b5d7da3

saood06 reviewed Jun 8, 2025

View reviewed changes

src/llama-sampling.cpp Outdated Show resolved Hide resolved

saood06 reviewed Jun 8, 2025

View reviewed changes

src/llama-sampling.h Show resolved Hide resolved

Ph0rk0z and others added 2 commits June 8, 2025 13:45

Take out ring buffer

32dab5e

Unused and vector version works.

Merge branch 'main' of https://github.com/Ph0rk0z/ik_llama.cpp

758a92b

the impl function only does Z-algorithm, while the main function han…

9e70498

…dles sequence breakers

saood06 mentioned this pull request Jun 9, 2025

Webui improvement #481

Merged

4 tasks

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

ad4c815

saood06 mentioned this pull request Jun 10, 2025

add dry sampler #513

Merged

4 tasks

Your Name and others added 9 commits June 10, 2025 05:45

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

98f1eb2

Typos

89cf632

Fix buffer ambiguity

b47fddc

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

d3da7a7

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

c25be41

Merge branch 'main' into main

6741c08

Merge branch 'main' of https://github.com/ikawrakow/ik_llama.cpp

8e737ba

Double declaration

486d4f3

Merge branch 'main' of https://github.com/Ph0rk0z/ik_llama.cpp

ff76ae0

Add DRY and fix the server to use other new samplers. #504

Are you sure you want to change the base?

Add DRY and fix the server to use other new samplers. #504

Uh oh!

Conversation

Ph0rk0z commented Jun 7, 2025

Uh oh!

saood06 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ph0rk0z commented Jun 8, 2025

Uh oh!

saood06 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ikawrakow commented Jun 8, 2025

Uh oh!

Ph0rk0z commented Jun 8, 2025

Uh oh!

saood06 commented Jun 8, 2025

Uh oh!

Ph0rk0z commented Jun 8, 2025

Uh oh!

saood06 commented Jun 8, 2025

Uh oh!

Uh oh!

Uh oh!

saood06 commented Jun 8, 2025

Uh oh!

Ph0rk0z commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

saood06 commented Jun 8, 2025

Uh oh!

Ph0rk0z commented Jun 8, 2025

Uh oh!

saood06 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ph0rk0z commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

saood06 commented Jun 9, 2025

Uh oh!

Ph0rk0z commented Jun 9, 2025

Uh oh!

Uh oh!

saood06 commented Jun 8, 2025 •

edited

Loading

saood06 commented Jun 8, 2025 •

edited

Loading

Ph0rk0z commented Jun 8, 2025 •

edited

Loading

saood06 commented Jun 8, 2025 •

edited

Loading

Ph0rk0z commented Jun 8, 2025 •

edited

Loading