-
Notifications
You must be signed in to change notification settings - Fork 54
Add DRY and fix the server to use other new samplers. #504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I see you also used the Z-algorithm and the implementation looks the same as llama.cpp (which was based on the implementation done in kobold.cpp) but you stripped the comments explaining it and where it came from. Any reason for that choice? |
Nope. If you find a mistake or a place that needs a comment/credit/etc please point it out. All I care is that it works and we have dry. |
I did point it out though? The whole code of But the difference is it is lacking pretty much all of comments (which contain attributions alongside a lot of helpful info) that are contained in the mainline PR. Initially I only looked at the Z-algorithm because I was keeping up with the initial DRY PR in mainline and I knew that it was stalled waiting for permission for that specific code allowed into an MIT project (as kobold.cpp is AGPL-3.0 ) but now I see that what I said applies to that entire function not just the Z-algorithm.
That may be what you care about, but attribution and credit even when not required (I am not sure it is here, but IANAL) is a nice thing to give, and it looks especially bad considering it really does look like you copy and pasted the code and then removed the attributions and comments. I am not saying that is what you did (I can't know, so I won't assume), but it definitely does look that way considering the code is identical (but the comments are not) and that is not a good look. |
I agree with @saood06. Let's not remove the credits and comments. |
It went through a LLM but you're working up some scenario where I actively went through and took them out. I'll put them back best I can. |
I did originally ask more politely "Any reason for that choice?" and you didn't offer an explanation so I wanted to make it clear what it looks like happened, and I even stated "I can't know, so I won't assume" and I was going to even reference you stating you did vibe coding, but the point was that it looks identical to that like that, and that does impact how people perceive it. Even if you didn't actively take them out (which I believe you when you say you didn't), you did submit a PR where they were stripped out. |
It doesn't match their code 1:1 copy pasted.. putting the comments in sort of reveals that. Parts of it do. Its an amalgamation of the PR which was built from k.cpp which itself is probably based on pew and textgen webui code. |
Enough of it did where me looking at both side by side made me feel the need to say something. I never stated it was a copy and paste, but unless you look closely it is hard to tell that it isn't. Thank you for putting in the work to make this PR I do appreciate it, sorry that didn't come across in my earlier comments, but I still stand by what I said (but maybe I should have included the thank you earlier). |
I haven't built or ran the code yet, don't have time to test it tonight. |
Explanation was that it was to show logic of going back one position and then afterwards to the next token ahead.. hmm
That's fine, I hope that more people test it than just us. Remember that dry removes/breaks up engrams not single word repetition. I'll pull changes from here back in and keep rolling with it. Also another reminder that anyone using XTC or n sigma on server was not having it apply. The parameters weren't there. Need to figure out if new samplers all belong here in sampling.h too
Edit: this is default sampler order.. so makes no difference if you want no new samplers within it. |
From my experience porting code from mainline it is usually easier to do that and then fix incompatibilities and any other issues than to do what you did. It also makes reviewing it easier. |
Yea, in this case it is much much too different. I took several cracks at that and failed each time. |
I did leave some more comments from just reading the code, I don't think it is worth testing anyway until they are resolved. |
Unused and vector version works.
…dles sequence breakers
That is odd. If you want I can try to port the ring buffer if you say it offers better efficiency, but I am testing it as it is right now. I'll approve or request changes based on that. |
I tried with the RB and it caused more problems. Unless there's some big slowdowns, its probably not worth it. Another "trick" directly from pew was to set high top_K like (i.e 100) and place it before DRY to speed everything up. I've been doing that on mainline since I heard about it. Here I already did DRY on/off and the t/s was the same. Probably the thing to look out for. |
So with some vibe coding I added what should be a working dry implementation. Nothing has exploded. Also the server was never modified to use the new samplers so they did nothing unless you were using the main llama.cpp executable without a front end.
I didn't update docs as the other PR seems to do that (but not the server, lol). Let me know if this is any good or not. Much lighter than porting the new sampler arch.
There's also a spot in the header where sampler order array was never updated? Does it have to be?