Adding top-n-sigma sampler #489
Since this PR is still open, could the documentation for this and XTC be added to examples/server/README.md and examples/main/README.md?
Sure, will do. What else do people want for sampling? DRY?
That does seem to be more popular than the other two you just added (based on what I've seen reported in other places). Looking at the …, I personally think DRY is the best repeat penalty (of the ones that are publicly used), so I would use it if I ever encountered looping again. I wouldn't turn it on unless needed, though, since it definitely affects quality when left on and there is no looping to avoid. Fortunately, I haven't seen looping in a while, which I think is because newer models have this issue a lot less, if at all.
examples/main/README.md
### XTC Sampling

- `--xtc-probability p`: xtc probability (default: 0.0 => disabled)
- `--xtc-threshold t`: xtc threshold (default: 1.0 => disabled)
Maybe add something like the following:
XTC probability sets how likely the XTC sampler is to engage.
XTC threshold is the lower bound on the probability a token needs in order to be considered a "top choice". When the sampler is engaged, only the lowest-probability top choice is kept and the other top choices are excluded.
And maybe change `### XTC Sampling` to `### XTC Sampling (Exclude Top Choices)`, since the description above refers to the full name.
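To make the suggested wording concrete, here is a minimal C++ sketch of the XTC behaviour as described in this comment. It is hypothetical and not the code in this PR: the function name `xtc_filter`, the keep-mask return type, the `>=` comparison against the threshold, and the use of `std::mt19937` for the random draw are all assumptions. It operates on already-normalized token probabilities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of XTC ("Exclude Top Choices"), not the code in this PR.
// `probs` are normalized token probabilities. Returns a keep-mask:
// true = token stays a candidate, false = token is excluded.
std::vector<bool> xtc_filter(const std::vector<float>& probs,
                             float xtc_probability, float xtc_threshold,
                             std::mt19937& rng) {
    std::vector<bool> keep(probs.size(), true);

    // <= 0 disables the sampler; otherwise it engages with probability
    // xtc_probability (always, when xtc_probability >= 1).
    if (xtc_probability <= 0.0f) return keep;
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (xtc_probability < 1.0f && coin(rng) >= xtc_probability) return keep;

    // "Top choices": tokens whose probability reaches the threshold.
    std::vector<std::size_t> top;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        assert(probs[i] >= 0.0f && probs[i] <= 1.0f);  // expects probabilities
        if (probs[i] >= xtc_threshold) top.push_back(i);
    }

    // With fewer than two top choices there is nothing to exclude.
    if (top.size() < 2) return keep;

    // Keep the least probable top choice and exclude all the others.
    const std::size_t least = *std::min_element(
        top.begin(), top.end(),
        [&](std::size_t a, std::size_t b) { return probs[a] < probs[b]; });
    for (std::size_t i : top)
        if (i != least) keep[i] = false;
    return keep;
}
```

For probabilities `{0.5, 0.3, 0.15, 0.05}` with probability 1 and threshold 0.1, the first three tokens are top choices; the sketch keeps the 0.15 token (the least probable top choice) and the 0.05 token, and excludes the 0.5 and 0.3 tokens.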
examples/main/README.md
### Top-n-sigma Sampling

Sets all logits $L_i$ to $-\infty$ where $L_i < L_{\rm max} - n \sigma$. Here $L_{\rm max}$ is the maximum logit, $\sigma$ is the logit standard deviation, and $n$ is the top-n-sigma parameter.

- `--top-n-sigma t`: top-n-sigma parameter (default: 0.0 => disabled)
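The masking rule above can be sketched in C++ as follows. This is an illustration under assumptions, not this PR's implementation: `top_n_sigma` is a hypothetical name, and $\sigma$ is taken to be the population standard deviation (dividing by the number of logits), which the description does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical sketch of top-n-sigma, not the code in this PR.
// Sets every logit L_i with L_i < L_max - n * sigma to -infinity, where sigma
// is the population standard deviation of the logits (an assumption).
// n <= 0 leaves the logits untouched, matching "0.0 => disabled".
void top_n_sigma(std::vector<float>& logits, float n) {
    if (n <= 0.0f || logits.empty()) return;

    const float l_max = *std::max_element(logits.begin(), logits.end());

    // Mean and population variance, accumulated in double.
    double mean = 0.0;
    for (float l : logits) mean += l;
    mean /= static_cast<double>(logits.size());
    double var = 0.0;
    for (float l : logits) var += (l - mean) * (l - mean);
    const double sigma = std::sqrt(var / static_cast<double>(logits.size()));

    const double cutoff = l_max - n * sigma;
    // sigma >= 0 and n > 0, so the maximum logit is never masked.
    assert(l_max >= cutoff);

    // Mask everything below the cutoff.
    for (float& l : logits)
        if (l < cutoff) l = -std::numeric_limits<float>::infinity();
}
```

Accumulating the statistics in `double` limits rounding error over large vocabularies, and masking with $-\infty$ means a subsequent softmax assigns those tokens exactly zero probability.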
Maybe add something letting people know that increasing top-n-sigma results in more tokens being considered, while decreasing it results in fewer tokens being considered. Not all users will be able to figure that out from the mathematical description you provided.
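The monotonicity this comment asks to document follows directly from the formula. Assuming $\sigma$ is computed once from the unmodified logits (so it does not depend on $n$), the cutoff decreases as $n$ grows because $\sigma \ge 0$, so the set of kept tokens can only get larger:

$$
c(n) = L_{\rm max} - n\sigma, \qquad
0 \le n_1 < n_2 \;\Rightarrow\; c(n_2) \le c(n_1) \;\Rightarrow\;
\{\, i : L_i \ge c(n_1) \,\} \subseteq \{\, i : L_i \ge c(n_2) \,\}.
$$

For example, with logits $(2, 1, 0, -1)$ and $\sigma$ taken as the population standard deviation ($\sigma = \sqrt{1.25} \approx 1.118$), $n = 1$ gives a cutoff of about $0.882$ and keeps two tokens, while $n = 2$ gives about $-0.236$ and keeps three.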
Yep, DRY is good. The XTC threshold usually needs to be 0.1 or below to get anything meaningful out of it. Not sure how that compares here. It will be super interesting to see how this one compares to the one I stole from mainline.
The function of this sampler is conrolled by `--xtc-probability` and `--xtc-threshold`. `--xtc-probability` takes values between 0 and 1 (<=0 turns this sampler off) and defines the probability for randomly invoking the sampler. `--xtc-threshold` defines the token probability threshold. Tokens with probability greater than this threshold will be excluded from the sampling. The sampler is turned off for `threshold > 0.5`.
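The `threshold > 0.5` condition follows from how XTC selects tokens: the least probable top choice is always kept, so tokens are only excluded when at least two distinct tokens $a \ne b$ reach the threshold $t$. Because probabilities sum to one, that requires

$$
2t \le p_a + p_b \le \sum_i p_i = 1 \;\Rightarrow\; t \le \tfrac{1}{2},
$$

so for $t > 0.5$ at most one token can be a top choice and the sampler never removes anything.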
"conrolled" -> controlled
This isn't really accurate, as the lowest "top choice" is retained. As written, it makes it seem like all tokens with probability greater than the threshold are removed.
Also, I think the conditions for turning it off should be stated consistently, instead of having the probability condition at the beginning and the threshold condition at the end.
Why don't you make your changes on top of the PR? Or we merge it as is, and you make a new PR with a better description.
Sure. I can do that.
Given popular demand, this PR adds the top-n$\sigma$ sampler. It is off by default.
To use it, add it to the sampler chain with `--sampling-chain ...n...` or `--samplers ...top-n-sigma...`, and set its parameter with `--top-n-sigma value`.