Skip to content

Adding top-n-sigma sampler #489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 3, 2025
Merged

Adding top-n-sigma sampler #489

merged 5 commits into from
Jun 3, 2025

Conversation

ikawrakow
Copy link
Owner

Given popular demand, adding top-n $\sigma$ sampler.

Set to off by default.

  • Add to sampling chain using --sampling-chain ...n... or --samplers ...top-n-sigma...
  • Set parameter using --top-n-sigma value

@saood06
Copy link
Collaborator

saood06 commented Jun 3, 2025

Since this PR is still open could the documentation for this and XTC be added to examples/server/README.md and examples/main/README.md.

@ikawrakow
Copy link
Owner Author

Sure, will do.

What else do people want for sampling?

DRY?

@saood06
Copy link
Collaborator

saood06 commented Jun 3, 2025

What else do people want for sampling?

DRY?

That does seem to be more popular than the other two you just added (based on what I've seen reported in other places). Looking at the main/README.md of mainline that is the only one that is missing. (We also have TFS which was removed in mainline due to low usage and bugs).

I do personally think DRY is the best repeat penalty (of the ones that are publicly used), and so I would use it if I ever encounter looping again (but I wouldn't ever turn it on unless needed, since it does definitely affect quality if left on and there is no looping you want to avoid). I fortunately haven't seen looping in a while (and I think it is because newer models have this issue a lot less if at all)

Comment on lines 242 to 246
### XTC Sampling

- --xtc-probability p: xtc probability (default: 0.0 => disabled)
- --xtc-threshold t : xtc threshold (default: 1.0 => disabled)

Copy link
Collaborator

@saood06 saood06 Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something like the following:

XTC probability sets how likely the XTC sampler is to engage.
XTC threshold is the lower-bound for what probability is needed for a token to be considered a "Top choice" and when engaged only the lowest probability top choice is kept.

And maybe change ### XTC Sampling to ### XTC Sampling (Exclude Top Choices) since the description above refers to the full name

Comment on lines 247 to 252
### Top-n-sigma Sampling

Sets all logits $L_i$ to $-\infty$ where $L_i < L_{\rm max} - n \sigma$. Here $L_{\rm max}$ is the maximum logit, $\sigma$ is the logit standard deviation, and $n$ is the top-n-sigma parameter.

- --top-n-sigma t top-n-sigma parmeter (default: 0.0 => disabled)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something letting people know that increasing top-n-sigma results in more tokens being considered, while decreasing it makes less tokens be considered as not all users will be able to figure that out from the mathematical description you provided.

@Ph0rk0z
Copy link

Ph0rk0z commented Jun 3, 2025

Yep, DRY is good. XTC threshold is usually .1 and below to get anything meaningful out of it. Not sure how that compares here. Super interesting how this one is going to compare to the one I stole from mainline.

Comment on lines +244 to +247
The function of this sampler is conrolled by `--xtc-probability` and `--xtc-threshold`. `--xtc-probability` takes values between
0 and 1 (<=0 turns this sampler off) and defines the probability for randomly invoking the sampler. `--xtc-threshold`
defines the token probability threshold. Tokens with probability greater than this threshold will be excluded from the sampling.
The sampler is turned off for `threshold > 0.5`.
Copy link
Collaborator

@saood06 saood06 Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"conrolled" -> controlled

This isn't really accurate, as the lowest "top choice" is retained. As it is written it makes it seem like it removes all tokens with probability greater than the threshold.

Also I think the conditions for it to be turned off should be consistent instead of having the probability one in the beginning and the threshold one at the bottom

@ikawrakow
Copy link
Owner Author

Why don't you make your changes on top of the PR? Or, we merge the way it is and you make a new PR with better description.

@Ph0rk0z Ph0rk0z mentioned this pull request Jun 3, 2025
4 tasks
@saood06
Copy link
Collaborator

saood06 commented Jun 3, 2025

Or, we merge the way it is and you make a new PR with better description.

Sure. I can do that.

@ikawrakow ikawrakow merged commit f6d5fbd into main Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants