Description
In the current speculative.cpp implementation, params.sparams.temp is forced to -1.0f:
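For reference, a minimal sketch of the line in question (the exact location and comment in speculative.cpp may differ):

```cpp
// examples/speculative/speculative.cpp (sketch; exact line and comment may vary)
params.sparams.temp = -1.0f; // hard-coded: forces greedy sampling for the draft model
```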
However, if I change this value to 0:
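i.e. roughly this change (illustrative):

```cpp
params.sparams.temp = 0.0f; // changed from the hard-coded -1.0f
```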
draft sampling seems to fail completely (see the attached speculative.log).
Is this intended behavior?
I'm working on #5625, which removes the temperature limit, so I'd like to get this fixed.