Whisper large v3 model repeats a lot #1507
Comments
If we can determine in some way that the problem is |
Yeah, the problem seems to be the model itself: openai/whisper#1762 (comment) The problem seems to occur during silence, so maybe Whisper.cpp could remove silence from audio? |
Removing silence from the audio is outside the scope of |
Repetition on silences is a big problem with Whisper in general; large v3 just made it even worse. Having silence removal built into Whisper.cpp would improve the general Whisper quality for all consumers, instead of every consumer having to implement a custom solution. It could be opt-in. I think it's worth considering. In my case, I haven't found any good solutions for it that are not Python-based. I need it to be C++/C/Swift. |
We can add a naive VAD as an optional pre-processing step, but I'm doubtful that it will help much, because the samples that I see failing with v3 do not contain silences. Here are some strategies that I've observed to reduce repetition and hallucinations:
|
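To make the "naive VAD as an optional pre-processing step" idea concrete, here is a minimal sketch of an energy-based silence filter. It is not part of whisper.cpp: the function name `remove_silence`, the frame length, and the RMS threshold are all assumptions chosen for illustration, and a real VAD would need smoothing and hangover logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a whisper.cpp API): keeps only the frames whose
// RMS energy is at least `threshold`. Input is mono float PCM in [-1, 1].
std::vector<float> remove_silence(const std::vector<float>& pcm,
                                  std::size_t frame_len,
                                  float threshold) {
    assert(frame_len > 0);
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::size_t start = 0; start < pcm.size(); start += frame_len) {
        const std::size_t end = std::min(start + frame_len, pcm.size());
        // Root-mean-square energy of the current frame.
        double sum_sq = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            sum_sq += static_cast<double>(pcm[i]) * pcm[i];
        }
        const double rms = std::sqrt(sum_sq / static_cast<double>(end - start));
        if (rms >= threshold) {
            out.insert(out.end(),
                       pcm.begin() + static_cast<std::ptrdiff_t>(start),
                       pcm.begin() + static_cast<std::ptrdiff_t>(end));
        }
    }
    return out;
}
```

With 16 kHz audio, a `frame_len` of 160 would correspond to 10 ms frames. Note that dropping frames shifts timestamps relative to the original audio, so a caller that needs accurate timing would have to keep a mapping of the removed ranges.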
I think it's possible for someone to fine-tune the model using |
I tried this with large-v2 and it made it even better! |
There is the no-speech token, which whisper.cpp currently ignores. Actually implementing a no-speech threshold similar to openai/whisper might help. |
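For reference, the no-speech check in openai/whisper skips a segment only when the no-speech probability is high and the decoder's average log-probability is low. A minimal sketch of that decision is below; the struct `SegmentStats` and function `should_skip_segment` are hypothetical names for illustration, not whisper.cpp APIs, and the defaults of 0.6 and -1.0 are, to my understanding, the ones used by the Python reference implementation.

```cpp
#include <cassert>

// Hypothetical container for per-segment decoder statistics.
struct SegmentStats {
    float no_speech_prob;  // probability assigned to the no-speech token
    float avg_logprob;     // average log-probability of the decoded tokens
};

// Returns true when the segment should be treated as silence and dropped.
// A confident decode (avg_logprob above the threshold) overrides the
// no-speech signal, so the segment is kept in that case.
bool should_skip_segment(const SegmentStats& s,
                         float no_speech_threshold = 0.6f,
                         float logprob_threshold = -1.0f) {
    assert(no_speech_threshold >= 0.0f && no_speech_threshold <= 1.0f);
    return s.no_speech_prob > no_speech_threshold &&
           s.avg_logprob <= logprob_threshold;
}
```

Requiring both conditions is what keeps this from discarding quiet but real speech: a high no-speech probability alone is not enough if the decoded text is confident.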
I am trying to work around this problem, and VAD is not useful: if there is even a small silence interval, it would emit something. whisper.cpp ignores the "no speech" token, even though it is crucial for this to work, and it seems impossible to make it work without it. |
I have added a PR to return nosp token: #1588 |
I don't see this as a param I can use in the Swift Package. What am I missing? |
I have gotten many reports that the large v3 model repeats sentences much more often than v2. I'm not sure if there's anything Whisper.cpp can do about this.