
Whisper large v3 model repeats a lot #1507


Open
sindresorhus opened this issue Nov 17, 2023 · 12 comments
Labels
question Further information is requested

Comments

@sindresorhus
Contributor

I have gotten many reports that the large v3 model repeats sentences much more often than v2. I'm not sure if there's anything Whisper.cpp can do about this.

@ggerganov
Member

If we can determine in some way that the problem is whisper.cpp related, then I'll look into it more.
But so far, my analysis indicates that the problem lies in the v3 model itself, as I observe similar issues with the OpenAI implementation.

@sindresorhus
Contributor Author

Yeah, the problem seems to be the model itself: openai/whisper#1762 (comment)

The problem seems to occur during silence, so maybe Whisper.cpp could remove silence from the audio?

@ggerganov
Member

Removing silence from the audio is outside the scope of whisper.cpp because AFAIK there are many different algorithms to achieve this. It's better to leave it to the 3rd party to decide which one to use for their specific case.
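Since silence removal is left to the caller, a minimal caller-side sketch might look like the following: an energy-based trim using a fixed RMS threshold over 10 ms frames of 16 kHz mono PCM. The function names and the threshold value are made up for this illustration; production code would more likely use a real VAD such as WebRTC's or Silero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive energy-based VAD: a frame is "speech" when its RMS energy exceeds
// a fixed threshold. frame_size is in samples (10 ms at 16 kHz = 160).
std::vector<bool> naive_vad(const std::vector<float>& pcm,
                            size_t frame_size = 160,
                            float rms_threshold = 0.01f) {
    std::vector<bool> is_speech;
    for (size_t i = 0; i + frame_size <= pcm.size(); i += frame_size) {
        double energy = 0.0;
        for (size_t j = i; j < i + frame_size; ++j) {
            energy += pcm[j] * pcm[j];
        }
        const float rms = std::sqrt(energy / frame_size);
        is_speech.push_back(rms > rms_threshold);
    }
    return is_speech;
}

// Keep only the frames classified as speech, dropping silence before
// handing the samples to whisper_full().
std::vector<float> trim_silence(const std::vector<float>& pcm,
                                size_t frame_size = 160,
                                float rms_threshold = 0.01f) {
    const auto mask = naive_vad(pcm, frame_size, rms_threshold);
    std::vector<float> out;
    for (size_t f = 0; f < mask.size(); ++f) {
        if (mask[f]) {
            out.insert(out.end(),
                       pcm.begin() + f * frame_size,
                       pcm.begin() + (f + 1) * frame_size);
        }
    }
    return out;
}
```

Hard cuts at frame boundaries like this can clip word onsets; real VADs add hangover (keeping a few frames of padding around detected speech) for that reason.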

@sindresorhus
Contributor Author

Repetition on silences is a big problem with Whisper in general; large v3 just made it even worse. Building it into Whisper.cpp would improve the general Whisper quality for all consumers, instead of every consumer having to implement a custom solution. It could be opt-in. I think it's worth considering.

In my case, I haven't found any good solutions for it that are not Python-based. I need it to be C++/C/Swift.

@ggerganov
Member

We can add a naive VAD as an optional pre-processing step, but I'm doubtful that it will help much, because the samples that I see failing with v3 do not contain silences.

Here are some strategies that I've observed to reduce repetition and hallucinations:

  • Use 5 beams
  • Increase the entropy threshold from the default 2.4 to, for example, 2.8. A higher threshold rejects repetitive text and falls back to sampling with a higher temperature
  • Reduce the maximum context size (--max-context). By default it is 224. Setting it to 64 or 32 can reduce the repetitions significantly. Setting it to 0 will most likely eliminate all repetitions, but transcription quality can suffer because the context from the previous transcript is lost
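The strategies above map onto flags of the `main` example's CLI. The flag names below are as found in whisper.cpp builds around this time; verify them against `./main --help` for your version:

```shell
# Beam search with 5 beams, raised entropy threshold, reduced context
./main -m models/ggml-large-v3.bin -f audio.wav \
    --beam-size 5 \
    --entropy-thold 2.8 \
    --max-context 64
```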

@bobqianic
Collaborator

> Repetition on silences is a big problem with Whisper in general

I think it's possible for someone to fine-tune the model on silent audio. Even the largest Whisper model, at 1.5 B parameters, is relatively small by the standards of today's LLMs :)

@bobqianic bobqianic added the question Further information is requested label Nov 18, 2023
@dubefab

dubefab commented Nov 20, 2023

> We can add a naive VAD as an optional pre-processing step, but I'm doubtful that it will help much, because the samples that I see failing with v3 do not contain silences.
>
> Here are some strategies that I've observed to reduce repetition and hallucinations:
>
>   • Use 5 beams
>   • Increase the entropy threshold from the default 2.4 to, for example, 2.8. A higher threshold rejects repetitive text and falls back to sampling with a higher temperature
>   • Reduce the maximum context size (--max-context). By default it is 224. Setting it to 64 or 32 can reduce the repetitions significantly. Setting it to 0 will most likely eliminate all repetitions, but transcription quality can suffer because the context from the previous transcript is lost

I tried this with large-v2 and it made it even better!

@jxy
Contributor

jxy commented Nov 25, 2023

There is the no-speech token, which whisper.cpp currently ignores:

https://github.com/ggerganov/whisper.cpp/blob/447d49530c9af41fe24f2ae510f452903dba330d/whisper.cpp#L4592

Actually implementing a no-speech threshold similar to openai/whisper might help.
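For illustration, the gating rule that openai/whisper applies per segment could be sketched as follows. The function name is hypothetical; the default thresholds 0.6 and -1.0 correspond to openai/whisper's `no_speech_threshold` and `logprob_threshold` defaults:

```cpp
#include <cassert>

// Sketch of openai/whisper-style no-speech gating (illustrative names).
// A segment is skipped only when the <|nospeech|> token probability is high
// AND the decode is also low-confidence overall (low average log-probability).
bool should_skip_segment(float no_speech_prob, float avg_logprob,
                         float no_speech_threshold = 0.6f,
                         float logprob_threshold = -1.0f) {
    return no_speech_prob > no_speech_threshold &&
           avg_logprob   < logprob_threshold;
}
```

The AND condition matters: a confident decode is kept even when the no-speech probability is high, which is what keeps the gate from dropping real speech that merely resembles silence to the encoder.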

@ex3ndr

ex3ndr commented Dec 3, 2023

I am trying to work around this problem, and VAD is not useful: if there is even a small silence interval, the model will emit something. whisper.cpp ignores the "no speech" token, which is crucial here, and it seems impossible to make this work without it.

@ex3ndr

ex3ndr commented Dec 3, 2023

I have opened a PR to return the nosp token: #1588

@itsthisjustin

> Reduce the maximum context size (--max-context). By default it is 224. Setting it to 64 or 32 can reduce the repetitions significantly. Setting it to 0 will most likely eliminate all repetitions, but transcription quality can suffer because the context from the previous transcript is lost

I don't see this as a param I can use in the Swift Package. What am I missing?

@aiyinyuedejustin

Same thing here...
[screenshot attached]

8 participants