Streaming creates a lot of repetitions, behavior does not resemble example #2338

Closed
chengyjonathan opened this issue Aug 6, 2024 · 7 comments

Comments

chengyjonathan commented Aug 6, 2024

./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000

Not sure if I'm doing something wrong; I just tried to run it out of the box.

But instead of incrementally adding text to the transcription, I get repeating text, as shown below:

[Start speaking]
Testing to see if the log is there any mor

Testing to see if the log is there anymore

Testing to see if the log is there anymore

Testing to see if the log is there anymore

Testing to see if the log is there anymore

Testing to see if the log is there anymore. Okay, it's not there anymore.
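
For context, here is a minimal sketch of the kind of sliding-window loop the stream example implements, written against the public whisper.h API. This is illustrative only, not the actual examples/stream/stream.cpp, and capture_step() is a made-up stand-in for the SDL microphone capture. With --step 500 --length 5000, every half-second tick re-decodes almost the same 5-second window, which is one plausible reason the same sentence keeps getting re-printed as it grows:

```cpp
// Illustrative sketch only - not the real examples/stream/stream.cpp.
// capture_step() is a hypothetical stand-in for the SDL microphone capture;
// here it just appends silence so the sketch compiles on its own.
#include <cstdio>
#include <vector>
#include "whisper.h"

static void capture_step(std::vector<float> & pcm, int n_samples) {
    pcm.insert(pcm.end(), n_samples, 0.0f);
}

int main() {
    const int step_ms   = 500;                                    // --step 500
    const int length_ms = 5000;                                   // --length 5000
    const int n_step    = (step_ms   * WHISPER_SAMPLE_RATE) / 1000;
    const int n_length  = (length_ms * WHISPER_SAMPLE_RATE) / 1000;

    whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-base.en.bin", whisper_context_default_params());

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads      = 8;     // -t 8
    params.single_segment = true;  // one segment per window
    params.print_progress = false;

    std::vector<float> window;
    while (true) {
        capture_step(window, n_step);                 // 0.5 s of new audio
        if ((int) window.size() > n_length) {         // keep only the last 5 s
            window.erase(window.begin(), window.end() - n_length);
        }

        // every tick decodes the *whole* 5 s window again, so roughly 90% of
        // the audio overlaps with the previous tick and its text is re-printed
        if (whisper_full(ctx, params, window.data(), (int) window.size()) == 0) {
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }
    }

    whisper_free(ctx);
    return 0;
}
```

In other words, some of the repeated lines above may simply be the same overlapping window being decoded again, separate from genuine hallucination on silence.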

@chengyjonathan (Author)

Reading through some other similar issues, I tried:

./stream -m ./models/ggml-large-v2.bin -t 8 --step 500 --length 5000

But I still got the same repetitions:

[Start speaking]
All right, we're going to prove it h

Alright, we're going to prove it her

All right, we're going to improvemen

Alright, we're gonna prove it here. I don't think things are better, but.

Alright, we're gonna prove it here. I don't think things are better, but I'm noticing...

@chengyjonathan (Author)

https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream

I'm mostly going off of this README, where the functionality seems quite effective.

@chengyjonathan (Author)

The VAD transcriptions are still quite nice, even with the base model.

So it might just be a matter of hallucinations on silences?

@chengyjonathan (Author)

The only problem with the VAD approach is that it seems to wait for a pause before running inference, which means it's not quite streaming anymore.
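
As a side note on why that happens: a VAD gate only triggers once the trailing audio looks silent, so inference runs once per utterance instead of once per step. Below is a small, self-contained sketch of such an energy-based gate; it is a hypothetical helper for illustration, not the VAD code the stream example actually uses.

```cpp
// Hypothetical energy-based VAD gate, for illustration only (this is not
// the VAD implementation whisper.cpp's stream example uses).
#include <cmath>
#include <vector>

// Returns true once the last `tail_ms` of captured audio is quieter than
// `rms_thold`, i.e. the speaker has paused.
bool speaker_paused(const std::vector<float> & pcm, int sample_rate,
                    int tail_ms, float rms_thold) {
    const int n_tail = (tail_ms * sample_rate) / 1000;
    if ((int) pcm.size() < n_tail) {
        return false;                     // not enough audio captured yet
    }
    double sum = 0.0;
    for (size_t i = pcm.size() - n_tail; i < pcm.size(); ++i) {
        sum += pcm[i] * pcm[i];
    }
    const double rms = std::sqrt(sum / n_tail);
    return rms < rms_thold;
}

// Usage sketch: buffer audio continuously, and only call whisper_full() on
// the buffered utterance when speaker_paused(...) returns true, then clear
// the buffer. Transcription therefore arrives only after each pause.
```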

@chengyjonathan (Author)

Oh, interesting. When I do:

./stream -m ./models/ggml-base.en.bin --step 0 --step 1000

I start seeing the issue, even though with a step of 10000 the VAD seems to solve the hallucination problem.

So something about the smaller context is causing... repetitions, I guess? Or, in the one-second silences, we're getting repetitions in the inference.

[Start speaking]

Now here's where we do one second based inferences

Now here's where we do one second based inferences

Now here's where we do one second base inferences a

Now here's where we do one second based inferences

Now here's where we do one second base inferences a

Now here's where we do one second based inferences and see how the reaction speed looks like.

There's going to be a lot of repetition now. I wond

there's going to be a lot of repetition now. I wond

there's going to be a lot of repetition now. I wonder why that is.
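
One knob that may be worth ruling out here (an assumption, not a confirmed diagnosis): whether text from the previous window is being fed back in as a prompt. In the C API that is controlled by the no_context flag in whisper_full_params; a small sketch of parameters to experiment with:

```cpp
// Sketch of decode parameters to experiment with in a small-step loop.
// These are real whisper_full_params fields, but treating them as the fix
// for the repetitions above is an assumption, not a confirmed diagnosis.
#include "whisper.h"

whisper_full_params make_small_step_params(int n_threads) {
    whisper_full_params p = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    p.n_threads      = n_threads;
    p.no_context     = true;   // don't reuse the previous window's text as a
                               // prompt; carried-over text is easy to re-emit
                               // when consecutive windows contain mostly the
                               // same audio (or mostly silence)
    p.single_segment = true;   // treat each short window as a single segment
    p.print_realtime = false;  // print from the segment loop instead
    return p;
}
```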

@chengyjonathan (Author)

I tried the advice in #1507 (comment) to adjust max_context down, but I still get:

[Start speaking]

Testing one, two, three. Hmm, okay, well let'

Testing one, two, three. Hmm, okay, well let'

Testing one, two, three. Hmm, okay, well let'

Testing one, two, three. Hmm, okay, well let's see how this goes and let's see whether ther

Testing one, two, three. Hmm, okay, well let's see how this goes and let's see whether there's repetitions, there's all these things.

a lot of repetitions.
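
For completeness, the corresponding knob in the C API is the n_max_text_ctx field of whisper_full_params. The field itself is real; whether lowering it, alone or together with no_context, actually cures these repetitions is exactly what remains unclear in this issue. A minimal sketch:

```cpp
// Minimal sketch of lowering the text context via the C API. The value 64
// is arbitrary, chosen only for illustration.
#include "whisper.h"

whisper_full_params make_low_context_params() {
    whisper_full_params p = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    p.n_max_text_ctx = 64;    // cap how many past text tokens are kept as context
    p.no_context     = true;  // or drop cross-window text context entirely
    return p;
}
```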

@chengyjonathan (Author)

I tried the large v2 model, as per #1507 (comment), and I still have repetitions. I can't tell if I'm misunderstanding something.
