
How to handle real-time sound streams #25


Closed
attsion opened this issue Mar 31, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@attsion

attsion commented Mar 31, 2023

Thank you.

@chriswalz

I'm also interested in knowing how to use this with real-time audio streams.

@bmduarte

bmduarte commented Apr 8, 2023

I managed to record audio chunks in real time using NAudio and iterate over them as they became available.
Processing them seems too slow to keep up, though. Is there any way to speed things up?
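
For reference, a minimal sketch of the kind of chunked capture described above, using NAudio's `WaveInEvent`; the 5-second chunk size and the `ProcessChunk` handler are illustrative assumptions, not part of any existing API:

```csharp
using System;
using System.IO;
using NAudio.Wave;

// Minimal sketch: capture microphone audio and hand it off in ~5-second chunks.
var waveIn = new WaveInEvent
{
    // Whisper models expect 16 kHz, 16-bit, mono PCM.
    WaveFormat = new WaveFormat(16000, 16, 1)
};

var buffer = new MemoryStream();
waveIn.DataAvailable += (sender, e) =>
{
    buffer.Write(e.Buffer, 0, e.BytesRecorded);

    // Once ~5 seconds of audio have accumulated, emit the chunk.
    if (buffer.Length >= waveIn.WaveFormat.AverageBytesPerSecond * 5)
    {
        byte[] chunk = buffer.ToArray();
        buffer.SetLength(0);
        // ProcessChunk(chunk); // hypothetical: feed the PCM chunk to the recognizer
    }
};

waveIn.StartRecording();
Console.WriteLine("Recording; press Enter to stop.");
Console.ReadLine();
waveIn.StopRecording();
```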

@VRCWizard

https://github.com/AwesomeYuer/whisper.NAudio.NET is an example, but it does seem too slow for real time with this repo.

https://github.com/Const-me/Whisper/tree/master/Examples/MicrophoneCS This other implementation is much faster, but it doesn't use NAudio loopback (it doesn't detect silence; it seems to basically chunk audio depending on what you set the capture max and min duration to be).

@sandrohanea
Owner

Nice options are provided here.
I'm thinking the best solution would be something like this:

Have two configurable values:
IntervalTime = the time needed for one processing pass (e.g., defaults to 5 seconds)
OverlapTime = the time that will be processed twice in order to have continuity (defaults to 1 second)

  1. Wait for the initial {IntervalTime} and process it.
  2. Wait an additional {IntervalTime} for the second interval and process the span from {IntervalTime} - ({OverlapTime} / 2) to 2 * {IntervalTime}.
  3. Identify a common segment at the end of Result1 and the beginning of Result2 and merge them.
  4. Repeat the process (a rough sketch of this loop follows below).

During the merge, we need to keep in mind that the exact end of a segment might be gibberish, and the same goes for the beginning of the new segment (either can contain a word that was cut in half).

This way, context is maintained; if you just cut at random points and process each piece independently, the cut can land in the middle of a word, which then cannot be recognized.

On the other hand, if we're always processing everything from 0 to CurrentTime, that will become too slow.

To improve quality, we can increase the overlap time.
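
To make the proposal concrete, here is a rough sketch of the interval/overlap loop; `Transcribe`, `MergeOverlap`, and `Emit` are hypothetical placeholders for the recognizer call and the merge step, not existing Whisper.net APIs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rough sketch of the proposed interval/overlap loop (names are illustrative).
class WindowedTranscriber
{
    const int SampleRate = 16000;
    static readonly TimeSpan IntervalTime = TimeSpan.FromSeconds(5); // one processing pass
    static readonly TimeSpan OverlapTime  = TimeSpan.FromSeconds(1); // reprocessed for continuity

    readonly int windowSamples  = (int)(IntervalTime.TotalSeconds * SampleRate);
    readonly int overlapSamples = (int)(OverlapTime.TotalSeconds * SampleRate);

    readonly List<float> pending = new(); // samples not yet fully transcribed
    string previousResult = "";

    // Called as audio arrives (16 kHz mono float samples).
    public void OnSamples(float[] incoming)
    {
        pending.AddRange(incoming);
        while (pending.Count >= windowSamples)
        {
            float[] window = pending.Take(windowSamples).ToArray();
            // Keep the last {OverlapTime} of this window in the queue so the
            // next window re-processes it, giving the merge a common segment.
            pending.RemoveRange(0, windowSamples - overlapSamples);

            string result = Transcribe(window);         // hypothetical recognizer call
            // Merge by finding a common segment at the end of the previous
            // result and the beginning of the new one, discarding the
            // possibly-gibberish cut edges on both sides.
            Emit(MergeOverlap(previousResult, result)); // hypothetical helpers
            previousResult = result;
        }
    }

    static string Transcribe(float[] samples) => "";              // placeholder
    static string MergeOverlap(string prev, string next) => next; // placeholder
    static void Emit(string text) { }                             // placeholder
}
```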

Ideally, this capability would live directly in this package, and anyone would be able to use NAudio or another stream provider to feed it through some library entry point (e.g., a PushStream similar to Azure Cognitive Services' PushStreams: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/7e61fcb022f5dd75cfaf579703f8c92ad83317b0/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#LL352C26-L352C26)
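
As an illustration only, such a push-based entry point could look roughly like this; nothing named `WhisperPushStream` exists in the package today, and all names here are invented for the example:

```csharp
using System;

// Hypothetical API shape for a push-based stream (inspired by Azure's
// PushAudioInputStream); this does not exist in Whisper.net today.
public sealed class WhisperPushStream : IDisposable
{
    // Would be raised whenever a merged segment of text becomes available.
    public event Action<string>? SegmentRecognized;

    // Callers push raw PCM as it is captured; the library would do the
    // interval/overlap windowing and merging internally.
    public void Push(byte[] pcmChunk, int count)
    {
        // buffering + windowed processing would live here
    }

    public void Dispose() { }
}

// Usage with NAudio would then be a one-liner:
// waveIn.DataAvailable += (s, e) => pushStream.Push(e.Buffer, e.BytesRecorded);
```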

@dfengpo

dfengpo commented Jun 30, 2024

+1

@sandrohanea
Owner

For anyone exploring near-real-time audio stream processing, I’d like to invite you to check out the new library: EchoSharp (currently in its early stages).

EchoSharp is designed to leverage Whisper.net, along with other Speech-to-Text components and VAD modules, to enable near-real-time audio processing.

Your early feedback would be incredibly valuable, so if you have some time to try it out, I’d love to hear your thoughts!
