How to handle real-time sound streams #25
Comments
I'm also interested to know how to use this with real-time audio streams.
I managed to record audio chunks in real time using NAudio and iterate over them as they became available.
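A minimal sketch of that "iterate chunks as they become available" pattern, written in Python so it stays independent of any capture library. Everything here is an assumption for illustration: `produce_chunks`, `consume_chunks` and `run_pipeline` are hypothetical names, a plain list of samples stands in for the microphone, and `handle` stands in for whatever transcription call you use.

```python
import queue
import threading

def produce_chunks(samples, chunk_size, out_queue):
    """Stand-in for a capture callback: push fixed-size chunks, then a sentinel."""
    for start in range(0, len(samples), chunk_size):
        out_queue.put(samples[start:start + chunk_size])
    out_queue.put(None)  # None signals "no more audio"

def consume_chunks(in_queue, handle):
    """Process chunks in arrival order until the sentinel is seen."""
    results = []
    while True:
        chunk = in_queue.get()  # blocks until the producer delivers a chunk
        if chunk is None:
            break
        results.append(handle(chunk))
    return results

def run_pipeline(samples, chunk_size, handle):
    """Run the producer on its own thread and consume on the calling thread."""
    q = queue.Queue()
    producer = threading.Thread(target=produce_chunks, args=(samples, chunk_size, q))
    producer.start()
    results = consume_chunks(q, handle)
    producer.join()
    return results
```

The queue decouples capture from recognition, so a slow recognizer delays results rather than dropping audio; whether that delay is acceptable depends on how fast the model runs on your hardware.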
https://github.com/AwesomeYuer/whisper.NAudio.NET is an example, but it does seem too slow for real time with this repo. https://github.com/Const-me/Whisper/tree/master/Examples/MicrophoneCS is another implementation that is much faster, but it doesn't use NAudio loopback and doesn't detect silence; it seems to simply chunk the audio according to the capture max and min durations you set.
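For comparison, here is a rough sketch of what silence-aware chunking bounded by a minimum and maximum duration could look like. It is not taken from either repository above: the RMS energy threshold is a deliberately naive stand-in for a real voice activity detector, and `rms`, `split_on_silence`, `threshold`, `min_frames` and `max_frames` are all hypothetical names.

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def split_on_silence(frames, threshold, min_frames, max_frames):
    """Group frames into chunks.

    A chunk is closed at the first silent frame once it holds at least
    min_frames, or unconditionally when it reaches max_frames.
    """
    chunks, current = [], []
    for frame in frames:
        current.append(frame)
        silent = rms(frame) < threshold
        if len(current) >= max_frames or (silent and len(current) >= min_frames):
            chunks.append(current)
            current = []
    if current:  # keep whatever audio is left at the end
        chunks.append(current)
    return chunks
```

Cutting at a silent frame makes it less likely that a chunk boundary falls inside a word, while the maximum duration still bounds latency when the speaker does not pause.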
Nice options are provided here, having 2 configurable values:
During the merge, we need to keep in mind that the very end of a segment might be gibberish, and so might the beginning of the next segment, because a word can be cut in half at any point. This way, context is maintained, whereas if you just cut at arbitrary points and process each piece separately, a cut can land in the middle of a word, which then cannot be recognized. On the other hand, if we always process everything from 0 to CurrentTime, that will become too slow. To improve quality, we can increase the overlap time. Ideally, this capability would exist directly in this package, so that anyone could use NAudio or another stream provider to feed the library (e.g. through some PushStream similar to Azure Cognitive Services' PushStreams: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/7e61fcb022f5dd75cfaf579703f8c92ad83317b0/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#LL352C26-L352C26 ).
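To make the overlap-and-merge idea concrete, here is a small sketch under stated assumptions. It is not an API of this package: `overlapping_windows` and `merge_words` are hypothetical helpers, words are assumed to be `(text, start, end)` tuples with absolute timestamps, and the merge rule (assign each word to whichever window holds its midpoint relative to the middle of the overlap) is just one simple choice among several.

```python
def overlapping_windows(total_samples, window, overlap):
    """Yield (start, end) sample ranges where each window re-covers
    `overlap` samples from the end of the previous one."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield (start, end)
        if end == total_samples:
            break
        start += step

def merge_words(prev_words, next_words, overlap_start, overlap_end):
    """Merge two word lists whose windows overlap in time.

    Words near the edge of a window may be cut in half, so the overlap is
    split at its midpoint: the earlier window keeps words centred before
    the cut, the later window keeps words centred at or after it.
    """
    cut = (overlap_start + overlap_end) / 2

    def midpoint(word):
        return (word[1] + word[2]) / 2

    kept_prev = [w for w in prev_words if midpoint(w) < cut]
    kept_next = [w for w in next_words if midpoint(w) >= cut]
    return kept_prev + kept_next
```

For example, `list(overlapping_windows(10, 4, 1))` gives `[(0, 4), (3, 7), (6, 10)]`. A larger overlap gives the recognizer more context around each cut at the price of processing more audio twice, which is the quality versus speed trade-off described above.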
+1
For anyone exploring near-real-time audio stream processing, I’d like to invite you to check out the new library: EchoSharp (currently in its early stages). EchoSharp is designed to leverage Whisper.net, along with other Speech-to-Text components and VAD modules, to enable near-real-time audio processing. Your early feedback would be incredibly valuable, so if you have some time to try it out, I’d love to hear your thoughts!
thank u