Description
Q1. When whisper.cpp transcribes a 5-hour audio file that is mostly noise, it enters a fast mode, transcribing at 90× real time.
BUG: In this mode, 1–3-minute conversations are missed and leave no trace in the output.
>>> How can I avoid whisper missing short conversations?
For example, should the context be periodically destroyed?
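One way to try the idea above would be to cut the input into fixed windows with a small overlap and transcribe each window with a fresh context. Whether this actually stops the skipping is untested. The sketch below only computes the window boundaries; `window`, `splitWindows`, and the 10-minute / 30-second values are hypothetical and not part of whisper.cpp.

```go
package main

import "fmt"

// window is a half-open sample range [Start, End).
type window struct{ Start, End int }

// splitWindows cuts total samples into windows of at most size samples.
// Each window starts overlap samples before the previous window ends,
// so speech that straddles a boundary appears whole in at least one window.
// It requires size > overlap >= 0.
func splitWindows(total, size, overlap int) []window {
	var ws []window
	step := size - overlap
	for start := 0; start < total; start += step {
		end := start + size
		if end > total {
			end = total
		}
		ws = append(ws, window{start, end})
		if end == total {
			break // last window reached the end of the audio
		}
	}
	return ws
}

func main() {
	const sampleRate = 16000 // matches -ar 16000 in the ffmpeg command
	// Assumed values: 5 hours of audio, 10-minute windows, 30 s overlap.
	ws := splitWindows(5*3600*sampleRate, 10*60*sampleRate, 30*sampleRate)
	fmt.Println(len(ws), "windows; first:", ws[0], "last:", ws[len(ws)-1])
}
```

Text from the overlapping regions would be duplicated and would need to be merged by timestamp afterwards.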
Q2. When the audio is 30+ hours long, the file size exceeds 4 GiB, which 32-bit .wav cannot handle, producing empty files.
>>> How can I transcribe large audio files or infinite streams?
ffmpeg has an -rf64 option for the RF64 format: https://en.wikipedia.org/wiki/RF64
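As a rough check of where the 32-bit size limit falls, the sketch below computes how many hours of uncompressed PCM fit under a given byte limit. It assumes 16-bit mono at 16 kHz, which is what ffmpeg writes by default for `-f wav` with `-ac 1 -ar 16000`; `hoursUntil` is a hypothetical helper.

```go
package main

import "fmt"

// hoursUntil returns how many hours of uncompressed PCM fit in limitBytes.
func hoursUntil(limitBytes float64, sampleRate, channels, bytesPerSample int) float64 {
	bytesPerSecond := float64(sampleRate * channels * bytesPerSample)
	return limitBytes / bytesPerSecond / 3600
}

func main() {
	const gib = 1 << 30
	fmt.Printf("16-bit mono 16 kHz, 4 GiB: %.1f h\n", hoursUntil(4*gib, 16000, 1, 2))
	fmt.Printf("16-bit mono 16 kHz, 2 GiB: %.1f h\n", hoursUntil(2*gib, 16000, 1, 2))
	fmt.Printf("32-bit float mono 16 kHz, 4 GiB: %.1f h\n", hoursUntil(4*gib, 16000, 1, 4))
}
```

Under these assumptions the 4 GiB boundary is at roughly 37.3 hours, and a reader that treats the 32-bit size field as signed would hit its limit at about 18.6 hours, so the exact failure point depends on the sample format and on the WAV reader.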
Is there a better input format than WAV?
I would prefer feeding raw samples, either float or a specific PCM format.
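A minimal sketch of the raw-sample route, assuming ffmpeg is asked for headerless 16-bit little-endian PCM (`-f s16le` instead of `-f wav`) and a custom binding does the reading. The whisper.cpp C API takes 32-bit float mono samples at 16 kHz, so the bytes are converted before being handed over. `pcm16ToFloat32` and the 30-second block size are hypothetical, and the actual call into whisper is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// pcm16ToFloat32 converts little-endian signed 16-bit PCM bytes into
// float32 samples in [-1, 1). A trailing odd byte is ignored.
func pcm16ToFloat32(raw []byte) []float32 {
	out := make([]float32, len(raw)/2)
	for i := range out {
		s := int16(binary.LittleEndian.Uint16(raw[2*i:]))
		out[i] = float32(s) / 32768.0
	}
	return out
}

func main() {
	const sampleRate = 16000
	buf := make([]byte, 2*sampleRate*30) // 30 seconds of 16-bit mono audio
	total := 0
	for {
		// ReadFull fills buf, or returns a short read with an error at end of input.
		n, err := io.ReadFull(os.Stdin, buf)
		if n > 0 {
			samples := pcm16ToFloat32(buf[:n])
			total += len(samples)
			// A binding would pass samples to the transcriber here.
		}
		if err != nil { // io.EOF or io.ErrUnexpectedEOF once the stream ends
			break
		}
	}
	fmt.Fprintf(os.Stderr, "read %d samples (%.1f s)\n", total, float64(total)/sampleRate)
}
```

Because a raw stream has no header and no size field, it has no 4 GiB limit and works for streams of unknown length. It could be fed with something like `ffmpeg -hide_banner -nostdin -i i.mp4 -vn -ac 1 -ar 16000 -f s16le - | ./reader`, where `reader` is the hypothetical binary built from the sketch.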
Q3. >>> Is there some other way of improving the transcription word yield, considering the commands below?
Hardware: 2021 Apple M processor, 8 GiB RAM, macOS, running 2 parallel whisper instances.
A custom Go binding is being considered.
It is batch execution, so slow transcription is not a problem.
Creating the audio stream:
ffmpeg -hide_banner -i i.mp4 -nostdin -vn -ac 1 -ar 16000 -f wav -
whisper command:
./whisper.cpp/build/bin/main --model whisper.cpp/models/ggml-small.en.bin \
  --file - --output-lrc --output-file o.lrc