whisper-cli 1.7.4 250113 discards model large-v3-turbo segment output from upstream-whisper #2696

Open
@resolutecake

Q1. When whisper.cpp transcribes a 5-hour audio file that is mostly noise, it enters a fast mode, transcribing at 90× speed.
BUG: In this mode, 1–3-minute conversations are missed and leave no trace in the output.

>>> How can I avoid whisper missing short conversations?

For example, should the context be periodically destroyed?
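
One way to get the effect of a periodically destroyed context without touching whisper.cpp itself is to cut the input into overlapping chunks and run a fresh whisper process on each one. Whether this actually recovers the missed conversations is untested here. The sketch below only plans the chunk boundaries and builds the ffmpeg argument list; `plan_chunks`, `ffmpeg_args`, the 30-minute chunk length, and the 30-second overlap are all hypothetical choices, not anything whisper.cpp prescribes.

```python
def plan_chunks(total_s, chunk_s=1800.0, overlap_s=30.0):
    """Return (start, duration) pairs in seconds that cover [0, total_s].

    Consecutive chunks overlap by overlap_s so that speech straddling a
    boundary appears whole in at least one chunk.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    chunks = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < total_s:
        duration = min(chunk_s, total_s - start)
        chunks.append((start, duration))
        if start + duration >= total_s:
            break  # this chunk reaches the end of the audio
        start += step
    return chunks


def ffmpeg_args(src, start, duration):
    """Build an ffmpeg command that emits one chunk as 16 kHz mono WAV on stdout.

    -ss and -t are placed before -i so they act as input options.
    """
    return [
        "ffmpeg", "-hide_banner", "-nostdin",
        "-ss", str(start), "-t", str(duration),
        "-i", src,
        "-vn", "-ac", "1", "-ar", "16000", "-f", "wav", "-",
    ]
```

Each planned chunk would be piped into its own whisper invocation, so no state carries over from a long noisy stretch. The timestamps in each chunk's output then need `start` added back, and text inside the overlap regions may appear twice and need de-duplication.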

Q2. When the audio is 30 h or longer, the file size exceeds 4 GiB, which .wav with its 32-bit size fields cannot handle, and the result is empty files.
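
For reference, the duration at which a plain WAV file reaches the 4 GiB limit depends on sample rate, channel count, and sample width. The sketch below is only that arithmetic; `max_wav_hours` is a hypothetical helper, it ignores the small WAV header, and it does not assume which sample format produced the files described above.

```python
def max_wav_hours(sample_rate, channels, bytes_per_sample, limit_bytes=2**32):
    """Hours of PCM audio that fit below limit_bytes (header size ignored)."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return limit_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # (label, sample rate in Hz, channels, bytes per sample)
    formats = [
        ("16 kHz mono 16-bit", 16000, 1, 2),
        ("16 kHz mono 32-bit float", 16000, 1, 4),
        ("48 kHz stereo 16-bit", 48000, 2, 2),
    ]
    for label, rate, ch, width in formats:
        print(f"{label}: {max_wav_hours(rate, ch, width):.1f} h")
```

With these inputs, 16 kHz mono 16-bit audio reaches the limit at roughly 37 hours, the 32-bit float variant at roughly 19 hours, and 48 kHz stereo 16-bit at roughly 6 hours.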

>>> How can I transcribe large audio files or infinite streams?

ffmpeg has the -rf64 option for the RF64 format: https://en.wikipedia.org/wiki/RF64
Is there a better input format than wav?
I would prefer feeding raw samples, either float or a specific PCM format.
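
As an illustration of feeding raw samples, the sketch below reads headerless signed 16-bit little-endian PCM (what ffmpeg emits with `-f s16le`) from a stream in fixed-size blocks and converts it to floats. Raw PCM has no header and therefore no size field to overflow. `iter_pcm_blocks` and the 30-second block size are hypothetical. This does not assume whisper-cli accepts raw PCM on stdin; the blocks would have to go to a custom binding that passes float samples to the whisper.cpp C API.

```python
import array
import sys


def iter_pcm_blocks(stream, block_samples=16000 * 30):
    """Yield lists of float samples in [-1.0, 1.0) from a raw s16le byte stream.

    stream must be a binary file-like object, e.g. sys.stdin.buffer.
    """
    bytes_per_block = block_samples * 2  # 2 bytes per 16-bit sample
    leftover = b""
    while True:
        data = stream.read(bytes_per_block)
        if not data:
            break  # end of stream; a dangling odd byte is dropped
        data = leftover + data
        usable = len(data) - (len(data) % 2)  # keep whole samples only
        leftover = data[usable:]
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(data[:usable])
        if sys.byteorder == "big":
            samples.byteswap()  # input is little-endian
        if samples:
            yield [s / 32768.0 for s in samples]


if __name__ == "__main__":
    total = 0
    for block in iter_pcm_blocks(sys.stdin.buffer):
        total += len(block)
    print(f"read {total} samples")
```

A possible invocation, with `reader.py` as a placeholder file name: `ffmpeg -hide_banner -i i.mp4 -nostdin -vn -ac 1 -ar 16000 -f s16le - | python3 reader.py`. Because blocks are consumed as they arrive, the same loop works for an endless stream.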

Q3. >>> Is there some other way to improve the transcription word yield, given the commands below?

Hardware is a 2021 Apple M-series machine with 8 GiB RAM running macOS, with 2 parallel whisper instances.
A custom Go binding is being considered.
It is batch execution, so slow transcription is not a problem.

Creating the audio stream:

ffmpeg -hide_banner -i i.mp4 -nostdin -vn -ac 1 -ar 16000 -f wav -

whisper command:

./whisper.cpp/build/bin/main --model whisper.cpp/models/ggml-small.en.bin \
  --file - --output-lrc --output-file o.lrc
