whisper-cli 1.7.4 250113 discards model large-v3-turbo segment output from upstream-whisper #2696

Q1. When whisper.cpp transcribes a 5-hour audio file that is mostly noise, it enters a fast mode, transcribing at 90×.
BUG: In this mode, conversations of 1–3 minutes are missed and leave no trace in the output.

**>>> How can I avoid whisper missing short conversations?**

For example, should the context be periodically destroyed?
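
One knob that may be relevant here, sketched under assumptions: the whisper.cpp CLI has `--max-context N`, which limits how many tokens of previously decoded text are fed back as a prompt, and `0` means no earlier text is carried forward. Whether this prevents the missed conversations described above is untested. The `transcribe_no_context` wrapper and the `WHISPER_BIN` and `WHISPER_MODEL` variables are hypothetical names used only for illustration.

```shell
# Hypothetical wrapper: read WAV from stdin and decode without carrying
# previously decoded text forward (--max-context 0).
WHISPER_BIN=${WHISPER_BIN:-./whisper.cpp/build/bin/main}
WHISPER_MODEL=${WHISPER_MODEL:-whisper.cpp/models/ggml-small.en.bin}

transcribe_no_context() {
  out_base=$1   # output path without extension (assumption: the CLI appends .lrc)
  "$WHISPER_BIN" --model "$WHISPER_MODEL" \
    --max-context 0 \
    --file - --output-lrc --output-file "$out_base"
}
```

Usage would be the ffmpeg command from this report piped into `transcribe_no_context o`.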

Q2. When the audio is 30h+, the file size exceeds 4 GiB, which the 32-bit size fields of .wav cannot handle, resulting in empty files.
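
As a rough check of where that limit falls, here is the arithmetic as a sketch. It assumes 16 kHz mono 16-bit PCM (what the ffmpeg command in this report produces with `-f wav`) and a canonical 44-byte WAV header. Other sample formats or extra header chunks shift the numbers. The function names are illustrative.

```shell
# Bytes of PCM written per second: sample rate * bytes per sample * channels.
bytes_per_second() { echo $(( $1 * $2 * $3 )); }

# Longest audio (whole seconds) whose WAV file stays within limit_bytes,
# assuming a 44-byte header.
max_wav_seconds() {
  limit_bytes=$1
  bps=$(bytes_per_second "$2" "$3" "$4")
  echo $(( (limit_bytes - 44) / bps ))
}

max_wav_seconds 4294967295 16000 2 1   # size field read as unsigned: prints 134217 (about 37.3 h)
max_wav_seconds 2147483647 16000 2 1   # size field read as signed: prints 67108 (about 18.6 h)
```

Under these assumptions, a reader that treats the size field as signed would hit the limit well before 30 hours, and an unsigned reader somewhat after it.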

**>>> How can I transcribe large audio files or infinite streams?**

ffmpeg has a -rf64 option for the RF64 format: https://en.wikipedia.org/wiki/RF64
Is there a better input format than wav?
I would prefer feeding raw samples, float or a specific PCM format.
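
For reference, a sketch of the two ffmpeg output variants mentioned above. `-rf64 auto` is an option of ffmpeg's WAV muxer, and `f32le` is ffmpeg's headerless 32-bit float little-endian format. Two assumptions still need verifying. The first is that the WAV reader used by whisper.cpp accepts RF64. The second is that the raw stream is consumed by custom code (for example the Go binding under consideration) rather than the stock CLI, since the library entry point `whisper_full` takes 16 kHz float samples. The function names are illustrative.

```shell
# RF64 variant: the muxer switches to RF64 only if the data outgrows
# plain WAV. Output is a regular file so the header can be finalized.
to_rf64() {
  ffmpeg -hide_banner -nostdin -i "$1" -vn -ac 1 -ar 16000 \
    -rf64 auto -f wav "$2"
}

# Raw variant: headerless 32-bit float samples, so there is no size field.
# The reader must already know the layout: mono, 16 kHz, f32le.
to_raw_f32() {
  ffmpeg -hide_banner -nostdin -i "$1" -vn -ac 1 -ar 16000 \
    -c:a pcm_f32le -f f32le "$2"
}
```

For example, `to_rf64 i.mp4 o.wav` writes a file, while `to_raw_f32 i.mp4 -` streams raw samples to stdout.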

Q3. **>>> Is there some other way of improving transcription word yield, considering the commands below?**

Hardware is a 2021 Apple M-series machine with 8 GiB RAM, running macOS and 2 parallel whisper instances.
A custom Go binding is being considered.
It is batch execution, so slow transcription is not a problem.

Creating the audio stream:
```
ffmpeg -hide_banner -i i.mp4 -nostdin -vn -ac 1 -ar 16000 -f wav -
```
whisper command:
```
./whisper.cpp/build/bin/main --model whisper.cpp/models/ggml-small.en.bin \
  --file - --output-lrc --output-file o.lrc
```
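
Since this is batch work, one option is to run the two commands above once per time slice. That bounds the size of every WAV stream and starts a fresh whisper process, and therefore a fresh context, for each slice. This is a minimal sketch, assuming the total duration is known in whole seconds (for example from ffprobe). `chunk_starts`, `transcribe_in_chunks`, the slice length and the overlap are illustrative choices, and whether slicing recovers the missed conversations has not been verified.

```shell
# Print slice start offsets (seconds) covering total_s seconds; slices are
# chunk_s long and consecutive slices overlap by overlap_s.
chunk_starts() {
  total_s=$1; chunk_s=$2; overlap_s=$3
  step=$((chunk_s - overlap_s))
  [ "$step" -gt 0 ] || return 1   # overlap must be shorter than the slice
  start=0
  while [ "$start" -lt "$total_s" ]; do
    echo "$start"
    start=$((start + step))
  done
}

# Transcribe one input file slice by slice. Each slice is piped into its own
# whisper process, so nothing carries over from one slice to the next.
transcribe_in_chunks() {
  input=$1; total_s=$2; chunk_s=$3; overlap_s=$4
  for start in $(chunk_starts "$total_s" "$chunk_s" "$overlap_s"); do
    ffmpeg -hide_banner -nostdin -ss "$start" -t "$chunk_s" -i "$input" \
      -vn -ac 1 -ar 16000 -f wav - |
    ./whisper.cpp/build/bin/main --model whisper.cpp/models/ggml-small.en.bin \
      --file - --output-lrc --output-file "chunk_${start}"
  done
}
```

For a 5-hour file, `transcribe_in_chunks i.mp4 18000 1800 30` would use 30-minute slices with 30 seconds of overlap. Timestamps in each output restart at zero, so the slice start has to be added back when merging, and text near slice boundaries can appear twice because of the overlap.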
