
How to map timestamp to seconds referring to Issue #3627 #3925

Closed
shahzebali42 opened this issue Oct 1, 2021 · 4 comments
Comments

@shahzebali42

How to map timestamp to seconds Issue #3627

How do I map timestamps to seconds using this formula?

segment_start + timestep/total_frames * segment_duration

I am getting correct timestamps, but I want to find the corresponding times in seconds in the audio.
Can anyone explain this formula?

What values should go in the segment_start and segment_duration variables to get those times?

My audio has sample rate = 16000 and sample width = 2.

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

@stale stale bot added the stale label Mar 2, 2022
@stale

stale bot commented Apr 17, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

@stale stale bot closed this as completed Apr 17, 2022
@abarcovschi

@shahzebali42 did you ever figure out where to get the values for segment_start and segment_duration?

@abarcovschi

I figured out how to map from timesteps to seconds for each non-blank token. The formula I used is the following:

sec_time = frame_num * (audio_len / (num_frames * sample_rate))

where:

  • frame_num = the timestep of the symbol, as returned in the 'timesteps' field of the W2lDecoder.decode() outputs.
  • audio_len = the number of samples in the loaded audio file corresponding to the transcript (with batched w2v2 acoustic model inference, this is the zero-padded length, i.e. the length of the longest audio file in the batch).
  • num_frames = the number of frames in the emission matrix returned by the w2v2 acoustic model for that audio file (with batched inference, this is the same for every file in the batch, because all files are padded to the length of the longest one).
  • sample_rate = the sample rate of the loaded audio files (usually 16000 Hz).
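
For anyone who wants to try this, here is a minimal Python sketch of the formula above. The function name timesteps_to_seconds and the example numbers are made up for illustration and are not part of fairseq; the function only assumes you already have the list of timesteps from the decoder, the (possibly padded) number of audio samples, and the number of emission frames.

```python
from typing import List, Sequence


def timesteps_to_seconds(
    timesteps: Sequence[int],
    audio_len: int,
    num_frames: int,
    sample_rate: int = 16000,
) -> List[float]:
    """Map emission-frame indices to times in seconds.

    Implements sec_time = frame_num * (audio_len / (num_frames * sample_rate)).
    Hypothetical helper, not a fairseq API.
    """
    if num_frames <= 0 or sample_rate <= 0:
        raise ValueError("num_frames and sample_rate must be positive")
    # Duration of audio covered by one emission frame, in seconds.
    seconds_per_frame = audio_len / (num_frames * sample_rate)
    return [t * seconds_per_frame for t in timesteps]


# Made-up example: 10 s of 16 kHz audio (160000 samples) and 500 emission
# frames, so each frame covers 0.02 s.
example_times = timesteps_to_seconds([0, 50, 499], audio_len=160000, num_frames=500)
print([round(t, 2) for t in example_times])
```

If segment_start is taken to be 0 and segment_duration to be audio_len / sample_rate (the whole file treated as one segment), the formula from the original question reduces to the same expression. That reading of the two variables is my assumption and is not confirmed anywhere in this thread.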
