Feature request
PyTorch is migrating video processing to torchcodec, and it's pretty cool. It would be nice to migrate both the audio and video features to use torchcodec instead of torchaudio/torchvision.
Motivation
My use case: I'm working on a multimodal audio-visual (AV) model, and what's nice about torchcodec is that I can extract the audio tensors directly from MP4 files. I can also easily resample video data to whatever fps I like on the fly. I haven't found an easy or efficient way to do this with torchvision.
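For illustration, here is a rough sketch of what I mean, not a proposed implementation. `resample_timestamps` and `load_av` are hypothetical helper names I made up for this example, and the 16 kHz default sample rate is an arbitrary choice. It assumes a recent torchcodec release that ships `AudioDecoder`, and a video stream that starts at 0 seconds.

```python
from typing import List


def resample_timestamps(duration_seconds: float, target_fps: float) -> List[float]:
    """Return evenly spaced timestamps (seconds) for sampling at target_fps.

    Hypothetical helper: assumes the stream starts at t = 0.
    """
    if duration_seconds <= 0 or target_fps <= 0:
        return []
    num_frames = int(duration_seconds * target_fps)
    # Every timestamp is strictly less than duration_seconds.
    return [i / target_fps for i in range(num_frames)]


def load_av(path: str, target_fps: float, sample_rate: int = 16_000):
    """Decode frames at target_fps plus the audio track from a single file."""
    # Imported lazily so the timestamp helper works without torchcodec installed.
    from torchcodec.decoders import AudioDecoder, VideoDecoder

    video = VideoDecoder(path)
    timestamps = resample_timestamps(video.metadata.duration_seconds, target_fps)
    # Pick the frame displayed at each requested timestamp.
    frames = video.get_frames_played_at(seconds=timestamps).data

    # Decode the audio stream of the same file, resampled to sample_rate.
    audio = AudioDecoder(path, sample_rate=sample_rate)
    waveform = audio.get_all_samples().data
    return frames, waveform
```

With something like this, `load_av("clip.mp4", target_fps=8)` would give me a frame tensor and a waveform tensor from the same MP4, with no separate audio extraction step.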
Your contribution
I’m modifying the Video dataclass to use torchcodec in place of the current backend. I’m starting from a stable commit, since this is for a project I’m working on. If it ends up working well, I’m happy to open a PR against main.