`fn.experimental.decoders.video` improvements #5814

jantonguirao · 2025-02-07T14:24:47Z

Category:

New feature

Description:

This PR substantially refactors the VideoDecoder classes, implements several padding strategies for video frames (constant, edge, reflect, and symmetric), and cleans up the code.

It enhances the video decoder with several key improvements:

Added support for frame padding with configurable modes:
- constant: Fill with a specified value, e.g. 0 or (118, 185, 0)
- edge/repeat: Repeat the last valid frame
- reflect_1001/symmetric: Mirror padding, including the edge frame
- reflect_101/reflect: Mirror padding, excluding the edge frame
- none: Return shorter sequences when there are not enough frames
Added frame selection options:
- Arbitrary frame selection via frames argument
- Sequential frame selection with:
  - start_frame: First frame (included) from the range to decode.
  - end_frame: Last frame (excluded) from the range to decode. Can't be used together with sequence_length
  - sequence_length: Number of frames to return. Can't be used together with end_frame
  - stride: Step between consecutive selected frames (e.g. stride=2 selects every second frame)
Added build index option to control the generation of a frame index
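
To make the padding modes above concrete, here is a minimal pure-Python sketch of how an out-of-range frame index could be mapped back onto a video with `n` frames. It is not the DALI implementation: the function name `pad_index` and the convention of returning `None` for "no source frame" are assumptions made for illustration only.

```python
def pad_index(i, n, mode):
    """Map frame index i onto [0, n) for a video with n frames.

    Returns None when no source frame applies: the caller would fill the
    frame with a constant ("constant") or drop it ("none").
    Hypothetical helper, not part of DALI.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if 0 <= i < n:
        return i  # in range, nothing to pad
    if mode in ("constant", "none"):
        return None
    if mode in ("edge", "repeat"):
        return min(max(i, 0), n - 1)  # clamp to the first/last valid frame
    if mode in ("reflect_1001", "symmetric"):
        period = 2 * n  # edge frame is repeated: 0 1 2 3 | 3 2 1 0 | 0 1 ...
        j = i % period
        return j if j < n else period - 1 - j
    if mode in ("reflect_101", "reflect"):
        if n == 1:
            return 0
        period = 2 * n - 2  # edge frame is not repeated: 0 1 2 3 | 2 1 | 0 1 ...
        j = i % period
        return j if j < n else period - j
    raise ValueError(f"Unknown pad mode: {mode!r}")
```

For a 4-frame video, requesting frames 4 to 7 would resolve to 3, 2, 1, 0 under `reflect_1001` and to 2, 1, 0, 1 under `reflect_101`.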

Examples

# Start frame, sequence length and stride, including padding with NVIDIA green 
# decodes frames 10, 12, 14, 16
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10, 
    sequence_length=4, 
    stride=2,
    pad_mode="constant",
    fill_value=[118, 185, 0]  # NVIDIA green
)

# Using end_frame instead of sequence_length
# Decodes frames 10, 12, 14, 16, 18
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10,
    end_frame=20,  
    stride=2,
    pad_mode="edge"  # Repeat last frame if needed
)

# Extract specific frames
# Decodes frames 0, 10, 11, 12, 20, 30
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    frames=[0, 10, 11, 12, 20, 30]
)
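
The frame lists quoted in the comments above can be reproduced with a small pure-Python sketch of the sequential selection rules. This is only an illustration of the described semantics (included start, excluded end, mutually exclusive `end_frame` and `sequence_length`); the helper `select_frames` is hypothetical, and the requirement that one of the two arguments is given is an assumption of the sketch, not a statement about the operator's defaults.

```python
def select_frames(start_frame=0, end_frame=None, sequence_length=None, stride=1):
    """Return the frame indices selected by the sequential arguments.

    Hypothetical helper mirroring the rules in the description; it does
    not decode anything and does not apply padding.
    """
    if sequence_length is not None and end_frame is not None:
        raise ValueError("Cannot specify both sequence_length and end_frame")
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if end_frame is not None:
        if end_frame <= start_frame:
            raise ValueError("end_frame must be greater than start_frame")
        # end_frame is excluded, exactly like Python's range()
        return list(range(start_frame, end_frame, stride))
    if sequence_length is None:
        # Assumption of this sketch: one of the two must be provided.
        raise ValueError("Specify either sequence_length or end_frame")
    return [start_frame + k * stride for k in range(sequence_length)]


print(select_frames(start_frame=10, sequence_length=4, stride=2))
print(select_frames(start_frame=10, end_frame=20, stride=2))
```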

Additional information:

Affected modules and functionalities:

Video decoder

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: N/A

dali/operators/decoder/video/video_decoder_base.h

dali/operators/decoder/video/video_decoder_cpu.cc

dali/operators/input/video_input.h

dali/test/python/decoder/test_video.py

Signed-off-by: Joaquin Anton Guirao <[email protected]>

jantonguirao · 2025-02-11T11:35:47Z

!build

dali-automaton · 2025-02-11T11:42:28Z

CI MESSAGE: [23788349]: BUILD STARTED


JanuszL · 2025-02-11T13:33:13Z

dali/operators/decoder/video/video_decoder_base.h

+      pad_value_ = spec_.template GetArgument<int>("pad_value");
+      DALI_ENFORCE(pad_value_ >= 0 && pad_value_ <= 255, "pad_value must be in range [0, 255]");


Is there any value in extending this to accept either a single value or an <R, G, B> triple, so that any fill value is available?


dali-automaton · 2025-02-18T07:13:39Z

CI MESSAGE: [24098675]: BUILD FAILED


dali-automaton · 2025-02-18T15:45:43Z

CI MESSAGE: [24140703]: BUILD STARTED

dali-automaton · 2025-02-18T23:41:36Z

CI MESSAGE: [24140703]: BUILD FAILED

jantonguirao · 2025-02-19T15:02:38Z

!build


dali-automaton · 2025-02-19T15:06:25Z

CI MESSAGE: [24202068]: BUILD STARTED

dali-automaton · 2025-02-19T15:07:35Z

CI MESSAGE: [24202142]: BUILD STARTED


dali-automaton · 2025-02-19T16:12:57Z

CI MESSAGE: [24206271]: BUILD STARTED

dali-automaton · 2025-02-20T06:26:13Z

CI MESSAGE: [24206271]: BUILD FAILED

dali-automaton · 2025-02-20T06:26:56Z

CI MESSAGE: [24202068]: BUILD FAILED


dali-automaton · 2025-02-20T08:44:24Z

CI MESSAGE: [24266929]: BUILD STARTED

dali-automaton · 2025-02-20T12:24:23Z

CI MESSAGE: [24278482]: BUILD STARTED

dali-automaton · 2025-02-20T12:24:25Z

CI MESSAGE: [24278489]: BUILD STARTED

dali-automaton · 2025-02-20T12:25:16Z

CI MESSAGE: [24278548]: BUILD STARTED


mdabek-nvidia · 2025-02-20T09:22:09Z

dali/kernels/common/memset.h

@@ -0,0 +1,48 @@
+// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.


This is a much-needed feature. Thank you for implementing it.
In the future, I advise making a separate PR out of this.

You are right. At some point I should have started to divide things.

mdabek-nvidia · 2025-02-20T10:22:55Z

dali/operators/decoder/video/video_decoder_cpu.cc

+* H.265/HEVC
+* VP8
+* VP9
+* MJPEG


If we rely on libavcodec or NVDEC, I would just point to the documentation of both libraries for the list of supported video codecs. Listing codecs here poses a risk that no one will update the list once the underlying libraries change.

On the other hand, we explicitly check that the codec is one of those supported by the operator. From the point of view of a DALI user, I'd like to know quickly whether my video files are going to be supported by the operator, without having to jump to the documentation of a dependency. I'd say both approaches have their value.

Also, a new codec may require DALI to adjust its parsing code. It is not as transparent as image decoding.


mdabek-nvidia · 2025-02-20T12:23:06Z

dali/operators/reader/loader/video/frames_decoder_base.cc

+namespace dali {
+
+int MemoryVideoFile::Read(unsigned char *buffer, int buffer_size) {
+  int left_in_file = size_ - position_;


It is possible that (size_ - position_) < 0, e.g.:

Seek was called with new_position > size_ and mode == SEEK_SET

Read is then called while position_ > size_

Should we add an assert for this situation, or change the condition below to:
if (left_in_file <= 0)
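
The scenario above can be illustrated with a small Python analogue of an in-memory file. This is a sketch, not the DALI C++ code: the class `MemoryFile` and its methods are hypothetical stand-ins for `MemoryVideoFile`, written only to show why guarding with `left_in_file <= 0` makes a read after an out-of-bounds seek return nothing instead of working with a negative size.

```python
import os


class MemoryFile:
    """Hypothetical in-memory file, loosely modeled on the snippet above."""

    def __init__(self, data: bytes):
        self.data = data
        self.position = 0

    def seek(self, new_position, mode=os.SEEK_SET):
        if mode == os.SEEK_SET:
            self.position = new_position
        elif mode == os.SEEK_CUR:
            self.position += new_position
        elif mode == os.SEEK_END:
            self.position = len(self.data) + new_position
        else:
            raise ValueError(f"Unsupported seek mode: {mode}")
        return self.position  # may legitimately end up past the end

    def read(self, buffer_size):
        left_in_file = len(self.data) - self.position
        # "<= 0" rather than "== 0": position may be beyond the end after a seek
        if left_in_file <= 0:
            return b""
        to_read = min(buffer_size, left_in_file)
        chunk = self.data[self.position:self.position + to_read]
        self.position += to_read
        return chunk


f = MemoryFile(b"abcdef")
f.seek(10)        # past the end, as in the scenario above
print(f.read(4))  # empty result rather than a negative-size copy
```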

mdabek-nvidia · 2025-02-20T12:38:32Z

dali/operators/decoder/video/video_decoder_base.h

+    DALI_ENFORCE(!(sequence_length_.HasValue() && end_frame_.HasValue()),
+                 "Cannot specify both `sequence_length` and `end_frame` arguments");
+
+    auto pad_mode_str = spec_.template GetArgument<std::string>("pad_mode");


Could the if/else block be replaced with a std::unordered_map, so that this code becomes:

if (boundary_types.contains(pad_mode_str)) boundary_type_ = boundary_types[pad_mode_str];
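
As a rough illustration of the table-driven lookup suggested above, here is the same idea in Python. The enum `BoundaryType`, its member names, and `parse_pad_mode` are assumptions made for this sketch; only the mode strings and their aliases come from the PR description. Unlike the one-liner above, the sketch reports unknown modes explicitly instead of leaving the value unset.

```python
from enum import Enum


class BoundaryType(Enum):
    # Hypothetical names; the actual C++ enum may differ.
    NONE = "none"
    CONSTANT = "constant"
    CLAMP = "clamp"
    REFLECT_1001 = "reflect_1001"
    REFLECT_101 = "reflect_101"


# One table holds every accepted spelling, including aliases.
BOUNDARY_TYPES = {
    "none": BoundaryType.NONE,
    "constant": BoundaryType.CONSTANT,
    "edge": BoundaryType.CLAMP,
    "repeat": BoundaryType.CLAMP,
    "reflect_1001": BoundaryType.REFLECT_1001,
    "symmetric": BoundaryType.REFLECT_1001,
    "reflect_101": BoundaryType.REFLECT_101,
    "reflect": BoundaryType.REFLECT_101,
}


def parse_pad_mode(pad_mode_str):
    """Translate a pad_mode string into a boundary type, rejecting unknown values."""
    try:
        return BOUNDARY_TYPES[pad_mode_str]
    except KeyError:
        supported = ", ".join(sorted(BOUNDARY_TYPES))
        raise ValueError(
            f"Invalid pad_mode: {pad_mode_str!r}. Supported values: {supported}"
        ) from None
```

Keeping the aliases in the table also means the error message lists the supported values automatically.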

JanuszL · 2025-02-20T12:41:06Z

dali/operators/decoder/video/video_decoder_base.h

+                   make_string("end_frame (", end, ") must be greater than start_frame (", start,
+                               "), for sample #", sample_idx));
+      sequence_len =
+          (end - start + stride - 1) / stride;  // Round up to include all frames in [start,end)


This may be too much for this PR, just food for thought: maybe the user could provide sequence_length and end_frame together with stride, and omit start_frame?

maybe in a follow up someday? :)

Sure, just food for thought.
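
For reference, the round-up in the diff above and the idea floated in this thread can both be sketched in a few lines of Python. The first function restates the formula from the diff. The second is purely hypothetical: it shows one possible reading of "end frame plus sequence length", where the last selected frame is `end - 1`, and is not something the PR implements.

```python
def sequence_length_from_range(start, end, stride):
    """Number of frames start, start + stride, ... that fall inside [start, end)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if end <= start:
        raise ValueError(
            f"end_frame ({end}) must be greater than start_frame ({start})"
        )
    return (end - start + stride - 1) // stride  # ceiling division


def start_from_end(end, sequence_length, stride):
    """Hypothetical: choose start_frame so that the last selected frame is end - 1."""
    if sequence_length < 1 or stride < 1:
        raise ValueError("sequence_length and stride must be >= 1")
    start = (end - 1) - (sequence_length - 1) * stride
    if start < 0:
        raise ValueError("The requested sequence does not fit before end_frame")
    return start


# The ceiling division agrees with counting the frames directly.
for start, end, stride in [(10, 20, 2), (10, 19, 2), (0, 7, 3), (5, 6, 4)]:
    assert sequence_length_from_range(start, end, stride) == len(range(start, end, stride))
```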

dali-automaton · 2025-02-20T13:38:22Z

CI MESSAGE: [24278548]: BUILD FAILED

dali-automaton · 2025-02-20T14:22:30Z

CI MESSAGE: [24266929]: BUILD FAILED


dali-automaton · 2025-02-20T17:32:06Z

CI MESSAGE: [24289501]: BUILD STARTED

dali-automaton · 2025-02-21T01:01:42Z

CI MESSAGE: [24289501]: BUILD FAILED


jantonguirao force-pushed the video_improvements branch 2 times, most recently from 0f7deff to 0b87b3d · February 7, 2025 14:50