-
Notifications
You must be signed in to change notification settings - Fork 639
fn.experimental.decoders.video
improvements
#5814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0f7deff
to
0b87b3d
Compare
Signed-off-by: Joaquin Anton Guirao <[email protected]>
Signed-off-by: Joaquin Anton Guirao <[email protected]>
Signed-off-by: Joaquin Anton Guirao <[email protected]>
c1137a9
to
bf3dbf2
Compare
Signed-off-by: Joaquin Anton Guirao <[email protected]>
bf3dbf2
to
85df30a
Compare
!build |
CI MESSAGE: [23788349]: BUILD STARTED |
pad_value_ = spec_.template GetArgument<int>("pad_value"); | ||
DALI_ENFORCE(pad_value_ >= 0 && pad_value_ <= 255, "pad_value must be in range [0, 255]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in extending that to provide or <R, G, B> to have any fill value available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
CI MESSAGE: [24098675]: BUILD FAILED |
Signed-off-by: Joaquin Anton Guirao <[email protected]>
CI MESSAGE: [24140703]: BUILD STARTED |
CI MESSAGE: [24140703]: BUILD FAILED |
!build |
Signed-off-by: Joaquin Anton Guirao <[email protected]>
0775f2c
to
678ff2e
Compare
CI MESSAGE: [24202068]: BUILD STARTED |
CI MESSAGE: [24202142]: BUILD STARTED |
Signed-off-by: Joaquin Anton Guirao <[email protected]>
CI MESSAGE: [24206271]: BUILD STARTED |
CI MESSAGE: [24206271]: BUILD FAILED |
CI MESSAGE: [24202068]: BUILD FAILED |
Signed-off-by: Joaquin Anton Guirao <[email protected]>
CI MESSAGE: [24266929]: BUILD STARTED |
CI MESSAGE: [24278482]: BUILD STARTED |
CI MESSAGE: [24278489]: BUILD STARTED |
CI MESSAGE: [24278548]: BUILD STARTED |
@@ -0,0 +1,48 @@ | |||
// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much needed feature. Thank you for implementing this.
In the future, I advice making a separate PR out of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. At some point I should have started to divide things.
* H.265/HEVC | ||
* VP8 | ||
* VP9 | ||
* MJPEG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rely on libavcodec or NVDEC I would just redirect to documentation of both libraries to check for video codec supported. Listing codecs here posses a risk, that no one will ever update the list once underlying libraries will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, we explicitly check that the codec is one of the supported ones by the operator. If I think about it from the point of view of the user of DALI, I'd like to quickly know if my video files are going to be supported by the operator, without having to jump to the documentation of a dependency. I'd say both approaches have their value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also new codec may require DALI to readjust the parsing code. It is not that transparent as image decoding.
namespace dali { | ||
|
||
int MemoryVideoFile::Read(unsigned char *buffer, int buffer_size) { | ||
int left_in_file = size_ - position_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, that (size_ - position_) < 0, e.g.:
- Seek was called with
new_position > size_
andmode==SEEK_SET
- Read is called and position_ is > size_
Should we have an assert for such situation or change the below condition to:
if (left_in_file <= 0 )
DALI_ENFORCE(!(sequence_length_.HasValue() && end_frame_.HasValue()), | ||
"Cannot specify both `sequence_length` and `end_frame` arguments"); | ||
|
||
auto pad_mode_str = spec_.template GetArgument<std::string>("pad_mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could if/else block be replaced with the std::unordered_map, so this code could be replaced with:
if (boundary_types.contains(pad_mode_str))
boundar_type_ = boundary_types [pad_mode_str]
make_string("end_frame (", end, ") must be greater than start_frame (", start, | ||
"), for sample #", sample_idx)); | ||
sequence_len = | ||
(end - start + stride - 1) / stride; // Round up to include all frames in [start,end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be too much for this PR, just a food for thought - maybe the user can provide seq_len and end frame with stride and skip start frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe in a follow up someday? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, just food for thought.
fn.experimental.decoders.video
improvements
CI MESSAGE: [24278548]: BUILD FAILED |
CI MESSAGE: [24266929]: BUILD FAILED |
Signed-off-by: Joaquin Anton Guirao <[email protected]>
CI MESSAGE: [24289501]: BUILD STARTED |
CI MESSAGE: [24289501]: BUILD FAILED |
The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup. Enhances the video decoder functionality with several key improvements: - Added support for frame padding with configurable modes: * `constant`: Fill with specified values: e.g. `0`, or `(118, 185, 0)` * `edge`/`repeat`: Repeat last valid frame * `reflect_1001`/`symmetric`: Mirror padding including edge * `reflect_101`/`reflect`: Mirror padding excluding edge * `none`: Return shorter sequences when insufficient frames - Added frame selection options: * Arbitrary frame selection via `frames` argument * Sequential frame selection with: - `start_frame`: First frame (included) from the range to decode. - `end_frame`: Last frame (excluded) from the range to decode. Can't be used together with `sequence_length` - `sequence_length`: Number of frames to return. Can't be used together with `end_frame` - `stride`: Number of frames to skip between selections - Added `build index` option to control the generation of a frame index Examples ``` # Start frame, sequence length and stride, including padding with NVIDIA green # decodes frames 10, 12, 14, 16 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, start_frame=10, sequence_length=4, stride=2, pad_mode="constant", fill_value=[118, 185, 0] # NVIDIA green ) # Using end_frame instead of sequence_length # Decodes frames 10, 12, 14, 16, 18 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, start_frame=10, end_frame=20, stride=2, pad_mode="edge" # Repeat last frame if needed ) # Extract specific frames # Decodes frames 0, 10, 11, 12, 20, 30 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, frames=[0, 10, 11, 12, 20, 30] ) ``` Signed-off-by: Joaquin Anton Guirao <[email protected]>
The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup. Enhances the video decoder functionality with several key improvements: - Added support for frame padding with configurable modes: * `constant`: Fill with specified values: e.g. `0`, or `(118, 185, 0)` * `edge`/`repeat`: Repeat last valid frame * `reflect_1001`/`symmetric`: Mirror padding including edge * `reflect_101`/`reflect`: Mirror padding excluding edge * `none`: Return shorter sequences when insufficient frames - Added frame selection options: * Arbitrary frame selection via `frames` argument * Sequential frame selection with: - `start_frame`: First frame (included) from the range to decode. - `end_frame`: Last frame (excluded) from the range to decode. Can't be used together with `sequence_length` - `sequence_length`: Number of frames to return. Can't be used together with `end_frame` - `stride`: Number of frames to skip between selections - Added `build index` option to control the generation of a frame index Examples ``` # Start frame, sequence length and stride, including padding with NVIDIA green # decodes frames 10, 12, 14, 16 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, start_frame=10, sequence_length=4, stride=2, pad_mode="constant", fill_value=[118, 185, 0] # NVIDIA green ) # Using end_frame instead of sequence_length # Decodes frames 10, 12, 14, 16, 18 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, start_frame=10, end_frame=20, stride=2, pad_mode="edge" # Repeat last frame if needed ) # Extract specific frames # Decodes frames 0, 10, 11, 12, 20, 30 video_decoder = dali.experimental.decoders.Video( encoded=encoded_video, frames=[0, 10, 11, 12, 20, 30] ) ``` Signed-off-by: Joaquin Anton Guirao <[email protected]>
Category:
New feature
Description:
The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup.
Enhances the video decoder functionality with several key improvements:
Added support for frame padding with configurable modes:
constant
: Fill with specified values: e.g.0
, or(118, 185, 0)
edge
/repeat
: Repeat last valid framereflect_1001
/symmetric
: Mirror padding including edgereflect_101
/reflect
: Mirror padding excluding edgenone
: Return shorter sequences when insufficient framesAdded frame selection options:
frames
argumentstart_frame
: First frame (included) from the range to decode.end_frame
: Last frame (excluded) from the range to decode. Can't be used together withsequence_length
sequence_length
: Number of frames to return. Can't be used together withend_frame
stride
: Number of frames to skip between selectionsAdded
build index
option to control the generation of a frame indexExamples
Additional information:
Affected modules and functionalities:
Video decoder
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A