Skip to content

fn.experimental.decoders.video improvements #5814

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Feb 21, 2025

Conversation

jantonguirao
Copy link
Contributor

@jantonguirao jantonguirao commented Feb 7, 2025

Category:

New feature

Description:

The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup.

Enhances the video decoder functionality with several key improvements:

  • Added support for frame padding with configurable modes:

    • constant: Fill with specified values: e.g. 0, or (118, 185, 0)
    • edge/repeat: Repeat last valid frame
    • reflect_1001/symmetric: Mirror padding including edge
    • reflect_101/reflect: Mirror padding excluding edge
    • none: Return shorter sequences when insufficient frames
  • Added frame selection options:

    • Arbitrary frame selection via frames argument
    • Sequential frame selection with:
      • start_frame: First frame (included) from the range to decode.
      • end_frame: Last frame (excluded) from the range to decode. Can't be used together with sequence_length
      • sequence_length: Number of frames to return. Can't be used together with end_frame
      • stride: Number of frames to skip between selections
  • Added build index option to control the generation of a frame index

Examples

# Start frame, sequence length and stride, including padding with NVIDIA green 
# decodes frames 10, 12, 14, 16
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10, 
    sequence_length=4, 
    stride=2,
    pad_mode="constant",
    fill_value=[118, 185, 0]  # NVIDIA green
)

# Using end_frame instead of sequence_length
# Decodes frames 10, 12, 14, 16, 18
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10,
    end_frame=20,  
    stride=2,
    pad_mode="edge"  # Repeat last frame if needed
)

# Extract specific frames
# Decodes frames 0, 10, 11, 12, 20, 30
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    frames=[0, 10, 11, 12, 20, 30]
)

Additional information:

Affected modules and functionalities:

Video decoder

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@jantonguirao jantonguirao force-pushed the video_improvements branch 2 times, most recently from 0f7deff to 0b87b3d Compare February 7, 2025 14:50
Signed-off-by: Joaquin Anton Guirao <[email protected]>
Signed-off-by: Joaquin Anton Guirao <[email protected]>
Signed-off-by: Joaquin Anton Guirao <[email protected]>
@jantonguirao jantonguirao force-pushed the video_improvements branch 2 times, most recently from c1137a9 to bf3dbf2 Compare February 10, 2025 16:43
Signed-off-by: Joaquin Anton Guirao <[email protected]>
@jantonguirao jantonguirao marked this pull request as ready for review February 11, 2025 11:34
@jantonguirao
Copy link
Contributor Author

!build

@jantonguirao jantonguirao changed the title Video improvements VideoDecoder improvements Feb 11, 2025
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [23788349]: BUILD STARTED

Comment on lines 127 to 120
pad_value_ = spec_.template GetArgument<int>("pad_value");
DALI_ENFORCE(pad_value_ >= 0 && pad_value_ <= 255, "pad_value must be in range [0, 255]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in extending that to provide or <R, G, B> to have any fill value available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24098675]: BUILD FAILED

Signed-off-by: Joaquin Anton Guirao <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24140703]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24140703]: BUILD FAILED

@jantonguirao
Copy link
Contributor Author

!build

Signed-off-by: Joaquin Anton Guirao <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24202068]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24202142]: BUILD STARTED

Signed-off-by: Joaquin Anton Guirao <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24206271]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24206271]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24202068]: BUILD FAILED

Signed-off-by: Joaquin Anton Guirao <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24266929]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24278482]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24278489]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24278548]: BUILD STARTED

@@ -0,0 +1,48 @@
// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much needed feature. Thank you for implementing this.
In the future, I advice making a separate PR out of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. At some point I should have started to divide things.

* H.265/HEVC
* VP8
* VP9
* MJPEG
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we rely on libavcodec or NVDEC I would just redirect to documentation of both libraries to check for video codec supported. Listing codecs here posses a risk, that no one will ever update the list once underlying libraries will change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, we explicitly check that the codec is one of the supported ones by the operator. If I think about it from the point of view of the user of DALI, I'd like to quickly know if my video files are going to be supported by the operator, without having to jump to the documentation of a dependency. I'd say both approaches have their value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also new codec may require DALI to readjust the parsing code. It is not that transparent as image decoding.

namespace dali {

int MemoryVideoFile::Read(unsigned char *buffer, int buffer_size) {
int left_in_file = size_ - position_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible, that (size_ - position_) < 0, e.g.:

  1. Seek was called with new_position > size_ and mode==SEEK_SET
  2. Read is called and position_ is > size_

Should we have an assert for such situation or change the below condition to:
if (left_in_file <= 0 )

DALI_ENFORCE(!(sequence_length_.HasValue() && end_frame_.HasValue()),
"Cannot specify both `sequence_length` and `end_frame` arguments");

auto pad_mode_str = spec_.template GetArgument<std::string>("pad_mode");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could if/else block be replaced with the std::unordered_map, so this code could be replaced with:

if (boundary_types.contains(pad_mode_str)) 
    boundar_type_ = boundary_types [pad_mode_str] 

make_string("end_frame (", end, ") must be greater than start_frame (", start,
"), for sample #", sample_idx));
sequence_len =
(end - start + stride - 1) / stride; // Round up to include all frames in [start,end)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be too much for this PR, just a food for thought - maybe the user can provide seq_len and end frame with stride and skip start frame?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in a follow up someday? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just food for thought.

@jantonguirao jantonguirao changed the title VideoDecoder improvements fn.experimental.decoders.video improvements Feb 20, 2025
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24278548]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24266929]: BUILD FAILED

Signed-off-by: Joaquin Anton Guirao <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24289501]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [24289501]: BUILD FAILED

@jantonguirao jantonguirao merged commit dfbab0f into NVIDIA:main Feb 21, 2025
5 of 6 checks passed
5had3z pushed a commit to 5had3z/DALI that referenced this pull request Feb 23, 2025
The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup.

Enhances the video decoder functionality with several key improvements:

- Added support for frame padding with configurable modes:

  * `constant`: Fill with specified values: e.g. `0`, or `(118, 185, 0)`
  * `edge`/`repeat`: Repeat last valid frame
  * `reflect_1001`/`symmetric`: Mirror padding including edge
  * `reflect_101`/`reflect`: Mirror padding excluding edge
  * `none`: Return shorter sequences when insufficient frames

- Added frame selection options:

  * Arbitrary frame selection via `frames` argument
  * Sequential frame selection with:
    - `start_frame`: First frame (included) from the range to decode.
    - `end_frame`: Last frame (excluded) from the range to decode. Can't be used together with `sequence_length`
    - `sequence_length`: Number of frames to return. Can't be used together with `end_frame`
    - `stride`: Number of frames to skip between selections

- Added `build index` option to control the generation of a frame index

Examples
```
# Start frame, sequence length and stride, including padding with NVIDIA green 
# decodes frames 10, 12, 14, 16
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10, 
    sequence_length=4, 
    stride=2,
    pad_mode="constant",
    fill_value=[118, 185, 0]  # NVIDIA green
)

# Using end_frame instead of sequence_length
# Decodes frames 10, 12, 14, 16, 18
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10,
    end_frame=20,  
    stride=2,
    pad_mode="edge"  # Repeat last frame if needed
)

# Extract specific frames
# Decodes frames 0, 10, 11, 12, 20, 30
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    frames=[0, 10, 11, 12, 20, 30]
)
```

Signed-off-by: Joaquin Anton Guirao <[email protected]>
5had3z pushed a commit to 5had3z/DALI that referenced this pull request Feb 23, 2025
The VideoDecoder classes have been majorly refactored, implementing various padding strategies for video frames (constant, edge, reflect, and symmetric), and performing code cleanup.

Enhances the video decoder functionality with several key improvements:

- Added support for frame padding with configurable modes:

  * `constant`: Fill with specified values: e.g. `0`, or `(118, 185, 0)`
  * `edge`/`repeat`: Repeat last valid frame
  * `reflect_1001`/`symmetric`: Mirror padding including edge
  * `reflect_101`/`reflect`: Mirror padding excluding edge
  * `none`: Return shorter sequences when insufficient frames

- Added frame selection options:

  * Arbitrary frame selection via `frames` argument
  * Sequential frame selection with:
    - `start_frame`: First frame (included) from the range to decode.
    - `end_frame`: Last frame (excluded) from the range to decode. Can't be used together with `sequence_length`
    - `sequence_length`: Number of frames to return. Can't be used together with `end_frame`
    - `stride`: Number of frames to skip between selections

- Added `build index` option to control the generation of a frame index

Examples
```
# Start frame, sequence length and stride, including padding with NVIDIA green 
# decodes frames 10, 12, 14, 16
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10, 
    sequence_length=4, 
    stride=2,
    pad_mode="constant",
    fill_value=[118, 185, 0]  # NVIDIA green
)

# Using end_frame instead of sequence_length
# Decodes frames 10, 12, 14, 16, 18
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    start_frame=10,
    end_frame=20,  
    stride=2,
    pad_mode="edge"  # Repeat last frame if needed
)

# Extract specific frames
# Decodes frames 0, 10, 11, 12, 20, 30
video_decoder = dali.experimental.decoders.Video(
    encoded=encoded_video,
    frames=[0, 10, 11, 12, 20, 30]
)
```

Signed-off-by: Joaquin Anton Guirao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants