Return valid for all-nulls in reduce() with nunique include-nulls aggregation #19196

davidwendt · 2025-06-18T15:35:23Z

Description

Adds specialized handling of the nunique aggregation with the include-nulls setting in cudf::reduce() when the input column is all nulls. This is consistent with the result of cudf::distinct_count().
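
To make the intended semantics concrete, here is a rough standalone model rather than libcudf code. It represents a nullable column as a vector of `std::optional<int>` and assumes that, when nulls are included, all nulls together count as one distinct value. `NullPolicy` and `count_distinct` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for a null policy; this is not the libcudf enum.
enum class NullPolicy { EXCLUDE, INCLUDE };

// Count distinct values in a nullable column modeled as std::optional<int>.
// Assumption: with INCLUDE, all nulls together count as one distinct value.
std::size_t count_distinct(std::vector<std::optional<int>> const& col, NullPolicy policy)
{
  std::set<int> seen;
  bool has_null = false;
  for (auto const& v : col) {
    if (v.has_value()) {
      seen.insert(*v);
    } else {
      has_null = true;
    }
  }
  // Nulls contribute one extra bucket only when they are included.
  std::size_t const null_bucket = (policy == NullPolicy::INCLUDE && has_null) ? 1 : 0;
  return seen.size() + null_bucket;
}
```

Under this model an all-null column has a well-defined count when nulls are included, which is why a valid scalar is a reasonable result for that case.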

Closes #19184

The reductions.cpp code was reworked with a utility function that handles the many empty/all-null cases for the various aggregations that cudf::reduce() supports. Also, the aggregate-dispatcher call was removed since all agg-kinds were executed by a single functor operator with a switch statement.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

davidwendt · 2025-06-18T15:36:39Z

cpp/src/reductions/reductions.cpp

+  data_type output_dtype,
+  std::optional<std::reference_wrapper<scalar const>> init,
+  rmm::cuda_stream_view stream,
+  rmm::device_async_resource_ref mr)


This code has not changed. The functor operator() was simply changed to a regular function call,
so the diff only shifts the existing logic to the left (one less level of indentation).
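
As a toy illustration of that kind of refactor (not the actual reductions.cpp code), the sketch below shows a functor whose single operator() switches on the aggregation kind, next to the equivalent free function. `Kind`, `reduce_functor`, and `reduce_pair` are hypothetical names.

```cpp
#include <stdexcept>

// Hypothetical aggregation kinds, for illustration only.
enum class Kind { SUM, MIN, MAX };

// Before: a functor with one operator() that switches on the kind.
// A dispatcher wrapped around it adds nothing, since every kind lands here.
struct reduce_functor {
  int operator()(Kind kind, int a, int b) const
  {
    switch (kind) {
      case Kind::SUM: return a + b;
      case Kind::MIN: return a < b ? a : b;
      case Kind::MAX: return a > b ? a : b;
    }
    throw std::invalid_argument("unsupported aggregation kind");
  }
};

// After: the same switch as a regular function; the body is unchanged
// apart from losing one level of nesting.
int reduce_pair(Kind kind, int a, int b)
{
  switch (kind) {
    case Kind::SUM: return a + b;
    case Kind::MIN: return a < b ? a : b;
    case Kind::MAX: return a > b ? a : b;
  }
  throw std::invalid_argument("unsupported aggregation kind");
}
```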

davidwendt · 2025-06-18T15:38:04Z

cpp/src/reductions/reductions.cpp

+                                            column_view col,
+                                            data_type output_dtype,
+                                            rmm::cuda_stream_view stream,
+                                            rmm::device_async_resource_ref mr)


This function just consolidates the many cases that were in the if-empty-all-nulls statement in the reduce() function below.
The main new change is the case NUNIQUE:
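
A simplified standalone sketch of what such a consolidating function might look like is below. It is not the code from the diff. `Kind`, `NullPolicy`, `ColumnInfo`, and `empty_or_all_null_result` are hypothetical names, `std::nullopt` stands in for a scalar with is_valid()==false, and the NUNIQUE value of 1 assumes that all nulls count as one distinct value.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical names for illustration; these are not the libcudf types.
enum class Kind { SUM, ANY, ALL, NUNIQUE };
enum class NullPolicy { EXCLUDE, INCLUDE };

// Minimal description of the input column: row count and null count.
struct ColumnInfo {
  std::size_t size;
  std::size_t null_count;
};

// True when the column is empty or every row is null.
bool is_empty_or_all_null(ColumnInfo col) { return col.size == col.null_count; }

// Result for empty or all-null input. std::nullopt models an invalid (null) scalar.
std::optional<int> empty_or_all_null_result(Kind kind, ColumnInfo col, NullPolicy policy)
{
  switch (kind) {
    case Kind::ANY: return 0;  // false
    case Kind::ALL: return 1;  // true
    case Kind::NUNIQUE:
      // Assumption: a non-empty all-null column with nulls included has
      // exactly one distinct value. Other NUNIQUE cases stay null in this sketch.
      if (policy == NullPolicy::INCLUDE && col.size > 0) { return 1; }
      return std::nullopt;
    default: return std::nullopt;  // every other aggregation yields a null scalar
  }
}
```

Keeping every special case in one switch means the caller only needs a single `is_empty_or_all_null` check before delegating.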

TomAugspurger · 2025-06-18T16:20:24Z

Thanks @davidwendt. Do you think the docs corresponding to https://docs.rapids.ai/api/cudf/stable/libcudf_docs/api_docs/aggregation_reduction/ need to be updated? Specifically the bit you pointed out yesterday:

> If the column is empty or contains all null entries col.size()==col.null_count(), the output scalar value will be false for reduction type any and true for reduction type all. For all other reductions, the output scalar returns with is_valid()==false.

davidwendt · 2025-06-18T16:28:58Z

> Thanks @davidwendt. Do you think the docs corresponding to https://docs.rapids.ai/api/cudf/stable/libcudf_docs/api_docs/aggregation_reduction/ need to be updated? Specifically the bit you pointed out yesterday:

Yes, I could use some advice on that. I don't know about listing all the special cases there.
Most of the time we still return None/null so I'm thinking of adding something vague like

> For empty or all-null input, the result is generally a null scalar except for certain specific aggregations.

I suppose I could list the aggregations without specifically mentioning the result. Or put in a table with the results though that could get complicated.

TomAugspurger · 2025-06-18T18:30:05Z

I like your suggestion. Maybe modified slightly

> For empty or all-null input, the result is generally a null scalar except for specific aggregations where the aggregation has a well-defined output for an empty input.

davidwendt · 2025-06-18T19:03:37Z

Marking this as a breaking change since the returned result has changed for all-null input with the nunique include-nulls aggregation.

vyasr

I think this approach makes sense. As far as docs go, a table would be nice, but it would probably get out of date. The fancy solution I would use for a Python library that doesn't require GPUs to run would be to inject the values during the doc build by executing the necessary code, but since that's not something we can easily do, I am fine sticking with what you have written.

vuule · 2025-06-24T22:56:13Z

How come a test for the affected case is not included in this PR? (ignore if I missed a discussion about this)

davidwendt · 2025-06-24T23:15:48Z

> How come a test for the affected case is not included in this PR? (ignore if I missed a discussion about this)

You are right. I should include a gtest for this.

Return valid for all-nulls in reduce() with nunique include-nulls agg

89e548e

davidwendt self-assigned this Jun 18, 2025

davidwendt requested a review from a team as a code owner June 18, 2025 15:35

davidwendt requested review from kingcrimsontianyu and vuule June 18, 2025 15:35

davidwendt added the feature request, 3 - Ready for Review, libcudf, and non-breaking labels Jun 18, 2025

davidwendt added the breaking label and removed the non-breaking label Jun 18, 2025

update doxygen for empty/all-null case

2915fcb

TomAugspurger mentioned this pull request Jun 18, 2025

Configurable blocksize mode for streaming executor in unit tests #19146

Draft

bdice approved these changes Jun 24, 2025

vyasr approved these changes Jun 24, 2025
