-
Notifications
You must be signed in to change notification settings - Fork 952
Return valid for all-nulls in reduce() with nunique include-nulls aggregation #19196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.08
Are you sure you want to change the base?
Return valid for all-nulls in reduce() with nunique include-nulls aggregation #19196
Conversation
data_type output_dtype, | ||
std::optional<std::reference_wrapper<scalar const>> init, | ||
rmm::cuda_stream_view stream, | ||
rmm::device_async_resource_ref mr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has not changed. The functor operator()
was simply changed to a regular function call.
So the change just moved the code logic to the left.
column_view col, | ||
data_type output_dtype, | ||
rmm::cuda_stream_view stream, | ||
rmm::device_async_resource_ref mr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function just consolidates the many cases that were in the if-empty-all-nulls statement in the reduce()
function below.
The main new change is the case NUNIQUE:
Thanks @davidwendt. Do you think the docs corresponding to https://docs.rapids.ai/api/cudf/stable/libcudf_docs/api_docs/aggregation_reduction/ need to be updated? Specifically the bit you pointed out yesterday:
|
Yes, I could use some advice on that. I don't know about listing all the special cases there.
I suppose I could list the aggregations without specifically mentioning the result. Or put in a table with the results though that could get complicated. |
I like your suggestion. Maybe modified slightly
|
Marking this as a breaking change since the returned result has changed for the specific input case for the specific aggregation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach makes sense. As far as docs, a table would be nice, but it would probably get out of date. The fancy solution I would use for a Python library that doesn't require GPUs to run would be to inject the values during doc build by executing the necessary code, but since that's not something we can easily do I am fine sticking with what you have written.
How come a test for the affected case is not included in this PR? (ignore if I missed a discussion about this) |
You are right. I should include a gtest for this. |
Description
Adds specialized handling of nunique aggregation with include-nulls setting for
cudf::reduce()
when the input column is all nulls. This is consistent withcudf::distinct_count()
result.Closes #19184
The
reductions.cpp
code was reworked with a utility function to handle all of the many empty/all-null cases for various aggregrations thatcudf::reduce()
supports. Also, the aggregate-dispatcher call was removed since all agg-kinds were executed by a single functor operator with a switch statement.Checklist