Description
Is your feature request related to a problem? Please describe
I found that dpct::dp4a calls a CUDA device function when targeting the CUDA backend. Following your implementation, a corresponding function could be added for the HIP backend (intel/llvm#16848).
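For context, here is a minimal portable sketch of what a signed dp4a operation is generally understood to compute: a dot product of four packed signed 8-bit lanes, added to a 32-bit accumulator. The name `dp4a_reference` is hypothetical and used only for illustration. It is not the dpct implementation and not the code in intel/llvm#16848. A backend-specific version would presumably replace the loop with a device builtin.

```cpp
#include <cstdint>

// Hypothetical reference helper (illustration only, not part of dpct):
// treats a and b as four packed signed 8-bit lanes, multiplies them
// lane by lane, and adds the sum of the products to c.
inline std::int32_t dp4a_reference(std::int32_t a, std::int32_t b,
                                   std::int32_t c) {
  // Extract lane i (0 = least significant byte) as a signed value in [-128, 127].
  auto lane = [](std::int32_t x, int i) -> std::int32_t {
    std::int32_t v =
        static_cast<std::int32_t>((static_cast<std::uint32_t>(x) >> (8 * i)) & 0xFFu);
    return v >= 128 ? v - 256 : v;
  };

  std::int32_t acc = c;  // assumes the accumulated result does not overflow int32
  for (int i = 0; i < 4; ++i) {
    acc += lane(a, i) * lane(b, i);
  }
  return acc;
}
```

A reference like this could serve as a host-side check when comparing results across the CUDA and HIP paths.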
Describe the solution you would like
No response
Describe alternatives you have considered
No response
Additional context
No response