[HIP][device] 4 __shfl_sync functions are missing #1491
Comments
We have some work left in the device compiler to support certain CUDA 9 device-side features such as the sync APIs. Also note that most AMD devices have a "warp size" of 64, so any code using a 32-bit mask is already broken. |
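To make the warp-size point concrete, here is a minimal sketch in plain C++ of why the constant 0xffffffff is not a full mask on a 64-lane wavefront. The helper names `full_lane_mask` and `lanes_covered` are hypothetical and are not part of HIP or CUDA.

```cpp
#include <cstdint>

// Hypothetical helper (not a HIP or CUDA API): one bit set per lane for a
// warp/wavefront of `warp_size` lanes, with warp_size in the range 0..64.
constexpr std::uint64_t full_lane_mask(unsigned warp_size) {
    return warp_size >= 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << warp_size) - 1;
}

// Number of lanes that a given mask actually selects.
constexpr unsigned lanes_covered(std::uint64_t mask) {
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1) ++n;  // clear the lowest set bit
    return n;
}

// The usual CUDA constant is a full mask only for 32-lane warps.
static_assert(full_lane_mask(32) == 0xffffffffull, "32-lane warp");
static_assert(lanes_covered(0xffffffffull) == 32, "half of a 64-lane wavefront");
static_assert(lanes_covered(full_lane_mask(64)) == 64, "64-lane wavefront");
```

Code that hard-codes 0xffffffff (or a loop starting at offset 16) silently assumes 32 lanes, which is the breakage described above.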
Hi, I was trying to hipify some code that has a few calls to __shfl_down_sync. __shfl_down is deprecated, so it cannot be used with CUDA 11. What would be the best approach? |
@gmarkomanolis What I do when using
The specific constant I'm using there ( |
What do you mean by |
I think mentions of |
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions Deprecation Notice: __shfl, __shfl_up, __shfl_down, and __shfl_xor have been deprecated in CUDA 9.0 for all devices. Removal Notice: When targeting devices with compute capability 7.x or higher, __shfl, __shfl_up, __shfl_down, and __shfl_xor are no longer available and their sync variants should be used instead. I will check though if it is on the code's side as it is not mine. Thanks for the answer. |
Thanks a lot. |
Hey, @emankov, any update on |
Any update on this? I am specifically looking for a solution to __shfl_sync |
If your code uses a mask of 0xffffffff, then you can just replace your _sync calls with the non-sync ones and it should work fine. |
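The replacement suggested above can be sketched as a small portability shim. This is only an illustration under stated assumptions: `shfl_down_full` is a hypothetical name, the build is assumed to use hipcc (which defines `__HIP_PLATFORM_AMD__` on AMD targets), and the shim is only valid where the original CUDA call passed 0xffffffff as its mask.

```cpp
// Hypothetical shim, not part of CUDA or HIP.
#if defined(__HIP_PLATFORM_AMD__)
#include <hip/hip_runtime.h>
// HIP releases without the *_sync builtins: fall back to the legacy shuffle.
template <typename T>
__device__ inline T shfl_down_full(T v, unsigned int delta) {
    return __shfl_down(v, delta);
}
#elif defined(__CUDACC__)
// CUDA 9 and later: the legacy shuffle is deprecated, so pass the full
// 32-lane mask explicitly.
template <typename T>
__device__ inline T shfl_down_full(T v, unsigned int delta) {
    return __shfl_down_sync(0xffffffffu, v, delta);
}
#else
// Host-only build (e.g. CPU unit tests): a single "lane" has no neighbour,
// so mirror the out-of-range behaviour and return the caller's own value.
template <typename T>
inline T shfl_down_full(T v, unsigned int /*delta*/) {
    return v;
}
#endif
```

On a 64-lane wavefront the legacy shuffle spans all 64 lanes by default, so a reduction loop built on this shim should start from `warpSize / 2` rather than a hard-coded 16.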
Merge `source/lib/src/cuda` and `source/lib/src/rocm` into `source/lib/src/gpu`.
- Define macros `gpuGetLastError`, `gpuDeviceSynchronize`, `gpuMemcpy`, `gpuMemcpyDeviceToHost`, `gpuMemcpyHostToDevice`, and `gpuMemset` to make them available for both CUDA and ROCm.
- Use `<<< >>> syntax` for both CUDA and ROCm. Per ROCm/hip@cf78d85, it has been supported in HIP since 2018.
- Fix several int const numbers that should be double or float.
- For tabulate:
  - Fix `WARP_SIZE` for ROCm. Per pytorch/pytorch#64302, WARP_SIZE can be 32 or 64, so it should not be hardcoded to 64.
  - Add `GpuShuffleSync`. Per ROCm/hip#1491, `__shfl_sync` is not supported by HIP.
  - After merging the code, #1274 should also work for ROCm.
  - Use the same `ii` for #830 and #2357. Although both of them work, `ii` has different meanings in these two PRs, but now it should be the same.
  - However, `ii` in `tabulate_fusion_se_a_fifth_order_polynomial` (rocm) added by #2532 is wrong. After merging the codes, it should be corrected.
  - Optimization in #830 was not applied to ROCm.
  - `__syncwarp` is not supported by ROCm.
- After merging the code, #2661 will be applied to ROCm. Although TF ROCm stream is still blocking (https://github.com/tensorflow/tensorflow/blob/9d1262082e761cd85d6726bcbdfdef331d6d72c6/tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc#L566), we don't know whether it will change to non-blocking.
- There are several other differences between CUDA and ROCm.
---------
Signed-off-by: Jinzhe Zeng <[email protected]>
@Kaveh01 Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks! |
@ppanchad-amd You could have just said that the |
I am using ROCm 6.1.3, yet I still get this error: "error: use of undeclared identifier '__shfl_down_sync'" |
The *_sync functions are not available in 6.1, see, e.g. https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail . The develop branch has an implementation which may appear in a future release. |
The develop implementation mentioned above has restrictions on its use that match the restrictions stated for Pascal in the CUDA guide. |
The C++ Language Extensions documentation for ROCm 6.1.2 / HIP 6.1.40092 describes this as if the
Note that this is the only reference to ROCm 6.2 in the entire document, the following sections simply list all the |
Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel free to comment and we can reopen this thread, or you can submit a new issue. |
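A hedged sketch of opting in on ROCm 6.2 follows. The thread does not name the macro; `HIP_ENABLE_WARP_SYNC_BUILTINS` and the 64-bit mask requirement are taken from the HIP documentation and should be treated as assumptions to verify against the installed release. `lane_mask_t`, `kAllLanes`, and `warp_sum` are hypothetical names introduced for illustration.

```cpp
#if defined(__HIP_PLATFORM_AMD__)
// Assumption: this macro enables the *_sync builtins and must be defined
// before the HIP runtime header is included (or passed with -D).
#define HIP_ENABLE_WARP_SYNC_BUILTINS 1
#include <hip/hip_runtime.h>
using lane_mask_t = unsigned long long;  // HIP _sync builtins take a 64-bit mask
#else
using lane_mask_t = unsigned int;        // CUDA _sync builtins take a 32-bit mask
#endif

// All bits set, in whichever mask width the platform expects.
constexpr lane_mask_t kAllLanes = ~lane_mask_t{0};

#if defined(__HIP_PLATFORM_AMD__) || defined(__CUDACC__)
// Sum across one warp/wavefront; lane 0 ends up holding the total.
// Assumes every lane of the warp is active and reaches this call together.
__device__ inline float warp_sum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(kAllLanes, v, offset);
    }
    return v;
}
#endif
```

Keeping the mask type behind an alias avoids passing a 32-bit literal where HIP expects 64 bits, which is the same pitfall raised earlier in the thread.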
The CUDA 9 __shfl_sync function is missing. I can use the deprecated __shfl, but it would be better to have the new function.
Test code: