[HIP][device] 4 __shfl_sync functions are missing #1491

Closed
Kaveh01 opened this issue Oct 1, 2019 · 19 comments

Kaveh01 commented Oct 1, 2019

CUDA 9's __shfl_sync function is missing. I can use the deprecated __shfl, but it would be better to have the new function.
Test code:

#include <cstdio>

__global__
static void shflTest(int lid){
    int tid = threadIdx.x;
    float value = tid + 0.1f;
    int* ivalue = reinterpret_cast<int*>(&value);

    //use the integer shfl
    int ix = __shfl(ivalue[0],5,32);
    int iy = __shfl_sync(0xFFFFFFFF, ivalue[0],5,32);

    float x = reinterpret_cast<float*>(&ix)[0];
    float y = reinterpret_cast<float*>(&iy)[0];

    if(tid == lid){
        printf("shfl tmp %d %d\n",ix,iy);
        printf("shfl final %f %f\n",x,y);
    }
}

int main()
{
    shflTest<<<1,32>>>(0);
    cudaDeviceSynchronize();
    return 0;
}
emankov added the hip label Oct 1, 2019

emankov (Contributor) commented Oct 1, 2019

__shfl_up_sync, __shfl_down_sync, and __shfl_xor_sync as well.

emankov changed the title from "__shfl_sync is missing." to "[HIP][device] 4 __shfl_sync functions are missing" Oct 1, 2019

b-sumner (Contributor) commented Oct 1, 2019

We have some work left in the device compiler to support certain CUDA 9 device-side features such as the sync APIs. Also note that most AMD devices have a "warp size" of 64, so any code using a 32-bit mask is already broken.
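
For example, a minimal probe kernel (a sketch, not part of this issue's repro; it assumes the HIP headers are available) shows why a hardcoded 0xFFFFFFFF mask cannot describe a full AMD wavefront:

#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void printWarpSize() {
    if (threadIdx.x == 0) {
        // warpSize is 64 on most AMD GCN/CDNA GPUs (32 on NVIDIA, and on
        // some RDNA configurations), so a full-wave mask needs 64 bits there.
        printf("warpSize = %d\n", warpSize);
    }
}

int main() {
    printWarpSize<<<1, 64>>>();
    hipDeviceSynchronize();
    return 0;
}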

@gmarkomanolis

Hi, I was trying to hipify a code that has a few calls to __shfl_down_sync. __shfl_down is deprecated, so it cannot be used with CUDA 11. What would be the best approach?

acowley (Contributor) commented Feb 13, 2021

@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,

#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif

The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.
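
A hedged update of the same guard, assuming the newer __HIP_PLATFORM_AMD__ macro that superseded __HIP_PLATFORM_HCC__ in later ROCm releases:

#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
// On AMD, the non-sync builtin already operates on the whole wavefront.
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif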

emankov (Contributor) commented Feb 13, 2021

> Hi, I was trying to hipify a code that has a few calls to __shfl_down_sync. __shfl_down is deprecated, so it cannot be used with CUDA 11. What would be the best approach?

__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.

emankov (Contributor) commented Feb 13, 2021

> The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.

What do you mean by old?

acowley (Contributor) commented Feb 13, 2021

I think mentions of HCC are being removed over time.

@gmarkomanolis

> Hi, I was trying to hipify a code that has a few calls to __shfl_down_sync. __shfl_down is deprecated, so it cannot be used with CUDA 11. What would be the best approach?
>
> __shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions

> Deprecation Notice: __shfl, __shfl_up, __shfl_down, and __shfl_xor have been deprecated in CUDA 9.0 for all devices.
>
> Removal Notice: When targeting devices with compute capability 7.x or higher, __shfl, __shfl_up, __shfl_down, and __shfl_xor are no longer available and their sync variants should be used instead.

I will check though if it is on the code's side as it is not mine. Thanks for the answer.

@gmarkomanolis

> @gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,
>
> #ifdef __HIP_PLATFORM_HCC__
> #define SHFL_DOWN(val, offset) __shfl_down(val, offset)
> #else
> #define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
> #endif
>
> The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.

Thanks a lot.

jammm commented Jul 14, 2021

Hey @emankov, any update on __shfl_sync? It would be great to have this implemented, I think.

leachim commented Sep 23, 2022

Any update on this? I am specifically looking for a solution for __shfl_sync.

jammm commented Sep 23, 2022

If your code uses a mask of 0xffffffff, then you can just replace your _sync calls with the non-sync ones and it should work fine.
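
For example, a compatibility shim along these lines (illustrative only, not part of HIP; it assumes the full-mask case described above and targets HIP's non-sync builtins):

// Illustrative wrappers: forward full-mask _sync calls to HIP's non-sync
// builtins, which already operate on the whole wavefront.
template <typename T>
__device__ T shfl_sync_compat(unsigned long long /*mask*/, T var, int srcLane) {
    // Only valid when the mask covers every lane (e.g. 0xffffffff on a
    // 32-wide warp); partial masks have no direct non-sync equivalent.
    return __shfl(var, srcLane);
}

template <typename T>
__device__ T shfl_down_sync_compat(unsigned long long /*mask*/, T var, unsigned int delta) {
    return __shfl_down(var, delta);
}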

wanghan-iapcm pushed a commit to deepmodeling/deepmd-kit that referenced this issue Sep 22, 2023
Merge `source/lib/src/cuda` and `source/lib/src/rocm` into `source/lib/src/gpu`.

- Define macros `gpuGetLastError`, `gpuDeviceSynchronize`, `gpuMemcpy`, `gpuMemcpyDeviceToHost`, `gpuMemcpyHostToDevice`, and `gpuMemset` to make them available for both CUDA and ROCm.
- Use `<<< >>>` syntax for both CUDA and ROCm. Per ROCm/hip@cf78d85, it has been supported in HIP since 2018.
- Fix several int const numbers that should be double or float.
- For tabulate:
  - Fix `WARP_SIZE` for ROCm. Per pytorch/pytorch#64302, WARP_SIZE can be 32 or 64, so it should not be hardcoded to 64.
  - Add `GpuShuffleSync`. Per ROCm/hip#1491, `__shfl_sync` is not supported by HIP.
  - After merging the code, #1274 should also work for ROCm.
- Use the same `ii` for #830 and #2357. Although both of them work, `ii` has different meanings in these two PRs, but now it should be the same.
- However, `ii` in `tabulate_fusion_se_a_fifth_order_polynomial` (ROCm) added by #2532 is wrong. After merging the codes, it should be corrected.
  - Optimization in #830 was not applied to ROCm.
  - `__syncwarp` is not supported by ROCm.
- After merging the code, #2661 will be applied to ROCm. Although the TF ROCm stream is still blocking (https://github.com/tensorflow/tensorflow/blob/9d1262082e761cd85d6726bcbdfdef331d6d72c6/tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc#L566), we don't know whether it will change to non-blocking.
- There are several other differences between CUDA and ROCm.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
@ppanchad-amd

@Kaveh01 Apologies for the lack of response. Can you please test with the latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close the ticket. Thanks!

@lahwaacz

@ppanchad-amd You could have just said that the _sync functions were added to the C++ kernel language in some ROCm/HIP version 🤷

@Vishal-S-P

I am using ROCm 6.1.3, yet I still keep getting this issue: "error: use of undeclared identifier '__shfl_down_sync'"

b-sumner (Contributor) commented Jul 2, 2024

The *_sync functions are not available in 6.1; see, e.g., https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail. The develop branch has an implementation which may appear in a future release.
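
Until then, a hedged compile-time check (a sketch: HIP_VERSION_MAJOR/HIP_VERSION_MINOR come from HIP's hip/hip_version.h header; the 6.2 cutoff is an assumption based on the later comments in this thread):

#include <hip/hip_version.h>

// Sketch: assume the *_sync builtins exist only from ROCm/HIP 6.2 onward
// (per the discussion below, they may also need an explicit opt-in macro).
#if HIP_VERSION_MAJOR > 6 || (HIP_VERSION_MAJOR == 6 && HIP_VERSION_MINOR >= 2)
  #define HAVE_SHFL_SYNC 1
#else
  #define HAVE_SHFL_SYNC 0
#endif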

b-sumner (Contributor) commented Jul 2, 2024

The develop implementation mentioned above has restrictions on its use that match the restrictions stated for Pascal in the CUDA guide.

ppanchad-amd reopened this Jul 2, 2024

lahwaacz commented Jul 3, 2024

The C++ Language Extensions documentation for ROCm 6.1.2 / HIP 6.1.40092 describes this as if the __sync functions were already a thing.

> Note that the __sync variants are made available in ROCm 6.2

Note that this is the only reference to ROCm 6.2 in the entire document; the following sections simply list all the _sync variants without any reference to the future ROCm version. Why are future features documented in earlier releases? It seems like somebody just copy-pasted it from NVIDIA 🤷

@schung-amd

Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel free to comment and we can reopen this thread, or you can submit a new issue.
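
For anyone landing here, a hedged example of the opt-in (the gating macro is HIP_ENABLE_WARP_SYNC_BUILTINS per my reading of the 6.2 notes; treat the name as an assumption and check your release's documentation):

// Build with: hipcc -DHIP_ENABLE_WARP_SYNC_BUILTINS test.cpp
// or define the macro before including the HIP headers, as below.
#define HIP_ENABLE_WARP_SYNC_BUILTINS
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void shflSyncTest() {
    int tid = threadIdx.x;
    // On a 64-wide AMD wavefront the full mask is 64 bits wide.
    unsigned long long mask = 0xffffffffffffffffull;
    int v = __shfl_sync(mask, tid, 5);   // every lane reads lane 5's tid
    if (tid == 0) printf("lane 0 received %d\n", v);
}

int main() {
    shflSyncTest<<<1, 64>>>();
    hipDeviceSynchronize();
    return 0;
}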
