
[PHI][CINN] Fix cum kernel for big tensor #72562


Open · wants to merge 1 commit into develop
Conversation

lshpku
Contributor

@lshpku commented Apr 30, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fix the illegal launch configuration and out-of-bounds memory access of the Cum kernel on large tensors.

Cum is a composite operator built from several basic kernels; this PR changes them as follows (a minimal sketch of the shared indexing pattern follows the table):

| Kernel | Role | Change | Performance impact |
| --- | --- | --- | --- |
| BlockScanKernel | Scan over the last dimension | Switch indices to int64_t, cap gridDim, add an inner loop as a fallback | No change |
| MatrixTranspose | [H, W] => [W, H] transpose, used for scans over dimensions other than the last | Same as above | ~1% slower |
| MatrixRowReverse | Reverse the last dimension, used for the reverse mode (in reverse mode the scan direction is flipped) | Same as above; also removed the BlockReverse function (reversed accesses coalesce since cc 2.0, so staging through shared memory is unnecessary; illustrated after the note below) | ~0.5% faster (from dropping shared memory) |
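
All three kernels share the same indexing fix. The sketch below illustrates that pattern only and is not the actual PHI code: the kernel and launcher names (`RowScanSketch`, `LaunchRowScanSketch`), the grid cap value, and the serial per-row scan body are illustrative assumptions.

```cuda
// Sketch (assumed names, not the PHI kernels) of the shared pattern:
// 64-bit offsets, a capped grid size, and a grid-stride inner loop so that
// rows beyond gridDim.x are still processed.
#include <cstdint>

__global__ void RowScanSketch(const float* in, float* out,
                              int64_t num_rows, int64_t row_size) {
  // The launcher caps gridDim.x; each block strides over the remaining rows.
  for (int64_t row = blockIdx.x; row < num_rows; row += gridDim.x) {
    if (threadIdx.x != 0) continue;          // keep the sketch serial per row
    const float* src = in + row * row_size;  // int64_t offset: no 32-bit overflow
    float* dst = out + row * row_size;
    float acc = 0.0f;
    for (int64_t j = 0; j < row_size; ++j) {
      acc += src[j];
      dst[j] = acc;                          // inclusive prefix sum of one row
    }
  }
}

void LaunchRowScanSketch(const float* in, float* out,
                         int64_t num_rows, int64_t row_size) {
  if (num_rows <= 0) return;
  // Hypothetical cap; the point is that gridDim never exceeds a legal
  // launch configuration regardless of how many rows there are.
  constexpr int64_t kMaxGrid = 65535;
  int grid = static_cast<int>(num_rows < kMaxGrid ? num_rows : kMaxGrid);
  RowScanSketch<<<grid, 128>>>(in, out, num_rows, row_size);
}
```

The essential points are that the `row * row_size` offsets are computed in 64-bit arithmetic, the grid size is clamped to a legal launch configuration, and the grid-stride loop guarantees every row is still visited.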

Note: MatrixTranspose could in principle have reused the existing TilingSwapDim1And2, but on inspection that kernel's correctness cannot be guaranteed either, so it was fixed in place for now.
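
On the removal of BlockReverse from MatrixRowReverse: since compute capability 2.0, reversed accesses within a row still coalesce, so staging the row through shared memory buys nothing. A hypothetical sketch of the direct reversal (assumed name `RowReverseSketch`, not the PR's code):

```cuda
// Reverse the last dimension without a shared-memory BlockReverse step.
// Thread j reads src[j] and writes dst[row_size - 1 - j]; consecutive threads
// still touch consecutive addresses (descending order on the store), which
// coalesces on compute capability 2.0 and later.
#include <cstdint>

__global__ void RowReverseSketch(const float* in, float* out,
                                 int64_t num_rows, int64_t row_size) {
  for (int64_t row = blockIdx.x; row < num_rows; row += gridDim.x) {
    const float* src = in + row * row_size;
    float* dst = out + row * row_size;
    for (int64_t j = threadIdx.x; j < row_size; j += blockDim.x) {
      dst[row_size - 1 - j] = src[j];
    }
  }
}
```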

Tested the configs in PaddleAPITest: all of them run, and the precision comparison against numpy.cumsum passes. Some large shapes show a larger precision error, which is related to the reduce algorithm and is left for a follow-up change.


Pcard-85711


paddle-bot bot commented Apr 30, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.
