
[PHI][CINN] Fix cum kernel for big tensor #72562


Open · wants to merge 1 commit into develop
Conversation

lshpku
Contributor

@lshpku commented Apr 30, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fix the illegal launch configuration and out-of-bounds memory access of the Cum kernel on large tensors.

Cum is a composite operator built from several basic kernels; this PR changes them as follows (a minimal sketch of the shared indexing pattern follows the table):

| Kernel | Role | Change | Performance impact |
| --- | --- | --- | --- |
| BlockScanKernel | Scan over the last dimension | Switch indices to int64_t, cap gridDim, add an inner loop as a fallback | No change |
| MatrixTranspose | [H, W] => [W, H] transpose, used for scans over dimensions other than the last | Same as above | ~1% slower |
| MatrixRowReverse | Reverse the last dimension, used for the reverse mode (in reverse mode the scan direction is flipped) | Same as above; also removed the BlockReverse function (reversed accesses coalesce since cc 2.0, so staging through shared memory is unnecessary; illustrated after the note below) | ~0.5% faster (from dropping shared memory) |
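
All three kernels share the same indexing fix. The sketch below illustrates that pattern only and is not the actual PHI code: the kernel and launcher names (`RowScanSketch`, `LaunchRowScanSketch`), the grid cap value, and the serial per-row scan body are illustrative assumptions.

```cuda
// Sketch (assumed names, not the PHI kernels) of the shared pattern:
// 64-bit offsets, a capped grid size, and a grid-stride inner loop so that
// rows beyond gridDim.x are still processed.
#include <cstdint>

__global__ void RowScanSketch(const float* in, float* out,
                              int64_t num_rows, int64_t row_size) {
  // The launcher caps gridDim.x; each block strides over the remaining rows.
  for (int64_t row = blockIdx.x; row < num_rows; row += gridDim.x) {
    if (threadIdx.x != 0) continue;          // keep the sketch serial per row
    const float* src = in + row * row_size;  // int64_t offset: no 32-bit overflow
    float* dst = out + row * row_size;
    float acc = 0.0f;
    for (int64_t j = 0; j < row_size; ++j) {
      acc += src[j];
      dst[j] = acc;                          // inclusive prefix sum of one row
    }
  }
}

void LaunchRowScanSketch(const float* in, float* out,
                         int64_t num_rows, int64_t row_size) {
  if (num_rows <= 0) return;
  // Hypothetical cap; the point is that gridDim never exceeds a legal
  // launch configuration regardless of how many rows there are.
  constexpr int64_t kMaxGrid = 65535;
  int grid = static_cast<int>(num_rows < kMaxGrid ? num_rows : kMaxGrid);
  RowScanSketch<<<grid, 128>>>(in, out, num_rows, row_size);
}
```

The essential points are that the `row * row_size` offsets are computed in 64-bit arithmetic, the grid size is clamped to a legal launch configuration, and the grid-stride loop guarantees every row is still visited.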

Note: MatrixTranspose could in principle have reused the existing TilingSwapDim1And2, but on inspection that kernel's correctness cannot be guaranteed either, so it was fixed in place for now.
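
On the removal of BlockReverse from MatrixRowReverse: since compute capability 2.0, reversed accesses within a row still coalesce, so staging the row through shared memory buys nothing. A hypothetical sketch of the direct reversal (assumed name `RowReverseSketch`, not the PR's code):

```cuda
// Reverse the last dimension without a shared-memory BlockReverse step.
// Thread j reads src[j] and writes dst[row_size - 1 - j]; consecutive threads
// still touch consecutive addresses (descending order on the store), which
// coalesces on compute capability 2.0 and later.
#include <cstdint>

__global__ void RowReverseSketch(const float* in, float* out,
                                 int64_t num_rows, int64_t row_size) {
  for (int64_t row = blockIdx.x; row < num_rows; row += gridDim.x) {
    const float* src = in + row * row_size;
    float* dst = out + row * row_size;
    for (int64_t j = threadIdx.x; j < row_size; j += blockDim.x) {
      dst[row_size - 1 - j] = src[j];
    }
  }
}
```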

Tested the configs in PaddleAPITest: all of them run, and the precision comparison against numpy.cumsum passes. Some large shapes show a larger precision error, which is related to the reduce algorithm and is left for a follow-up change.


Pcard-85711


paddle-bot bot commented Apr 30, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.
