Build and test with CUDA 12.9.0 #18721
base: branch-25.06
Thanks Bradley! 🙏 Do we want to update the Spark image to CUDA 12.9 as well (cc @sameerz)?
Also, it looks like pre-commit got stuck.
Edit: We noted offline that there was a GitHub incident today, which likely caused the pre-commit issue. It has since cleared up.
Seeing the following build error in CI:
│ │ $BUILD_PREFIX/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lnvToolsExt: No such file or directory
│ │ collect2: error: ld returned 1 exit status
│ │ [3/8] Building CUDA object CMakeFiles/custom_p
The ... That said, I think we are just using NVTX headers (like ... I think lines like this...
...can either be replaced with...
...as is done here: Line 998 in 874ecb4
After discussing with the Spark team offline, they have filed a CUDA 12.9 update issue: NVIDIA/spark-rapids#12679
Seeing the following test failure in conda C++ tests:
[ FAILED ] 3 tests, listed below:
[ FAILED ] CollectTestFixedWidth/22.CollectSet, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
[ FAILED ] ReductionHistogramTest/8.Histogram, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
[ FAILED ] ReductionHistogramTest/8.MergeHistogram, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
3 FAILED TESTS
CMake Error at run_gpu_test.cmake:35 (execute_process):
execute_process failed command indexes:
1: "Child return code: 1"
Also seeing the following test failure in conda and wheel Python tests on CI:
FAILED tests/test_dataframe.py::test_decimal_quantile[Decimal128Dtype-higher-1] - AssertionError: DataFrame.iloc[:, 1] (column name="val") are different
DataFrame.iloc[:, 1] (column name="val") values are different (100.0 %)
[index]: [1.0]
[left]: [98.14]
[right]: [453.23]
At positional index 0, first diff: 98.14 != 453.23
FAILED tests/test_dataframe.py::test_decimal_quantile[Decimal128Dtype-higher-q3] - AssertionError: DataFrame.iloc[:, 1] (column name="val") are different
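For context on what the `higher` parameter in these test IDs refers to, here is a minimal pure-Python sketch of a quantile with "higher" interpolation (when the quantile position falls between two sorted elements, take the upper one). It is not cuDF's implementation, and the function name `quantile_higher` and the sample data are hypothetical; only the values 98.14 and 453.23 come from the log above.

```python
import math
from decimal import Decimal


def quantile_higher(values, q):
    """Quantile with 'higher' interpolation on a non-empty sequence.

    The quantile position q * (n - 1) is rounded up to the next index
    of the sorted data, so no arithmetic on the values is needed.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(values)
    idx = math.ceil(q * (len(ordered) - 1))
    return ordered[idx]


# Hypothetical column: the middle value is invented for illustration.
vals = [Decimal("98.14"), Decimal("120.50"), Decimal("453.23")]
result = quantile_higher(vals, 1)
```

Under this definition `q = 1` always selects the largest element, so on data like the sample a result of 98.14 instead of 453.23 would point at element selection or comparison going wrong rather than at rounding. Whether the real test data looks like this is an assumption.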
@@ -7,7 +7,7 @@ jobs:
   spark-rapids-jni-build:
     runs-on: linux-amd64-cpu8
     container:
-      image: rapidsai/ci-spark-rapids-jni:rockylinux8-cuda12.8.0
+      image: rapidsai/ci-spark-rapids-jni:rockylinux8-cuda12.9.0
@pxLi, I have bumped the Spark RAPIDS CI image to CUDA 12.9.0 above.
Is there anything else we need to do?
These failures are all with 128-bit decimals, which smells like a miscompilation somewhere.
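One quick way to check that observation against the C++ log is to group the failed test names by their `TypeParam`. The sketch below is a hypothetical helper, not part of the cuDF tooling; the log lines are copied from the failure report earlier in this thread, and the names `PATTERN` and `failures_by_type` are made up for illustration.

```python
import re

# Failure lines copied from the conda C++ test output above.
LOG = """\
[ FAILED ] CollectTestFixedWidth/22.CollectSet, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
[ FAILED ] ReductionHistogramTest/8.Histogram, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
[ FAILED ] ReductionHistogramTest/8.MergeHistogram, where TypeParam = numeric::fixed_point<__int128,(numeric::Radix)10>
"""

# Matches "[ FAILED ] <test name>, where TypeParam = <type>".
PATTERN = re.compile(r"^\[\s*FAILED\s*\]\s+(\S+), where TypeParam = (.+)$")


def failures_by_type(log):
    """Map each TypeParam string to the list of failed test names."""
    grouped = {}
    for line in log.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            name, type_param = match.groups()
            grouped.setdefault(type_param, []).append(name)
    return grouped


grouped = failures_by_type(LOG)
for type_param, names in grouped.items():
    print(f"{type_param}: {len(names)} failure(s)")
```

For this log the mapping has a single key, the `__int128`-backed `fixed_point` type, which matches the observation that only 128-bit decimals are affected.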
This PR uses CUDA 12.9.0 to build and test.
xref: rapidsai/build-planning#173