[Build] Docker build failure with ROCm 6.0 using official Dockerfile for v1.19.2: Segmentation fault in clang++ during composable_kernel compilation #23807

Open
CarlosLimarino opened this issue Feb 25, 2025 · 0 comments
Labels: build (build issues; typically submitted using template), contributions welcome (external contributions welcome)

Describe the issue

Building the ONNX Runtime ROCm Docker image from the official Dockerfile.rocm for v1.19.2 on ROCm 6.0 fails during the CMake build: clang++ crashes with a segmentation fault while compiling a composable_kernel FMHA object (fmha_fwd_d64_fp8_batch_b128x64x32x64x32x64_r2x1x1_w32x32x32_qr_vc_squant.cpp.o).

As a result, the ROCm image for ONNX Runtime v1.19.2 cannot be built from the official Dockerfile.

Environment:

Base Docker Image: rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1
ONNX Runtime Version: v1.19.2
ROCm Version: 6.0.0
Dockerfile: Dockerfile.rocm (content provided below)
Build Log: build.log.txt (attached below)

Steps to reproduce:

  1. Create a file named Dockerfile.rocm with the following content:
# --------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------
# Dockerfile to run ONNXRuntime with ROCm integration
#--------------------------------------------------------------------------

FROM rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1

ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
ARG ONNXRUNTIME_BRANCH=v1.19.2

WORKDIR /code

ENV PATH /code/cmake-3.27.3-linux-x86_64/bin:${PATH}

# Prepare onnxruntime repository & build onnxruntime
RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh &&\
    cd onnxruntime &&\
    /bin/sh ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel --cmake_extra_defines\
            ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm &&\
    pip install /code/onnxruntime/build/Linux/Release/dist/*.whl &&\
    cd ..
  2. Build the Docker image using:

docker build -t onnxruntime_rocm -f Dockerfile.rocm .
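
For anyone trying to reproduce this, the following is a minimal sketch of one way to keep the full build output in a file such as build.log.txt while still seeing it on the terminal. The helper name `run_and_log` is hypothetical (it is not part of ONNX Runtime or Docker); it only assumes a POSIX shell with `mktemp` and `tee` available.

```shell
#!/bin/sh
# Hypothetical helper (not part of ONNX Runtime or Docker): run a command,
# mirror its combined stdout/stderr into a log file, and return the command's
# own exit status rather than the exit status of tee.
run_and_log() {
    log_file=$1
    shift
    status_file=$(mktemp)
    {
        rc=0
        # Run the wrapped command; remember a non-zero status without aborting.
        "$@" 2>&1 || rc=$?
        echo "$rc" > "$status_file"
    } | tee "$log_file"
    rc=$(cat "$status_file")
    rm -f "$status_file"
    return "$rc"
}
```

With this helper defined, the build step above could be run as `run_and_log build.log.txt docker build -t onnxruntime_rocm -f Dockerfile.rocm .`, so a failing build still reports a non-zero status while the log is preserved.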

Error Log Snippet:

clang++: error: unable to execute command: Segmentation fault (core dumped)
clang++: error: clang frontend command failed due to signal (use -v to see invocation)
AMD clang version 17.0.0
...
subprocess.CalledProcessError: Command '['/code/cmake-3.27.3-linux-x86_64/bin/cmake', '--build', '/code/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j24']' returned non-zero exit status 2.
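
Because the interesting lines are easy to lose in a long parallel build log, here is a small sketch for locating them. It assumes the log contains the same two clang++ messages quoted in the snippet above; the function name `find_clang_crash` is hypothetical and the patterns are taken only from this report, so other failure modes would not be matched.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers and text of the clang++ crash
# messages quoted in this report. The first argument is the path to a build log.
find_clang_crash() {
    grep -n -E 'Segmentation fault|clang frontend command failed' "$1"
}
```

Running `find_clang_crash build.log.txt` prints each matching line prefixed with its line number, which makes it easier to open the log at the crash and read the compiler invocation that precedes it.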

Urgency

No response

Target platform

x86-64

Build script

Dockerfile.rocm, identical to the file listed under "Steps to reproduce".

Error / output

build.log.txt

Visual Studio Version

No response

GCC / Compiler Version

No response

CarlosLimarino added the build label (build issues; typically submitted using template) on Feb 25, 2025.
github-actions bot added the ep:ROCm label (questions/issues related to ROCm execution provider) on Feb 25, 2025.
snnn added the contributions welcome label (external contributions welcome) and removed the ep:ROCm label on Feb 25, 2025.