# How to Build
This guide describes the steps to build Android/Windows releases of the QNN backend for llama.cpp.
## Android

1. Install the latest Docker Engine following the official steps: [Install Docker Engine](https://docs.docker.com/engine/install/)

2. Clone the llama-cpp-qnn-builder repository:

   ```sh
   git clone https://github.com/chraac/llama-cpp-qnn-builder.git
   cd llama-cpp-qnn-builder
   ```
   Note: Please update to the latest `main` branch, as we're using NDK r23; in earlier versions, some optimization flags weren't correctly applied in `Release` builds. See: https://github.com/android/ndk/issues/1740
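   For example, with standard git commands:

   ```sh
   # Make sure the checkout is on the latest main before building.
   git checkout main
   git pull origin main
   ```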
3. Navigate to the project root directory and run the build script:

   ```sh
   ./docker/docker_compose_compile.sh
   ```

   When the build finishes, the executables will be located in `build_qnn_arm64-v8a/bin/`.
The build script accepts the following parameters:

| Parameter | Short | Description | Default |
|---|---|---|---|
| `--rebuild` | `-r` | Force rebuild of the project | `false` |
| `--repo-dir` | | Specify llama.cpp repository directory | `../llama.cpp` |
| `--debug` | `-d` | Build in Debug mode | `Release` |
| `--print-build-time` | | Display build and test execution times | `false` |
| `--asan` | | Enable AddressSanitizer | `false` |
| `--build-linux-x64` | | Build for Linux x86_64 platform | android arm64-v8a |
| `--perf-log` | | Enable Hexagon performance tracking | `false` |
| `--enable-hexagon-backend` | | Enable Hexagon backend support | `false` |
| `--hexagon-npu-only` | | Build Hexagon NPU backend only | `false` |
| `--disable-hexagon-and-qnn` | | Disable both Hexagon and QNN backends | `false` |
| `--qnn-only` | | Build QNN backend only | `false` |
| `--enable-dequant` | | Enable quantized tensor support in Hexagon | `false` |
Example invocations:

```sh
# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols and build timing
./docker/docker_compose_compile.sh -r -d --print-build-time
```
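Once a build completes, one way to try a binary on a device is via `adb` (a rough sketch; the model file is a placeholder, and depending on the build you may also need to push the QNN/Hexagon runtime libraries and set `LD_LIBRARY_PATH`):

```sh
# Push the freshly built CLI to an Android device and run it (paths are examples).
adb push build_qnn_arm64-v8a/bin/llama-cli /data/local/tmp/
adb shell chmod +x /data/local/tmp/llama-cli
adb shell "cd /data/local/tmp && ./llama-cli -m your-model.gguf -p 'Hello'"
```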
## Windows

1. Download the Qualcomm AI Engine Direct SDK
   - Get it from the Qualcomm Developer Portal
   - Extract it to a folder, e.g. `C:/ml/qnn_sdk/qairt/2.31.0.250130/`
2. Install Visual Studio 2022
   - Ensure the following components are installed:
     - Clang toolchain for ARM64 compilation
     - CMake tools for Visual Studio
3. Install the Hexagon SDK (for the Hexagon NPU backend)
   - To compile the `hexagon-npu` backend, you need to install the latest Hexagon SDK
   - Follow the official documentation:
     - First install the Qualcomm Package Manager (QPM)
     - Then use QPM to install the Hexagon SDK
   - Set the environment variable `HEXAGON_SDK_ROOT` to point to your installation directory (see the sketch below)

   Note: The Hexagon SDK is only required if you plan to build with the `--enable-hexagon-backend` or `--hexagon-npu-only` flags.
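   For example, from a command prompt (the install path below is hypothetical; use whatever directory QPM installed the SDK into):

   ```bat
   :: Hypothetical path - adjust to your actual Hexagon SDK install directory.
   setx HEXAGON_SDK_ROOT "C:\Qualcomm\Hexagon_SDK\5.x.x"
   ```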
4. Open the Project
   - Launch Visual Studio 2022
   - Click `Continue without code`
   - Go to `File` → `Open` → `CMake`
   - Navigate to the `llama.cpp` root directory and select `CMakeLists.txt`
5. Configure CMake Presets

   Edit `llama.cpp/CMakePresets.json` and modify the `arm64-windows-llvm` configuration:

   ```diff
    {
      "name": "arm64-windows-llvm",
      "hidden": true,
      "architecture": { "value": "arm64", "strategy": "external" },
      "toolset": { "value": "host=x64", "strategy": "external" },
      "cacheVariables": {
   -    "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
   +    "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
   +    "GGML_QNN": "ON",
   +    "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
   +    "BUILD_SHARED_LIBS": "OFF"
      }
    },
   ```

   Important: Replace `C:/ml/qnn_sdk/qairt/2.31.0.250130/` with your actual QNN SDK path.
6. Select Build Configuration
   - In Visual Studio, select the `arm64-windows-llvm-debug` configuration from the dropdown
7. Build the Project
   - Go to `Build` → `Build All`
   - Output files will be located in `build-arm64-windows-llvm-debug/bin/` (a command-line alternative is sketched below)
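As an alternative to the IDE flow, the same presets can usually be driven from a developer command prompt (a sketch, assuming `cmake` is on `PATH` and the preset writes to the build directory named above):

```sh
# Configure with the preset referenced in step 6, then build.
cmake --preset arm64-windows-llvm-debug
cmake --build build-arm64-windows-llvm-debug
```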
After successful compilation, you'll find the following executables:

- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
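As a quick smoke test (a sketch; the `.gguf` model path is a placeholder):

```sh
# Run from build-arm64-windows-llvm-debug/bin/; model path is a placeholder.
llama-cli.exe -m your-model.gguf -p "Hello"

# Run the backend operation tests.
test-backend-ops.exe test
```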