How to Build

This guide describes how to build Android and Windows releases of the QNN backend for llama.cpp.

Android

Prerequisites

  1. Install the latest Docker Engine following the official steps: Install Docker Engine
  2. Clone the llama-cpp-qnn-builder repository
    git clone https://github.com/chraac/llama-cpp-qnn-builder.git
    cd llama-cpp-qnn-builder

Note: Please update to the latest main branch; the builder now uses NDK r23 because earlier NDK versions did not correctly apply some optimization flags in Release builds. See: https://github.com/android/ndk/issues/1740

Building

  1. Navigate to the project root directory and run the build script:

    ./docker/docker_compose_compile.sh
  2. The console output will look similar to the screenshot below, and the executables will be located in build_qnn_arm64-v8a/bin/ (see the sketch after this list for running them on a device):

     Build Output
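
To run the resulting binaries on a device, you can push them over adb. This is a minimal sketch: the /data/local/tmp path and the model filename are placeholders, and depending on your build you may also need to push the QNN runtime libraries from the Qualcomm AI Engine Direct SDK alongside the executables.

# Push a built binary to a writable location on the device
adb push build_qnn_arm64-v8a/bin/llama-cli /data/local/tmp/

# Push a GGUF model (placeholder filename; use your own model)
adb push your-model.gguf /data/local/tmp/

# Run a short generation on the device
adb shell "cd /data/local/tmp && chmod +x llama-cli && ./llama-cli -m your-model.gguf -p 'Hello' -n 32"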

Build Script Parameters

| Parameter                 | Short | Description                                | Default           |
| ------------------------- | ----- | ------------------------------------------ | ----------------- |
| --rebuild                 | -r    | Force rebuild of the project               | false             |
| --repo-dir                |       | Specify llama.cpp repository directory     | ../llama.cpp      |
| --debug                   | -d    | Build in Debug mode                        | Release           |
| --print-build-time        |       | Display build and test execution times    | false             |
| --asan                    |       | Enable AddressSanitizer                    | false             |
| --build-linux-x64         |       | Build for Linux x86_64 platform            | android arm64-v8a |
| --perf-log                |       | Enable Hexagon performance tracking        | false             |
| --enable-hexagon-backend  |       | Enable Hexagon backend support             | false             |
| --hexagon-npu-only        |       | Build Hexagon NPU backend only             | false             |
| --disable-hexagon-and-qnn |       | Disable both Hexagon and QNN backends      | false             |
| --qnn-only                |       | Build QNN backend only                     | false             |
| --enable-dequant          |       | Enable quantized tensor support in Hexagon | false             |

Build Examples

# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols and build timing
./docker/docker_compose_compile.sh -r -d --print-build-time

Windows

Prerequisites

  1. Download Qualcomm AI Engine Direct SDK

  2. Install Visual Studio 2022

    • Ensure the following components are installed:
      • Clang toolchain for ARM64 compilation

        VS2022 Clang Installation

      • CMake tools for Visual Studio

        VS2022 CMake Installation

  3. Install Hexagon SDK (for Hexagon NPU backend)

    • To compile the hexagon-npu backend, you need to install the latest Hexagon SDK
    • Follow the official documentation:
      1. First install the Qualcomm Package Manager (QPM)
      2. Then use QPM to install the Hexagon SDK
    • Set the environment variable HEXAGON_SDK_ROOT to point to your installation directory (see the example after this list)

    Note: The Hexagon SDK is only required if you plan to build with --enable-hexagon-backend or --hexagon-npu-only flags.
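
For example, on Windows you can set the variable persistently from a command prompt. A minimal sketch; the path below is a placeholder, so adjust it to your actual installation directory:

:: Placeholder path; point this at your actual Hexagon SDK installation
setx HEXAGON_SDK_ROOT "C:\Qualcomm\Hexagon_SDK\6.0.0.2"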

Build Steps

  1. Open the Project

    • Launch Visual Studio 2022
    • Click Continue without code
    • Go to File → Open → CMake...
    • Navigate to the llama.cpp root directory and select CMakeLists.txt
  2. Configure CMake Presets

    Edit llama.cpp/CMakePresets.json and modify the arm64-windows-llvm configuration:

    {
        "name": "arm64-windows-llvm", 
        "hidden": true,
        "architecture": { "value": "arm64", "strategy": "external" },
        "toolset": { "value": "host=x64", "strategy": "external" },
        "cacheVariables": {
    -        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
    +        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
    +        "GGML_QNN": "ON",
    +        "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
    +        "BUILD_SHARED_LIBS": "OFF"
        }
    },

    Important: Replace C:/ml/qnn_sdk/qairt/2.31.0.250130/ with your actual QNN SDK path. A command-line alternative to steps 3 and 4 is sketched after this list.

  3. Select Build Configuration

    • In Visual Studio, select the arm64-windows-llvm-debug configuration from the dropdown

    Configuration Selection

  4. Build the Project

    • Go to Build → Build All
    • Output files will be located in build-arm64-windows-llvm-debug/bin/
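
If you prefer the command line to the Visual Studio UI, the same presets can be driven with cmake directly. A sketch assuming the arm64-windows-llvm-debug preset shown above, run from the llama.cpp root:

:: Configure with the preset, then build into the preset's binary directory
cmake --preset arm64-windows-llvm-debug
cmake --build build-arm64-windows-llvm-debug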

Build Output

After successful compilation, you'll find the following executables:

  • llama-cli.exe - Main inference executable
  • llama-bench.exe - Benchmarking tool
  • test-backend-ops.exe - Backend operation tests
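
As a quick smoke test, you can run the backend operation tests and then a short generation. A minimal sketch; the model path is a placeholder:

:: Run the backend operation tests
build-arm64-windows-llvm-debug\bin\test-backend-ops.exe test

:: Run a short generation (replace the model path with your own GGUF file)
build-arm64-windows-llvm-debug\bin\llama-cli.exe -m C:\models\your-model.gguf -p "Hello" -n 32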