feat: flash attention support for hexagon-npu #16

Merged
merged 22 commits into from
Jun 22, 2025
41 changes: 41 additions & 0 deletions .github/workflows/python_tests.yml
@@ -0,0 +1,41 @@
+# This workflow will install Python dependencies, run tests and lint with a single version of Python
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
+
+name: Python unittests
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+permissions:
+  contents: read
+
+jobs:
+  python-unittest-scripts:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v3
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install flake8 pytest
+          pip install -r ${GITHUB_WORKSPACE}/scripts/requirements.txt
+      - name: Lint with flake8
+        run: |
+          # stop the build if there are Python syntax errors or undefined names
+          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+      - name: Test with pytest
+        run: |
+          export PYTHONPATH="$PYTHONPATH:${GITHUB_WORKSPACE}:${GITHUB_WORKSPACE}/scripts:${GITHUB_WORKSPACE}/scripts/tests"
+          cd scripts/tests
+          python3 -m test_log_parser -v
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ build_qnn_*
 temp/*
 *.txt
 run_logs
+*.pyc
25 changes: 17 additions & 8 deletions docs/how-to-build.md
@@ -17,7 +17,7 @@ This guide describes the steps to build Android/Windows releases of the QNN back

 1. Navigate to the project root directory and run the build script:
 ```bash
-./docker/docker_compose_compile_and_share.sh
+./docker/docker_compose_compile.sh
 ```

 2. The console output will look similar to this, and executables will be located in `build_qnn_arm64-v8a/bin/`:
@@ -44,22 +44,22 @@ This guide describes the steps to build Android/Windows releases of the QNN back

 ```bash
 # Basic build (default: Release mode, QNN + Hexagon backends)
-./docker/docker_compose_compile_and_share.sh
+./docker/docker_compose_compile.sh

 # Debug build with Hexagon NPU backend
-./docker/docker_compose_compile_and_share.sh -d --enable-hexagon-backend
+./docker/docker_compose_compile.sh -d --enable-hexagon-backend

 # Debug build with Hexagon NPU backend only
-./docker/docker_compose_compile_and_share.sh -d --hexagon-npu-only
+./docker/docker_compose_compile.sh -d --hexagon-npu-only

 # Debug build with Hexagon NPU backend and quantized tensor support
-./docker/docker_compose_compile_and_share.sh -d --hexagon-npu-only --enable-dequant
+./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

 # QNN-only build with performance logging
-./docker/docker_compose_compile_and_share.sh --qnn-only --perf-log
+./docker/docker_compose_compile.sh --qnn-only --perf-log

 # Force rebuild with debug symbols and build timing
-./docker/docker_compose_compile_and_share.sh -r -d --print-build-time
+./docker/docker_compose_compile.sh -r -d --print-build-time
 ```

 ## Windows
@@ -80,6 +80,15 @@ This guide describes the steps to build Android/Windows releases of the QNN back

 ![VS2022 CMake Installation](https://github.com/user-attachments/assets/9a36dde5-0e41-4421-9161-e9b09cd32eb1)

+3. **Install Hexagon SDK (for Hexagon NPU backend)**
+   - To compile the `hexagon-npu` backend, you need to install the latest Hexagon SDK
+   - Follow the [official documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-77512-1/hexagon-dsp-sdk-getting-started.html?product=1601111740010422):
+     1. First install the Qualcomm Package Manager (QPM)
+     2. Then use QPM to install the Hexagon SDK
+   - Set the environment variable `HEXAGON_SDK_ROOT` to point to your installation directory
+
+   > **Note**: The Hexagon SDK is only required if you plan to build with `--enable-hexagon-backend` or `--hexagon-npu-only` flags.
+
 ### Build Steps

 1. **Open the Project**
@@ -124,4 +133,4 @@ This guide describes the steps to build Android/Windows releases of the QNN back
 After successful compilation, you'll find the following executables:
 - `llama-cli.exe` - Main inference executable
 - `llama-bench.exe` - Benchmarking tool
-- `test-backend-ops.exe` - Backend operation tests
+- `test-backend-ops.exe` - Backend operation tests
2 changes: 1 addition & 1 deletion llama.cpp
Submodule llama.cpp updated 246 files
Empty file added scripts/__init__.py
Empty file.
9 changes: 8 additions & 1 deletion scripts/batch_run_benchmarks_and_save_log.ps1
@@ -9,7 +9,10 @@ param (
     [switch]$Verbose,

     [Alias('-s')]
-    [switch]$Skip8b
+    [switch]$Skip8b,
+
+    [Alias('-f')]
+    [switch]$FlashAttention
 )

 $_scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
@@ -37,6 +40,10 @@ if ($Verbose) {
     $extraArgs = "-v"
 }

+if ($FlashAttention) {
Copilot AI Jun 18, 2025

The variable $extraArgs is only set inside the if ($Verbose) block and may be undefined here. Initialize $extraArgs = "" before any conditionals to prevent null or concatenation errors.

+    $extraArgs += " --flash-attn 1"
+}
+
 $logFilePath = "$_scriptPath/../run_logs/$LogFileName"

 # Create logs directory if it doesn't exist
15 changes: 12 additions & 3 deletions scripts/batch_run_benchmarks_and_save_log.sh
@@ -8,6 +8,7 @@ _model_list=('meta-llama_Meta-Llama-3.2-1B-Instruct' 'meta-llama_Meta-Llama-3.2-
 _should_push_to_device=0
 _verbose_log=0
 _skip_8b_model=0
+_flash_attn=0

 # parse arguments to get the log file name
 while [[ $# -gt 0 ]]; do
@@ -30,6 +31,10 @@ while [[ $# -gt 0 ]]; do
         _skip_8b_model=1
         shift
         ;;
+    -f | --flash-attn)
+        _flash_attn=1
+        shift
+        ;;
     *)
         echo "Invalid option $1"
         exit 1
@@ -45,9 +50,13 @@ if [ $_skip_8b_model -eq 1 ]; then
     _model_list=('meta-llama_Meta-Llama-3.2-1B-Instruct' 'meta-llama_Meta-Llama-3.2-3B-Instruct')
 fi

-extra_args=""
+_extra_args=""
 if [ $_verbose_log -eq 1 ]; then
-    extra_args="-v"
+    _extra_args="-v"
 fi
+
+if [ $_flash_attn -eq 1 ]; then
+    _extra_args="${_extra_args} --flash-attn 1"
+fi

 log_file_path="$_script_path/../run_logs/$_log_file_name"
@@ -57,7 +66,7 @@ function run_benchmark() {
     local model_name=$1
     local command_string="cd $_device_path && "
     command_string+="LLAMA_CACHE=$_device_path/.cache LD_LIBRARY_PATH=./ ADSP_LIBRARY_PATH=./ "
-    command_string+="./llama-bench --progress ${extra_args} -mmp 0 -p 512 -n 128 -m ${_device_model_path}/$model_name"
+    command_string+="./llama-bench --progress ${_extra_args} -mmp 0 -p 512 -n 128 -m ${_device_model_path}/$model_name"
     adb shell $command_string
 }
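
The flag handling in this script can be sketched in isolation. The helper name `build_extra_args` is hypothetical and not part of the PR; it only mirrors how `_extra_args` starts as an empty string and is then extended by the `-v` and `--flash-attn 1` options. Initializing the variable before any conditional append is the same pattern the review comment on the PowerShell script asks for. Unlike the script, this sketch also trims the leading space left when only the flash attention flag is set, which is an illustrative choice rather than something the PR does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not code from the PR):
# build the extra llama-bench arguments from the two flags the script tracks.
build_extra_args() {
    local verbose_log=$1
    local flash_attn=$2
    local extra_args=""   # initialized up front, so appending is always safe

    if [ "$verbose_log" -eq 1 ]; then
        extra_args="-v"
    fi

    if [ "$flash_attn" -eq 1 ]; then
        extra_args="${extra_args} --flash-attn 1"
    fi

    # Strip the single leading space left when only --flash-attn is set.
    echo "${extra_args# }"
}

build_extra_args 1 1   # prints: -v --flash-attn 1
build_extra_args 0 1   # prints: --flash-attn 1
build_extra_args 1 0   # prints: -v
```

With both flags off the helper prints an empty line, so the benchmark command is left unchanged.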
