feat: flash attention support for hexagon-npu #16
Conversation
… method; enhance tests for FLASH_ATTN_EXT
…th new properties and methods
…parsing and improve OpTensor handling
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
Adds Flash Attention support for the Hexagon NPU backend by enhancing log parsing, test coverage, CLI scripts, and documentation.
- Extend `log_parser` and its tests to recognize `FLASH_ATTN_EXT` operations
- Introduce `--flash-attn` flags in device-run and benchmark scripts (bash and PowerShell)
- Update build docs with Hexagon SDK requirements and add a Python test workflow
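As a rough illustration of the log-parser change described above, recognizing `FLASH_ATTN_EXT` lines could be sketched as follows. This is a hypothetical minimal version: the function name and regex are assumptions, not the actual `scripts/log_parser.py` implementation, which handles the op via item subclassing.

```python
import re

# Hypothetical sketch only: the real scripts/log_parser.py implements this
# via OpTensor item subclassing rather than a bare regex check.
FLASH_ATTN_RE = re.compile(r"\bFLASH_ATTN_EXT\b")

def is_flash_attn_line(line: str) -> bool:
    """Return True if a profiler log line refers to a FLASH_ATTN_EXT op."""
    return bool(FLASH_ATTN_RE.search(line))
```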
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/tests/test_log_parser.py | Expanded unit tests to cover FLASH_ATTN_EXT parsing |
| scripts/run_device_model.sh | Added `-f`/`--flash-attn` argument and forwarding in bash script |
| scripts/run_device_model.ps1 | Added `-f`/`--FlashAttention` switch and arg assembly in PS script |
| scripts/log_parser.py | Implemented parsing and item subclassing for FLASH_ATTN_EXT |
| scripts/batch_run_benchmarks_and_save_log.sh | Added `--flash-attn` support for bash benchmark runner |
| scripts/batch_run_benchmarks_and_save_log.ps1 | Added `-f`/`--FlashAttention` support for PowerShell benchmark |
| llama.cpp | Updated submodule commit reference |
| docs/how-to-build.md | Added instructions to install Hexagon SDK |
| .github/workflows/python_tests.yml | New workflow to run Python unit tests for log parser |
Comments suppressed due to low confidence (3)
scripts/tests/test_log_parser.py:8

- [nitpick] The test name suggests checking a method called `get_name`, but it actually verifies `dtype`, `shape`, and `permute` behavior. Consider renaming to `test_op_tensor_initialization_sets_dtype_shape_and_permute` for clarity.

  `def test_given_op_tensor_when_get_name_then_return_correct_name(self):`
.github/workflows/python_tests.yml:41

- This step is labeled `Test with pytest` but invokes the unittest module incorrectly. Either switch to a `pytest` invocation or use `python -m unittest test_log_parser -v` to run the tests as written.

  `python3 -m test_log_parser -v`
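To illustrate the distinction this comment draws: `python -m unittest <module>` asks unittest's own CLI to load and run the module's `TestCase` classes, whereas `python -m <module>` only runs tests if the module calls `unittest.main()` itself. A minimal, self-contained equivalent (the test case here is a hypothetical stand-in, not the real `test_log_parser`):

```python
import unittest

class TestLogParser(unittest.TestCase):
    # Hypothetical placeholder test; the real suite lives in
    # scripts/tests/test_log_parser.py.
    def test_flash_attn_ext_tag(self):
        self.assertIn("FLASH_ATTN_EXT", "op: FLASH_ATTN_EXT")

# Roughly what `python -m unittest test_log_parser -v` does under the hood:
# load TestCase classes from the named module and run them.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLogParser)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```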
scripts/log_parser.py:131

- [nitpick] The return type hint `dict[str:(...)]` is invalid Python syntax. Consider using `dict[str, Union[OpTensor.DataType, int, float, bool, list]]` or the `-> Dict[str, Any]` pattern to satisfy type checkers.

  `def __parse_prop(prop: str) -> dict[str:(OpTensor.DataType | int | float | bool | list)]:`
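A hedged sketch of the corrected annotation: `dict[K, V]` takes a comma-separated key and value type, while the original `dict[str:(...)]` accidentally wrote a slice. The function body below is a stand-in for illustration, not the real parsing logic in `scripts/log_parser.py`, and `str` replaces `OpTensor.DataType` to keep the example self-contained.

```python
from __future__ import annotations

# Stand-in sketch; the real __parse_prop handles OpTensor.DataType values
# and richer property shapes.
def parse_prop(prop: str) -> dict[str, int | float | bool | list | str]:
    """Parse a single `key=value` property string into a one-entry dict."""
    key, _, value = prop.partition("=")
    if value.isdigit():
        return {key: int(value)}
    return {key: value}
```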
@@ -37,6 +40,10 @@ if ($Verbose) {
     $extraArgs = "-v"
 }

 if ($FlashAttention) {
The variable `$extraArgs` is only set inside the `if ($Verbose)` block and may be undefined here. Initialize `$extraArgs = ""` before any conditionals to prevent null or concatenation errors.
This reverts commit 63c8829.
Related to chraac/llama.cpp#45