
Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 #182


Merged (7 commits) on Jan 30, 2025
Conversation

ikawrakow (Owner)
TG is about the same. Below is a PP-512 comparison between main and this PR for LLaMA-3.1-8B on a Ryzen-5975WX (AVX2) and a Ryzen-7950X (Zen4):

| model | backend | threads | test | t/s (main) | t/s (PR) | Speedup |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K_S | AVX2 | 32 | pp512 | 291.90 ± 0.64 | 327.98 ± 0.51 | 1.124 |
| llama 8B Q5_K_S | AVX2 | 32 | pp512 | 273.59 ± 0.37 | 302.13 ± 0.61 | 1.104 |
| llama 8B Q4_K_S | Zen4 | 16 | pp512 | 258.78 ± 1.05 | 267.69 ± 0.31 | 1.034 |
| llama 8B Q5_K_S | Zen4 | 16 | pp512 | 246.19 ± 0.65 | 249.12 ± 0.42 | 1.012 |

The improvement on Zen4 is very minor. The benefit there is mainly code-bloat reduction, as Zen4 now reuses the same implementation as AVX2.
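For context on what the `_R4` repacking does: it interleaves the quantized data of four consecutive rows block by block, so a single pass over the activations can accumulate four output rows at once. Below is an illustrative pure-Python sketch of that idea using plain floats; the function names, the block size of 32, and the float representation are my assumptions for illustration only, not the actual ik_llama.cpp code (the real Q4_K_R4/Q5_K_R4 layouts pack 4- and 5-bit quants together with their block scales).

```python
def interleave_rows_r4(weights, block=32):
    """Flatten a rows x cols matrix so that groups of 4 consecutive rows
    are interleaved block by block: r0[0:32], r1[0:32], r2[0:32], r3[0:32],
    r0[32:64], ...  Data for 4 output rows then sits contiguously."""
    rows, cols = len(weights), len(weights[0])
    assert rows % 4 == 0 and cols % block == 0
    packed = []
    for g in range(0, rows, 4):          # groups of 4 consecutive rows
        for b in range(0, cols, block):  # walk each row in SIMD-sized blocks
            for r in range(4):           # block b of rows g..g+3, back to back
                packed.extend(weights[g + r][b:b + block])
    return packed

def gemv_r4(packed, x, rows, block=32):
    """Matrix-vector product over the packed layout: each pass over a
    block of x updates 4 output rows from adjacent memory."""
    cols = len(x)
    y = [0.0] * rows
    i = 0
    for g in range(0, rows, 4):
        for b in range(0, cols, block):
            for r in range(4):           # in the real kernel: one SIMD loop
                for k in range(block):   # keeping 4 accumulators live
                    y[g + r] += packed[i] * x[b + k]
                    i += 1
    return y
```

In the real AVX2 kernel the two innermost loops become a single vectorized loop over the activations with four accumulators, which is what makes the repacked layout faster than processing one row at a time.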

Commit messages from this PR:

- We now arrive at PP-512 = 328 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 291 t/s when I last measured on 3c5f872. With FA and Q8_0 K-cache we get to 339.5 t/s.
- We arrive at 302 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 273 t/s.
- After the changes I made to AVX2, it ends up being slightly faster than what I had for Zen4.
@ikawrakow merged commit 2e6b523 into main on Jan 30, 2025