How to improve decode rate on phone using GPU? #13923

Arya-Hari · 2025-05-30T14:47:14Z

Right now, decode rates with llama.cpp in Termux on my phone are quite poor when using the GPU (Llama 3.2 3B model). Any suggestions on how to improve them? I'm aiming for around 20 to 25 tokens/sec.

Replies: 0 comments
