RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

Zepan · 2023-03-19T10:14:34Z

Hi,
Just test on RISC-V board:
4xC910 2.0G TH1520 LicheePi4A (https://sipeed.com/licheepi4a) with 16GB LPDDR4X.
about 6s/token without any instruction acceleration, and it should be <5s/token when boost to 2.5GHz.

llama_model_load: ggml ctx size = 668.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | 

main: prompt: 'They'
main: number of tokens in prompt = 2
     1 -> ''
 15597 -> 'They'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


They are now available for sale at the cost of Rs 20,5

main: mem per token = 14368644 bytes
main:     load time =    91.25 ms
main:   sample time =    39.22 ms
main:  predict time = 105365.27 ms / 6197.96 ms per token
main:    total time = 129801.62 ms

1xC906 1.0G D1 LicheeRV with 1GB DDR3.
about 180s/token without any instruction acceleration, it is very slow due to lack of memory.

main: mem per token = 14368644 bytes
main:     load time =  1412.77 ms
main:   sample time =   185.77 ms
main:  predict time = 3171739.00 ms / 186572.88 ms per token
main:    total time = 3609667.50 ms

Note the ggml ctx size is 668MB, not 4668MB, I hack the code for low memory(>=512MB) device to run llama, and it is not use swap memory, as regard sd card as memory will demage sd card soon.
Should this feature need add in?

And here is a time-lapse photography for D1 run llama 7B model, it is super slow even in 120X speedup, but it works!

llama_d1_2xsmall.mp4

The text was updated successfully, but these errors were encountered:

diimdeep · 2023-03-19T12:30:28Z

I am very interested in trying to run your code on 1GB ARM device, feel free to share it in your repo!

BarfingLemurs · 2023-03-19T13:04:40Z

@Zepan Unfortunately, we cannot see the video!
How did you modify this codebase to support lower ram devices?
Currently, I can run on a 6gb phone, but this could be good to fit 4gb phone!

Green-Sky · 2023-03-19T13:18:36Z

Unfortunately, we cannot see the video!

works in external video player.

1xC906 1.0G D1 LicheeRV with 1GB DDR3.
about 180s/token without any instruction acceleration, it is very slow due to lack of memory.

main: mem per token = 14368644 bytes
main: load time = 1412.77 ms
main: sample time = 185.77 ms
main: predict time = 3171739.00 ms / 186572.88 ms per token
main: total time = 3609667.50 ms

Note the ggml ctx size is 668MB, not 4668MB, I hack the code for low memory(>=512MB) device to run llama, and it is not use swap memory, as regard sd card as memory will demage sd card soon.
Should this feature need add in?

You can apply this patch #294 to cut llama_model_load: memory_size = 512.00 MB in half (edit: now in master)

Zepan · 2023-03-19T15:31:47Z

I am very interested in trying to run your code on 1GB ARM device, feel free to share it in your repo!

Here is my repo: https://github.com/Zepan/llama.cpp
just use mmap to reduce Model memory, as it is readonly.
compare to swap method, it won't hurt SD/eMMC, but will have really slow speed due to storage bandwidth.

kassane · 2023-03-20T15:57:42Z

@Zepan,

Missing some fixes on quantize.cpp.

error: error(compilation): clang failed with stderr: /home/kassane/llama-sipeed/quantize.cpp:139:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
/home/kassane/llama-sipeed/quantize.cpp:140:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
/home/kassane/llama-sipeed/quantize.cpp:148:19: error: no member named 'score' in 'gpt_vocab'
/home/kassane/llama-sipeed/quantize.cpp:270:35: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:274:35: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:292:31: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:297:31: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]

Zepan · 2023-03-21T01:47:45Z

@Zepan,

Missing some fixes on quantize.cpp.

error: error(compilation): clang failed with stderr: /home/kassane/llama-sipeed/quantize.cpp:139:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
/home/kassane/llama-sipeed/quantize.cpp:140:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
/home/kassane/llama-sipeed/quantize.cpp:148:19: error: no member named 'score' in 'gpt_vocab'
/home/kassane/llama-sipeed/quantize.cpp:270:35: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:274:35: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:292:31: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]
/home/kassane/llama-sipeed/quantize.cpp:297:31: warning: comparison of integers of different signs: 'int' and 'std::vector<long>::size_type' (aka 'unsigned long') [-Wsign-compare]

I don't have this error in my repo, and I don't change quantize.cpp. you can comment quantize in Makefile and try again.

nixjoe · 2023-05-30T06:26:15Z

you yi si , any more hardware test? say rk3588?

github-actions · 2024-04-10T01:08:06Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

gjmulder added enhancement New feature or request hardware Hardware related need more info The OP should provide more details about the issue labels Mar 19, 2023

kassane mentioned this issue Mar 20, 2023

RISC-V support? #165

Closed

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 10, 2024

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

Zepan commented Mar 19, 2023

diimdeep commented Mar 19, 2023 •

edited

Loading

BarfingLemurs commented Mar 19, 2023

Green-Sky commented Mar 19, 2023 •

edited

Loading

Zepan commented Mar 19, 2023

kassane commented Mar 20, 2023

Zepan commented Mar 21, 2023

nixjoe commented May 30, 2023

github-actions bot commented Apr 10, 2024

RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

Comments

Zepan commented Mar 19, 2023

diimdeep commented Mar 19, 2023 • edited Loading

BarfingLemurs commented Mar 19, 2023

Green-Sky commented Mar 19, 2023 • edited Loading

Zepan commented Mar 19, 2023

kassane commented Mar 20, 2023

Zepan commented Mar 21, 2023

nixjoe commented May 30, 2023

github-actions bot commented Apr 10, 2024

diimdeep commented Mar 19, 2023 •

edited

Loading

Green-Sky commented Mar 19, 2023 •

edited

Loading