Add mikupad to ik_llama as an alternative WebUI #558
base: main
Now that I have removed the hardcoded extension loading, I think this is in a state where others can use it (and potentially provide feedback). I will still keep working through the "To-do" list in the PR description until it is ready for review, and I will update the description as I go.
Heya @saood06, I had some time this morning to kick the tires on this PR. My high-level understanding is that this PR adds a new web endpoint for Mikupad as an alternative to the default built-in web interface. I don't typically use the built-in web interface, but I did my best to try it out. Here is my experience (logs and screenshots):
# get setup
$ cd ik_llama.cpp
$ git fetch upstream
$ git checkout s6/mikupad
$ git rev-parse --short HEAD
3a634c7a
# I already had the OS-level sqlite library installed, apparently:
$ pacman -Ss libsql
core/sqlite 3.50.2-1 [installed]
A C library that implements an SQL database engine
# compile
$ cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
$ cmake --build build --config Release -j $(nproc)

Then I tested my usual command like so:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-server \
--model "$model" \
--alias ubergarm/Qwen3-14B-IQ4_KS \
-fa \
-ctk f16 -ctv f16 \
-c 32768 \
-ngl 99 \
--threads 1 \
--host 127.0.0.1 \
--port 8080

When I open a browser to 127.0.0.1:8080, I get a nice-looking Web UI that is simple and sleek, with just a few options for quick configuration.

Then I added the extra arguments you mention in the PR description and ran it again:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-server \
--model "$model" \
--alias ubergarm/Qwen3-14B-IQ4_KS \
-fa \
-ctk f16 -ctv f16 \
-c 32768 \
-ngl 99 \
--threads 1 \
--host 127.0.0.1 \
--port 8080 \
--path ./examples/server/public_mikupad \
--sql-save-file sqlite-save.sql

This time a different background color appears, but it seems to throw an async error in the web debug console, as shown in the screenshot. The server seems to be returning 500s, so maybe I didn't go to the correct endpoint, or do I need to do something else to access it properly?

INFO [ init] initializing slots | tid="140147414781952" timestamp=1751293931 n_slots=1
INFO [ init] new slot | tid="140147414781952" timestamp=1751293931 id_slot=0 n_ctx_slot=32768
INFO [ main] model loaded | tid="140147414781952" timestamp=1751293931
INFO [ main] chat template | tid="140147414781952" timestamp=1751293931 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [ main] HTTP server listening | tid="140147414781952" timestamp=1751293931 n_threads_http="31" port="8080" hostname="127.0.0.1"
INFO [ update_slots] all slots are idle | tid="140147414781952" timestamp=1751293931
INFO [ log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/" params={}
INFO [ log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/version" params={}
INFO [ log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=500 method="POST" path="/load" params={}
INFO [ log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/" params={}
INFO [ log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/version" params={}
INFO [ log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=500 method="POST" path="/load" params={}
INFO [ log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=404 method="GET" path="/favicon.ico" params={}
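When a log like the one above gets long, it helps to pull out only the failing requests. The following is a minimal sketch that assumes the request-log format shown here (status=..., method="...", path="..." on each request line); the regex and the failed_requests helper are illustrative and not part of llama-server.

```python
import re

# Matches the fields llama-server prints in the request lines above,
# e.g. status=500 method="POST" path="/load".
REQUEST_RE = re.compile(
    r'status=(?P<status>\d{3}) method="(?P<method>[A-Z]+)" path="(?P<path>[^"]*)"'
)


def failed_requests(log_text):
    """Return (status, method, path) for every request line with status >= 400."""
    failures = []
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match and int(match.group("status")) >= 400:
            failures.append(
                (int(match.group("status")), match.group("method"), match.group("path"))
            )
    return failures


if __name__ == "__main__":
    sample = "\n".join([
        'INFO request | status=200 method="GET" path="/version" params={}',
        'INFO request | status=500 method="POST" path="/load" params={}',
        'INFO request | status=404 method="GET" path="/favicon.ico" params={}',
    ])
    for status, method, path in failed_requests(sample):
        print(status, method, path)
```

Run against the log above, this would list the two POST /load 500s and the favicon.ico 404. The 404 is just the browser's automatic icon request; the 500s on /load are the real problem.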
I am interested in this. Mikupad is excellent for testing prompt formatting and sampling, thanks to how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.
Glad to hear it. I agree: I love being able to see the probabilities for each token (and even being able to pick a replacement from the displayed tokens). If you are an existing mikupad user, you may need the DB migration script I put in lmg-anon/mikupad#113 to migrate a whole database; migrating individual sessions via import and export should work just fine, I think.
You are doing the correct steps. I was able to reproduce the issue of it not working with a fresh SQL file (so far my testing had been done with backup databases containing existing data). Thanks for testing; I'll let you know when it works so that you can test it again if you so choose.
Thanks for confirming. Correct, I didn't have a
Just pushed a fix. (The issue was with something that is on my to-do list to refactor and potentially remove, but for now this is a quick fix for the code as is.) Edit: The fix is in the HTML only, so no recompile or even relaunch is needed; just a reload should fix it.
Aye! It fired right up this time, and I was able to play with it a little and get a successful generation. It is cool how I can mouse over the tokens to see the probabilities!
Nice.
Yes, I like to turn on "Color by probability" to see low-probability tokens at a glance. It might also be useful to you for benchmarking quants or models (saving and cloning prompts).
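Conceptually, a "Color by probability" view maps each generated token's probability onto a color scale so that unlikely tokens stand out. The sketch below is not mikupad's code; it assumes per-token natural-log probabilities (as OpenAI-style logprobs are) and uses arbitrary, hypothetical thresholds for the buckets and the low-probability cutoff.

```python
import math


def token_probabilities(logprobs):
    """Convert natural-log probabilities to plain probabilities in [0, 1]."""
    return [math.exp(lp) for lp in logprobs]


def probability_bucket(p):
    """Map a probability to a coarse label a UI could turn into a color.

    The 0.5 / 0.1 thresholds are arbitrary choices for this sketch.
    """
    if p >= 0.5:
        return "high"
    if p >= 0.1:
        return "medium"
    return "low"


def low_probability_tokens(tokens, logprobs, threshold=0.1):
    """Return (index, token, probability) for tokens whose probability is below threshold."""
    flagged = []
    for i, (tok, p) in enumerate(zip(tokens, token_probabilities(logprobs))):
        if p < threshold:
            flagged.append((i, tok, p))
    return flagged


if __name__ == "__main__":
    tokens = ["Hello", ",", " world"]
    logprobs = [math.log(0.9), math.log(0.05), math.log(0.6)]
    for i, tok, p in low_probability_tokens(tokens, logprobs):
        print(i, repr(tok), round(p, 3), probability_bucket(p))
```

If your source already reports plain probabilities rather than logprobs, drop the exp step and feed the values straight into probability_bucket.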
This is getting surprisingly little testing. Nevertheless, we can merge whenever @saood06 feels it is ready and removes the "draft" label.
This PR adds mikupad (and new endpoints to server.cpp that mikupad uses to manage its SQL database). It can be launched with --path ../../examples/server/public_mikupad --sql-save-file [...] with an optional --sqlite-zstd-ext-file [...].

The path serves the index.html, but the methods the endpoints rely on are only enabled when a sql-save-file is passed.

The provided mikupad file has the following changes from the original:
This does add sqlite_modern_cpp as a library to common, alongside the other third-party libraries this project already uses, such as nlohmann/json, stb_image, and base64.hpp.

It also supports dynamically loading phiresky/sqlite-zstd, which allows one to use compressed SQL databases. Results may vary, but for me it is very useful.
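Loading an extension such as sqlite-zstd at run time is a general SQLite feature. As an illustration in Python's standard sqlite3 module (not sqlite_modern_cpp and not the PR's C++ code), this sketch treats the extension as optional, mirroring an optional flag like --sqlite-zstd-ext-file: if no path is given or the library cannot be loaded, the plain database stays usable. The open_database helper is hypothetical, and sqlite-zstd's own setup for compressing tables is not shown.

```python
import sqlite3


def open_database(db_path, extension_path=None):
    """Open an SQLite database, optionally loading a run-time extension.

    Returns (connection, extension_loaded). A missing or unloadable
    extension is not fatal; the connection is still returned.
    """
    conn = sqlite3.connect(db_path)
    if extension_path is None:
        return conn, False
    try:
        conn.enable_load_extension(True)
        conn.load_extension(extension_path)
    except (AttributeError, sqlite3.OperationalError):
        # AttributeError: this Python build lacks extension-loading support.
        # OperationalError: the library could not be found or loaded.
        return conn, False
    finally:
        # Turn extension loading back off once we are done with it.
        if hasattr(conn, "enable_load_extension"):
            conn.enable_load_extension(False)
    return conn, True


if __name__ == "__main__":
    conn, loaded = open_database(":memory:", "/path/to/sqlite_zstd")
    print("extension loaded:", loaded)
    print(conn.execute("SELECT 1").fetchone())
```

Keeping the failure non-fatal matches the idea that compression is an extra: results may vary, so the database should still open without it.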
To-do:
nextSessionId and selectedSessionId to a new table (and maybe have selectedSessionId be in the URL as a fragment)

Potential roadmap items:
n_ctx) and not from user input, and also changing or even removing the usage of that variable (or just removing it from the UI). It is used for setting the maximum Penalty Range for some samplers (useful, but it could be frustrating if set wrong, since it is not obvious that this value is the cause), and, it seems, for truncation in some situations (not useful in my view).

I am still looking for feedback even in this draft state (on usage, the code, or even the Roadmap/To-do list).
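The PR description notes that index.html is always served from --path, while the database-backed methods are only enabled when a sql-save-file is passed. A minimal sketch of that kind of conditional route registration follows; it is not the server.cpp implementation. /load appears in the server logs in this thread, but /save, the handler descriptions, and the 404 behavior of dispatch are hypothetical placeholders.

```python
def build_routes(sql_save_file=None):
    """Return a mapping of URL path -> placeholder handler description.

    Static files are always served; the database-backed routes are only
    registered when a save file is configured.
    """
    routes = {"/": "serve index.html from the --path directory"}
    if sql_save_file:
        routes["/load"] = f"database-backed handler using {sql_save_file}"
        routes["/save"] = f"database-backed handler using {sql_save_file}"  # hypothetical route
    return routes


def dispatch(routes, path):
    """Return an HTTP status code: 200 for a registered path, 404 otherwise."""
    return 200 if path in routes else 404


if __name__ == "__main__":
    print(dispatch(build_routes(), "/load"))
    print(dispatch(build_routes("sqlite-save.sql"), "/load"))
```

Registering the routes only when the file is configured keeps the plain static UI usable without any database, which is the behavior the description implies.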
An image of the new resizable sessions section (the All group is always on top and contains all prompts; the number is how many prompts are in that group):
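The grouping rule described for the sessions section (an All group pinned on top that contains every prompt, with each group showing its prompt count) can be sketched as below. This does not reflect mikupad's actual data model: the "group" key on each session and the alphabetical ordering of the remaining groups are assumptions made for illustration.

```python
from collections import Counter


def group_summary(sessions):
    """Return [(group, count), ...] with an "All" entry first covering every prompt.

    Each session is assumed (hypothetically) to be a dict with a "group" key;
    the other groups are listed alphabetically after "All".
    """
    counts = Counter(s["group"] for s in sessions)
    summary = [("All", len(sessions))]
    summary.extend(sorted(counts.items()))
    return summary


if __name__ == "__main__":
    sessions = [
        {"group": "benchmarks"},
        {"group": "stories"},
        {"group": "benchmarks"},
    ]
    for name, count in group_summary(sessions):
        print(f"{name} ({count})")
```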