
Add mikupad to ik_llama as an alternative WebUI #558


Draft: saood06 wants to merge 7 commits into main

Conversation

saood06 (Collaborator) commented Jun 26, 2025

This PR adds mikupad (and new endpoints to server.cpp that mikupad uses to manage its SQL database).

It can be launched with --path ../../examples/server/public_mikupad --sql-save-file [...], plus an optional --sqlite-zstd-ext-file [...].

The path serves the index.html, but the endpoints mikupad relies on are only enabled when a --sql-save-file is passed.
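
For example, a full launch could look roughly like this (the model path, save-file name, and extension path below are placeholders, not defaults shipped by this PR):

# hypothetical invocation from the repo root; substitute your own model and file paths
./build/bin/llama-server \
    --model /path/to/model.gguf \
    --path ./examples/server/public_mikupad \
    --sql-save-file mikupad-save.sql \
    --sqlite-zstd-ext-file /path/to/libsqlite_zstd.so   # optional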

The provided mikupad file has the following changes from the original:

  • it is built on top of "Add support for seperate name table (ServerDBAdapter support only)" (lmg-anon/mikupad#113), which cut my initial load time from minutes to seconds
  • streamlined code (and UI sections), removing support for other LLM endpoints and data storage models
  • fixed a longstanding bug with highlight misalignment (using the fix that was mentioned in the issue discussion)
  • made the sidebar and sessions sections resizable (see image below)
  • added a second list of auto-grouped sessions (currently grouped by exact name match, updated dynamically, but I might add ways to configure it [hide some groups, add more with custom matching rules])

This does add sqlite_modern_cpp as a library to common, alongside the other third-party libraries this project already uses, such as nlohmann/json, stb_image, and base64.hpp.

It also supports dynamically loading phiresky/sqlite-zstd, which allows one to use compressed SQL databases. Results may vary, but for me it is very useful:

| size before | size after | row count |
| --- | --- | --- |
| 31.04 GB | 3.40 GB | 14752 |
| 8.62 GB | 581.33 MB | 8042 |
| 12.54 GB | 2.04 GB | 1202 |
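
For reference, this is roughly what applying sqlite-zstd's transparent compression looks like by hand with the sqlite3 CLI. The table/column names and settings below are assumptions for illustration only, not necessarily what the mikupad schema or the planned endpoints use:

$ sqlite3 mikupad-save.sql
sqlite> .load ./libsqlite_zstd
sqlite> -- compress an assumed "data" column of an assumed "sessions" table
sqlite> SELECT zstd_enable_transparent('{"table": "sessions", "column": "data", "compression_level": 19, "dict_chooser": "''a''"}');
sqlite> -- spend up to 60 seconds recompressing existing rows at roughly 50% db load
sqlite> SELECT zstd_incremental_maintenance(60, 0.5);
sqlite> VACUUM;  -- reclaim the freed space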

To-do:

  • Dynamically load extension
  • Update the version endpoint with a new version (needed because the table changes make it incompatible with the old version) and add a features-enabled array
  • Update the HTML to display a useful error message (guiding the user on how to pass an SQL file at launch) if the SQL feature is not enabled
  • Support top-n σ sampler (untested)
  • Update license (including a potential new AUTHORS file for mikupad)
  • Documentation
  • I think the compile will fail if it can't find SQLite, so fix that if that is the case (see the note after this list)
  • Make use of Add an endpoint that lists all the saved prompt caches to server #502 and support for loading and saving the KV cache
  • Move the selected template into sampling, and give sampling its own saves the way sessions (and available templates) have. (This would make it easy to have preset profiles of templates/samplers, and would also let a new session prefill its prompt based on the chosen template instead of the "miku prompt", which currently features the Mistral template.)
  • Move nextSessionId and selectedSessionId to a new table (and maybe have selectedSessionId be in the URL as a fragment)
  • Implement compression (add endpoints that do zstd_enable_transparent and zstd_incremental_maintenance, roughly the calls sketched above, maybe VACUUM, and also add UI for that in the HTML if the feature is enabled)
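
Regarding the SQLite build dependency noted above, the usual dev packages, to the best of my knowledge (nothing this PR installs for you), are:

# Debian/Ubuntu
sudo apt-get install libsqlite3-dev
# Fedora
sudo dnf install sqlite-devel
# Arch
sudo pacman -S sqlite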

Potential roadmap items:

  • Add a mode that creates new sessions on branching or prediction
  • SQLite Wasm option
  • Allow for slot saves to be in the database. This would allow for it to be compressed (similar to prompts there can often be a lot of redundancy between saves).
  • Add a new pure black version of Monospace dark (for OLED screens).
  • Add the ability to mask tokens from being processed (for use with think tokens as they are supposed to be removed once the response is finished).
  • Max content length should be obtained from the server (based on n_ctx) rather than from user input; the usage of that variable should also be changed or even removed (at least from the UI). It is used to set maximums for Penalty Range for some samplers (useful, but could be frustrating if set wrong, since the right value is not obvious), and it seems to truncate in some situations (not useful in my view).

I am still looking for feedback even in this draft state (either on usage, the code, or even the Roadmap/To-do list).

An image of the new resizable sessions section (the All group is always on top and contains all prompts; the number is how many prompts are in that group):
image

saood06 (Collaborator, Author) commented Jun 28, 2025

Now that I have removed the hardcoded extension loading, I do think this is in a state where it can be used by others (who can potentially provide feedback), but I will still be working on completing things from the "To-do" list above until it is ready for review (and will update the post above).

ubergarm (Contributor) commented Jun 30, 2025

Heya @saood06 I had some time this morning to kick the tires on this PR.

My high-level understanding is that this PR adds a new web endpoint for Mikupad as an alternative to the default built-in web interface.

I don't typically use the built-in web interface, but I did my best to try it out. Here is my experience:

👈logs and screenshots
# get setup
$ cd ik_llama.cpp
$ git fetch upstream
$ git checkout s6/mikupad
$ git rev-parse --short HEAD
3a634c7a

# i already had the sqlite OS-level lib installed apparently:
$ pacman -Ss libsql
core/sqlite 3.50.2-1 [installed]
    A C library that implements an SQL database engine

# compile
$ cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
$ cmake --build build --config Release -j $(nproc)

Then I tested my usual command like so:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080

When I open a browser to 127.0.0.1:8080 I get a nice-looking Web UI that is simple and sleek, with just a few options for quick and easy configuring:

ik_llama-saood06-mikupad-pr558

Then I added the extra arguments you mention above and ran again:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --path ./examples/server/public_mikupad \
    --sql-save-file sqlite-save.sql

This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:

ik_llama-saood06-mikupad-pr558-test-2

The server seems to be throwing 500s, so maybe I didn't go to the correct endpoint, or do I need to do something else to properly access it?

INFO [                    init] initializing slots | tid="140147414781952" timestamp=1751293931 n_slots=1
INFO [                    init] new slot | tid="140147414781952" timestamp=1751293931 id_slot=0 n_ctx_slot=32768
INFO [                    main] model loaded | tid="140147414781952" timestamp=1751293931
INFO [                    main] chat template | tid="140147414781952" timestamp=1751293931 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [                    main] HTTP server listening | tid="140147414781952" timestamp=1751293931 n_threads_http="31" port="8080" hostname="127.0.0.1"
INFO [            update_slots] all slots are idle | tid="140147414781952" timestamp=1751293931
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=404 method="GET" path="/favicon.ico" params={}
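
For completeness, the same endpoints can be poked from the command line (paths taken from the request log above; I don't know what body /load expects, so the empty POST is only a guess):

curl -s http://127.0.0.1:8080/version
curl -si -X POST http://127.0.0.1:8080/load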

Downtown-Case commented Jun 30, 2025

I am interested in this.

Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

saood06 (Collaborator, Author) commented Jun 30, 2025

> I am interested in this.
>
> Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

Glad to hear it. I agree. I love being able to see probs for each token (and even be able to pick a replacement from the specified tokens).

If you are an existing mikupad user, you may need to use the DB migration script I put in lmg-anon/mikupad#113 if you want to migrate a whole database; migrating individual sessions via import and export should work just fine, I think.

> This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:
> ...
> The server seems to be throwing 500s, so maybe I didn't go to the correct endpoint, or do I need to do something else to properly access it?

You are doing the correct steps; I was able to reproduce the issue of it not working with a fresh SQL file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

ubergarm (Contributor) commented:

> You are doing the correct steps; I was able to reproduce the issue of it not working with a fresh SQL file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

Thanks for confirming; correct, I didn't have a .sql file already in place but just made up that name. Happy to try again whenever you are ready!

saood06 (Collaborator, Author) commented Jun 30, 2025

> Thanks for confirming; correct, I didn't have a .sql file already in place but just made up that name. Happy to try again whenever you are ready!

Just pushed a fix. (The issue was with something that is on my to-do list to refactor and potentially remove, but for now this is a quick fix for the code as is.)

Edit: The fix is in the HTML only, so no compile or even relaunch is needed; just a reload should fix it.

ubergarm (Contributor) commented:

@saood06

Aye! It fired right up this time and I was able to play with it a little and have a successful generation. It is cool how I can mouse over the tokens to see the probabilities!

mikupad-testing-works

saood06 (Collaborator, Author) commented Jun 30, 2025

> Aye! It fired right up this time and I was able to play with it a little and have a successful generation.

Nice.

> It is cool how I can mouse over the tokens to see the probabilities!

Yes, I like to turn on "Color by probability" to be able to see low-probability tokens at a glance.

It might also be useful to you for benchmarking quants or models (saving and cloning prompts).

ikawrakow (Owner) commented:

This is getting surprisingly little testing. Nevertheless, we can merge whenever @saood06 feels it is ready and removes the "draft" label.
