
Add mikupad to ik_llama as an alternative WebUI #558


Draft: saood06 wants to merge 7 commits into main

Conversation

saood06 (Collaborator) commented Jun 26, 2025

This PR adds mikupad (and new endpoints to server.cpp that mikupad uses to manage its SQL database).

It can be launched with --path ../../examples/server/public_mikupad --sql-save-file [...], plus an optional --sqlite-zstd-ext-file [...].

The path serves the index.html, but the endpoints mikupad relies on are only enabled when a --sql-save-file is passed.
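
For example, a full launch could look roughly like this (the model path, save-file name, and extension path below are placeholders, not defaults shipped by this PR):

# hypothetical invocation from the repo root; substitute your own model and file paths
./build/bin/llama-server \
    --model /path/to/model.gguf \
    --path ./examples/server/public_mikupad \
    --sql-save-file mikupad-save.sql \
    --sqlite-zstd-ext-file /path/to/libsqlite_zstd.so   # optional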

The provided mikupad file has the following changes from the original:

  • it is built on top of "Add support for seperate name table (ServerDBAdapter support only)" (lmg-anon/mikupad#113), which cut my initial load time from minutes to seconds
  • streamlined code (and UI sections), removing support for other LLM endpoints and data storage models
  • fixed a longstanding bug with highlight misalignment (using the fix that was mentioned in the issue discussion)
  • made the sidebar and sessions sections resizable (see image below)
  • added a second list of auto-grouped sessions (currently grouped by exact name match, updated dynamically, but I might add ways to configure it [hide some groups, add more with custom matching rules])

This does add sqlite_modern_cpp as a library to common, alongside the other third-party libraries this project already uses, such as nlohmann/json, stb_image, and base64.hpp.

It also supports dynamically loading phiresky/sqlite-zstd, which allows one to use compressed SQL databases. Results may vary, but for me it is very useful:

| size before | size after | row count |
| --- | --- | --- |
| 31.04 GB | 3.40 GB | 14752 |
| 8.62 GB | 581.33 MB | 8042 |
| 12.54 GB | 2.04 GB | 1202 |
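
For reference, this is roughly what applying sqlite-zstd's transparent compression looks like by hand with the sqlite3 CLI. The table/column names and settings below are assumptions for illustration only, not necessarily what the mikupad schema or the planned endpoints use:

$ sqlite3 mikupad-save.sql
sqlite> .load ./libsqlite_zstd
sqlite> -- compress an assumed "data" column of an assumed "sessions" table
sqlite> SELECT zstd_enable_transparent('{"table": "sessions", "column": "data", "compression_level": 19, "dict_chooser": "''a''"}');
sqlite> -- spend up to 60 seconds recompressing existing rows at roughly 50% db load
sqlite> SELECT zstd_incremental_maintenance(60, 0.5);
sqlite> VACUUM;  -- reclaim the freed space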

To-do:

  • Dynamically load extension
  • Update the version endpoint with a new version (needed because the table changes make it incompatible with the old version) and add a features-enabled array
  • Update the HTML to display a useful error message (guiding the user on how to pass an SQL file at launch) if the SQL feature is not enabled
  • Support top-n σ sampler (untested)
  • Update license (including a potential new AUTHORS file for mikupad)
  • Documentation
  • I think the compile will fail if it can't find SQLite, so fix that if that is the case (see the note after this list)
  • Make use of Add an endpoint that lists all the saved prompt caches to server #502 and support for loading and saving the KV cache
  • Move the selected template into sampling, and give sampling its own saves the way sessions (and available templates) have. (This would make it easy to have preset profiles of templates/samplers, and would also let a new session prefill its prompt based on the chosen template instead of the "miku prompt", which currently features the Mistral template.)
  • Move nextSessionId and selectedSessionId to a new table (and maybe have selectedSessionId be in the URL as a fragment)
  • Implement compression (add endpoints that do zstd_enable_transparent and zstd_incremental_maintenance, roughly the calls sketched above, maybe VACUUM, and also add UI for that in the HTML if the feature is enabled)
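
Regarding the SQLite build dependency noted above, the usual dev packages, to the best of my knowledge (nothing this PR installs for you), are:

# Debian/Ubuntu
sudo apt-get install libsqlite3-dev
# Fedora
sudo dnf install sqlite-devel
# Arch
sudo pacman -S sqlite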

Potential roadmap items:

  • Add a mode that creates new sessions on branching or prediction
  • SQLite Wasm option
  • Allow for slot saves to be in the database. This would allow for it to be compressed (similar to prompts there can often be a lot of redundancy between saves).
  • Add a new pure black version of Monospace dark (for OLED screens).
  • Add the ability to mask tokens from being processed (for use with think tokens as they are supposed to be removed once the response is finished).
  • Max content length should be obtained from the server (based on n_ctx) rather than from user input; the usage of that variable should also be changed or even removed (at least from the UI). It is used to set maximums for Penalty Range for some samplers (useful, but could be frustrating if set wrong, since the right value is not obvious), and it seems to truncate in some situations (not useful in my view).

I am still looking for feedback even in this draft state (either on usage, the code, or even the Roadmap/To-do list).

An image of the new resizable sessions section (the All group is always on top and contains all prompts; the number is how many prompts are in that group):
image

saood06 (Collaborator, Author) commented Jun 28, 2025

Now that I have removed the hardcoded extension loading, I do think this is in a state where it can be used by others (who can potentially provide feedback), but I will still be working on completing things from the "To-do" list above until it is ready for review (and will update the post above).

ubergarm (Contributor) commented Jun 30, 2025

Heya @saood06 I had some time this morning to kick the tires on this PR.

My high-level understanding is that this PR adds a new web endpoint for Mikupad as an alternative to the default built-in web interface.

I don't typically use the built-in web interface, but I did my best to try it out. Here is my experience:

👈logs and screenshots
# get setup
$ cd ik_llama.cpp
$ git fetch upstream
$ git checkout s6/mikupad
$ git rev-parse --short HEAD
3a634c7a

# i already had the sqlite OS-level lib installed apparently:
$ pacman -Ss libsql
core/sqlite 3.50.2-1 [installed]
    A C library that implements an SQL database engine

# compile
$ cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
$ cmake --build build --config Release -j $(nproc)

Then I tested my usual command like so:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080

When I open a browser to 127.0.0.1:8080 I get a nice-looking Web UI that is simple and sleek, with just a few options for quick and easy configuring:

ik_llama-saood06-mikupad-pr558

Then I added the extra arguments you mention above and ran again:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --path ./examples/server/public_mikupad \
    --sql-save-file sqlite-save.sql

This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:

ik_llama-saood06-mikupad-pr558-test-2

The server seems to be throwing 500s, so maybe I didn't go to the correct endpoint, or do I need to do something else to properly access it?

INFO [                    init] initializing slots | tid="140147414781952" timestamp=1751293931 n_slots=1
INFO [                    init] new slot | tid="140147414781952" timestamp=1751293931 id_slot=0 n_ctx_slot=32768
INFO [                    main] model loaded | tid="140147414781952" timestamp=1751293931
INFO [                    main] chat template | tid="140147414781952" timestamp=1751293931 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [                    main] HTTP server listening | tid="140147414781952" timestamp=1751293931 n_threads_http="31" port="8080" hostname="127.0.0.1"
INFO [            update_slots] all slots are idle | tid="140147414781952" timestamp=1751293931
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=404 method="GET" path="/favicon.ico" params={}
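
For completeness, the same endpoints can be poked from the command line (paths taken from the request log above; I don't know what body /load expects, so the empty POST is only a guess):

curl -s http://127.0.0.1:8080/version
curl -si -X POST http://127.0.0.1:8080/load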

Downtown-Case commented Jun 30, 2025

I am interested in this.

Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

saood06 (Collaborator, Author) commented Jun 30, 2025

> I am interested in this.
>
> Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

Glad to hear it. I agree. I love being able to see probs for each token (and even be able to pick a replacement from the specified tokens).

If you are an existing mikupad user, you may need to use the DB migration script I put in lmg-anon/mikupad#113 if you want to migrate a whole database; migrating individual sessions via import and export should work just fine, I think.

> This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:
> ...
> The server seems to be throwing 500s, so maybe I didn't go to the correct endpoint, or do I need to do something else to properly access it?

You are doing the correct steps; I was able to reproduce the issue of it not working with a fresh SQL file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

ubergarm (Contributor) commented:

> You are doing the correct steps; I was able to reproduce the issue of it not working with a fresh SQL file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

Thanks for confirming; correct, I didn't have a .sql file already in place but just made up that name. Happy to try again whenever you are ready!

saood06 (Collaborator, Author) commented Jun 30, 2025

> Thanks for confirming; correct, I didn't have a .sql file already in place but just made up that name. Happy to try again whenever you are ready!

Just pushed a fix. (The issue was with something that is on my to-do list to refactor and potentially remove, but for now this is a quick fix for the code as is.)

Edit: The fix is in the HTML only, so no compile or even relaunch is needed; just a reload should fix it.

ubergarm (Contributor) commented:

@saood06

Aye! It fired right up this time and I was able to play with it a little and have a successful generation. It is cool how I can mouse over the tokens to see the probabilities!

mikupad-testing-works

saood06 (Collaborator, Author) commented Jun 30, 2025

> Aye! It fired right up this time and I was able to play with it a little and have a successful generation.

Nice.

> It is cool how I can mouse over the tokens to see the probabilities!

Yes, I like to turn on "Color by probability" to be able to see low-probability tokens at a glance.

It might also be useful to you for benchmarking quants or models (saving and cloning prompts).

ikawrakow (Owner) commented:

This is getting surprisingly little testing. Nevertheless, we can merge whenever @saood06 feels it is ready and removes the "draft" label.
