wgml is a set of Rust libraries exposing WebGPU shaders and kernels for local Large Language Model (LLM) inference on the GPU. It is cross-platform and runs on the web. wgml can be used as a Rust library to assemble your own transformer from the provided operators (and to write your own operators on top of them).
Aside from the library, two binary crates are provided:
- wgml-bench is a basic benchmarking utility that measures matrix multiplication times for various quantization formats.
- wgml-chat is a basic chat GUI application for loading GGUF files and chatting with the model. It can be run natively or in the browser. Check out its README for details on how to run it. You can also try it directly in your browser with the online demo.