wgml − GPU local inference every platform

wgml is a set of Rust libraries exposing WebGPU shaders and kernels for local Large Language Model (LLM) inference on the GPU. It is cross-platform and runs on the web. wgml can be used as a Rust library to assemble your own transformer from the provided operators (and to write your own operators on top of them).

Aside from the library, two binary crates are provided:

wgml-bench is a basic benchmarking utility that measures computation times for matrix multiplication with various quantization formats.
wgml-chat is a basic chat GUI application for loading GGUF files and chatting with the model. It can be run natively or in the browser. Check out its README for details on how to run it. You can also try it directly in your browser with the online demo.
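
To give a sense of what a quantized matrix multiplication involves, here is a minimal CPU-side sketch in plain Rust. It is not wgml's API and not its shader code. It assumes a block layout loosely modeled on GGUF's Q8_0 format (blocks of 32 weights sharing one scale), and stores the scale as `f32` rather than `f16` to avoid extra dependencies. The names `BlockQ8`, `quantize`, `dot_q8`, and `matvec_q8` are hypothetical.

```rust
/// Number of weights per block (32, as in GGUF's Q8_0 layout).
const BLOCK: usize = 32;

/// One quantized block: a shared scale plus 32 signed 8-bit values.
/// Simplification: the scale is an f32 here (an assumption made for brevity).
#[derive(Clone, Debug)]
struct BlockQ8 {
    scale: f32,
    quants: [i8; BLOCK],
}

/// Quantize a slice whose length is a multiple of BLOCK.
fn quantize(values: &[f32]) -> Vec<BlockQ8> {
    assert!(values.len() % BLOCK == 0, "length must be a multiple of BLOCK");
    values
        .chunks_exact(BLOCK)
        .map(|chunk| {
            // The scale maps the largest magnitude in the block to 127.
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = max_abs / 127.0;
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut quants = [0i8; BLOCK];
            for (q, v) in quants.iter_mut().zip(chunk) {
                *q = (v * inv).round() as i8;
            }
            BlockQ8 { scale, quants }
        })
        .collect()
}

/// Dot product of one quantized weight row with a dense f32 vector.
fn dot_q8(row: &[BlockQ8], x: &[f32]) -> f32 {
    assert_eq!(row.len() * BLOCK, x.len());
    row.iter()
        .zip(x.chunks_exact(BLOCK))
        .map(|(block, xs)| {
            // Accumulate in the quantized domain, then apply the scale once per block.
            let partial: f32 = block
                .quants
                .iter()
                .zip(xs)
                .map(|(&q, &xv)| q as f32 * xv)
                .sum();
            block.scale * partial
        })
        .sum()
}

/// y = W x, where every row of W is stored as quantized blocks.
fn matvec_q8(rows: &[Vec<BlockQ8>], x: &[f32]) -> Vec<f32> {
    rows.iter().map(|row| dot_q8(row, x)).collect()
}
```

Each weight is dequantized on the fly as `scale * q`, so the result approximates the full-precision product while the weights take roughly a quarter of the memory of `f32`. A GPU kernel would typically distribute the per-block work across threads.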

⚠️ wgml is still under heavy development and might be lacking some important features. Contributions are welcome!
