A wrapper around llama.cpp that provides an HTTP server with verifiable inference capabilities. Blama enables verifiable AI inference, ensuring transparency and trust in model outputs.
- High Performance: Built on top of the optimized llama.cpp engine
- RESTful API: Easy-to-use HTTP server interface
- Model Support: Compatible with GGUF format models
- C++ compiler with C++17 support
- CMake 3.14+
- Git
- Start the server:
./blama-server path/to/your/model.gguf
- Make complete text requests:
curl -X POST http://localhost:7331/complete \
-H "Content-Type: application/json" \
-d '{
"prompt": 'The first man to',
"max_tokens": 100
}'
- Verify completion results:
curl -X POST http://localhost:7331/verify_completion \
-H "Content-Type: application/json" \
-d '{
"request": <Here should be added the request to /complete>,
"response": <Here should be added the response from /complete>
}'
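The two endpoints above can be chained programmatically: save the body you sent to `/complete` together with the body it returned, and submit both to `/verify_completion`. A minimal sketch in Python, assuming the field names shown in the curl examples (the `token_steps` key and the response layout here are placeholders, not Blama's actual schema):

```python
import json

def build_verify_payload(request: dict, response: dict) -> str:
    """Wrap the original /complete request and its response into the
    body expected by /verify_completion, per the curl example above."""
    return json.dumps({"request": request, "response": response})

# The request body from the /complete example above.
complete_request = {"prompt": "The first man to", "max_tokens": 100}

# In a real run this would be the JSON body returned by
# POST http://localhost:7331/complete; the schema is assumed here.
complete_response = {"text": "...", "token_steps": []}

payload = build_verify_payload(complete_request, complete_response)
print(payload)
```

The resulting `payload` string is what you would POST to `http://localhost:7331/verify_completion`.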
Read more in the document here.
Blama implements a verification system that checks model predictions against the output logits recorded at each token generation step.
- Each inference request generates an array of token generation steps. Each step carries an array of logits (top 10) taken from the context.
- The same request + response is then sent back for verification.
- Each verification request creates the same model and fills the context with the response's token steps. During context filling, the same token steps are produced again, but with the logits from the current context.
- The logits from the request are compared against those returned during context filling. The algorithm can be checked here.
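The comparison in the last step can be sketched as follows. This is an illustrative sketch only, not Blama's actual algorithm (see the linked source for that); the data layout (a list of steps, each a list of `(token_id, logit)` pairs) and the tolerance value are assumptions:

```python
def logits_match(recorded_steps, replayed_steps, tol=1e-3):
    """Compare the top-10 logits recorded at generation time with the
    logits produced while re-filling the context during verification.

    recorded_steps / replayed_steps: list of steps, one per generated
    token; each step is a list of (token_id, logit) pairs.
    """
    if len(recorded_steps) != len(replayed_steps):
        return False
    for recorded, replayed in zip(recorded_steps, replayed_steps):
        # A full implementation would also reject steps whose
        # top-k lists differ in length.
        for (tok_a, logit_a), (tok_b, logit_b) in zip(recorded, replayed):
            if tok_a != tok_b or abs(logit_a - logit_b) > tol:
                return False
    return True

steps = [[(42, 10.5), (7, 9.8)], [(100, 12.1)]]
print(logits_match(steps, steps))                                    # → True
print(logits_match(steps, [[(42, 10.5), (7, 5.0)], [(100, 12.1)]]))  # → False
```

A small tolerance is used rather than exact equality because floating-point logits can differ slightly across hardware and build configurations.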
- Any GGUF-format model supported by llama.cpp
# List available presets
cmake --list-presets
# Configure with a preset
cmake --preset debug
# Build with a preset
cmake --build --preset debug
- llama.cpp for the high-performance inference engine
- Meta AI for the Llama model architecture
- The open source community for contributions and feedback
- Issues: GitHub Issues
Note: This project is under active development. APIs may change between versions.