Nunchaku

Vladimir Mandic edited this page Apr 24, 2025 · 7 revisions

Nunchaku is a high-performance inference engine from MIT-Han-Lab optimized for 4-bit neural networks. It uses the novel 4-bit SVDQuant quantization scheme via DeepCompressor

Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
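To build intuition for why SVDQuant-style 4-bit quantization can stay accurate, here is a minimal NumPy sketch of the underlying idea: split a weight matrix into a low-rank (higher-precision) branch plus a 4-bit quantized residual. This is an illustrative toy, not the actual DeepCompressor implementation; all names here are made up for the example.

```python
import numpy as np

# Toy sketch of the SVDQuant idea: keep a small low-rank branch in
# higher precision and quantize only the residual to 4 bits.
# NOT the DeepCompressor implementation -- concept illustration only.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)  # pretend weight matrix

# 1. Low-rank component via truncated SVD (rank r, kept in fp)
r = 8
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L = (U[:, :r] * S[:r]) @ Vt[:r, :]

# 2. Quantize the residual to 4-bit integers (symmetric, per-tensor)
R = W - L
scale = np.abs(R).max() / 7.0                       # INT4 range -8..7, use +/-7
Q = np.clip(np.round(R / scale), -8, 7).astype(np.int8)

# 3. Reconstruct: low-rank branch + dequantized 4-bit residual
W_hat = L + Q.astype(np.float32) * scale

# Compare against naive direct 4-bit quantization of W
scale_naive = np.abs(W).max() / 7.0
W_naive = np.clip(np.round(W / scale_naive), -8, 7) * scale_naive

print("naive 4-bit mean error:   ", np.abs(W - W_naive).mean())
print("low-rank+4-bit mean error:", np.abs(W - W_hat).mean())
```

The real method additionally migrates activation outliers into the weight before the SVD split, which is what makes the residual well-behaved enough for 4-bit storage.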

Important

Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
Nunchaku requires Python 3.11 or 3.12

Note

On Blackwell GPUs, Nunchaku defaults to FP4 methods,
while on other GPU generations it uses INT4 methods
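The format choice above follows the GPU's CUDA compute capability. A minimal sketch of such a dispatch is below; the SM-to-generation mapping and the function name are assumptions for illustration, not Nunchaku's actual dispatch code.

```python
# Hedged sketch: pick a quantization format from the CUDA compute
# capability. The mapping below is an illustrative assumption,
# not Nunchaku's real dispatch logic.

def nunchaku_precision(major: int, minor: int) -> str:
    """Return 'fp4' on Blackwell (SM 10.x / 12.x), else 'int4'."""
    if major >= 10:          # Blackwell: sm_100, sm_120, ...
        return "fp4"
    return "int4"            # Turing (7.5), Ampere (8.x), Ada (8.9)

# In practice the capability would come from torch, e.g.:
#   import torch
#   major, minor = torch.cuda.get_device_capability(0)
print(nunchaku_precision(8, 9))   # Ada, e.g. RTX 4090 -> int4
print(nunchaku_precision(12, 0))  # Blackwell -> fp4
```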

Install

SD.Next will attempt to auto-install pre-built wheels when possible;
if you encounter issues, see the Manual build section below

Configure

To enable Nunchaku support, set appropriate quantization options in Settings -> Quantization

  • Enable for modules:
    Enable or disable Nunchaku for specific modules
    Currently supports the Transformer and TE (text encoder) modules
  • Nunchaku attention:
    Overrides current attention module with Nunchaku's custom fp16 attention mechanism
  • Nunchaku offloading:
    Overrides current offloading method with Nunchaku's custom offloading method

Support

At the moment, Nunchaku supports the following models:

  • FLUX.1 both Dev and Schnell variants
  • SANA 1.0-1600M variant
  • T5-XXL variant text encoder
    as used by SD35, FLUX.1, and HiDream models

Important

SD.Next will auto-download Nunchaku's pre-quantized modules as needed on first access
Nunchaku replaces the model's DiT module with a custom pre-quantized one,
so any model fine-tune will be ignored

Note

For FLUX.1, Nunchaku supports multiple base models:

  • Black Forest Labs FLUX.1 Dev and Schnell
  • Shuttle Jaguar finetune

Notes

Warning

Nunchaku is EXPERIMENTAL and many standard features are not supported yet

Nunchaku is compatible with some advanced features like:

  • LoRA loading
    however, Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
  • Para-attention first-block-cache
    enable in Settings -> Pipeline modifiers

Unsupported and/or known limitations:

  • Batch size
  • Model unload causes memory leaks

Manual build

Install CUDA

Warning

Requires CUDA dev installation with NVCC

URL: https://developer.nvidia.com/cuda-12-6-3-download-archive

Install docs

URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source

Quick-steps

Note

The build process will take a while, so be patient

cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6

python

>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']