Nunchaku
Nunchaku is a high-performance inference engine from MIT-Han-Lab optimized for 4-bit neural networks. It uses the novel 4-bit SVDQuant quantization method, implemented via DeepCompressor, and can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
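The core idea behind SVDQuant is to absorb weight outliers into a small low-rank branch kept in high precision, so that the remaining residual quantizes cleanly to 4 bits. The following is a simplified NumPy sketch of that idea, not the actual DeepCompressor implementation; the matrix sizes, rank, and quantizer are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
W[0, :] *= 50.0  # inject an outlier row, which normally ruins 4-bit scales

def quantize_int4(x):
    """Symmetric per-tensor 4-bit quantization: levels in [-8, 7], then dequantize."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

# Plain 4-bit quantization: one outlier inflates the scale for the whole tensor
err_plain = np.abs(W - quantize_int4(W)).mean()

# SVDQuant-style: keep a rank-r branch in high precision, quantize only the residual
r = 4
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L = U[:, :r] * S[:r]   # low-rank factors stay in 16/32-bit
R = Vt[:r, :]
residual = W - L @ R   # outliers are absorbed, so the residual is well-behaved
W_hat = L @ R + quantize_int4(residual)
err_svd = np.abs(W - W_hat).mean()
```

Because the dominant singular directions capture the outlier row, `err_svd` comes out far below `err_plain`, which is what makes aggressive 4-bit quantization viable.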
Important
Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
Nunchaku requires Python 3.11 or 3.12
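A minimal pre-flight check for the Python requirement can be written as below; the helper name is hypothetical, not part of SD.Next or Nunchaku:

```python
import sys

def nunchaku_python_ok(version=sys.version_info):
    """Return True when the interpreter matches Nunchaku's supported versions (3.11 or 3.12)."""
    return (version[0], version[1]) in ((3, 11), (3, 12))

if not nunchaku_python_ok():
    print('Nunchaku requires Python 3.11 or 3.12')
```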
Note
On Blackwell GPUs, Nunchaku defaults to FP4 methods, while on other GPU generations it uses INT4 methods
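The two 4-bit formats cover different value grids: signed INT4 gives 16 uniformly spaced integers, while FP4 (E2M1, the format Blackwell accelerates) spends its bits on a sign, a 2-bit exponent, and a 1-bit mantissa, producing a non-uniform grid that is denser near zero. A hand-written enumeration (an illustration of the E2M1 encoding, not Nunchaku code):

```python
# Signed INT4: 16 uniform levels
int4_values = list(range(-8, 8))

def fp4_e2m1_values():
    """Enumerate all distinct FP4 (E2M1) values: 1 sign, 2 exponent, 1 mantissa bit."""
    vals = set()
    for sign in (1.0, -1.0):
        for exp in range(4):          # exponent field 0..3, bias = 1
            for mant in (0, 1):
                if exp == 0:          # subnormal: 0.m * 2^(1 - bias)
                    mag = mant * 0.5
                else:                 # normal: 1.m * 2^(exp - bias)
                    mag = (1 + 0.5 * mant) * 2 ** (exp - 1)
                vals.add(sign * mag)
    return sorted(vals)
```

`fp4_e2m1_values()` yields {0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6}: fewer, coarser magnitudes than INT4 at the high end, but finer resolution near zero, which suits the roughly bell-shaped weight distributions of diffusion models.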
SD.Next will attempt to auto-install pre-built wheels when possible;
if you encounter issues, see the manual build section below
To enable Nunchaku support, set the appropriate quantization options in Settings -> Quantization:
- Enable for modules: enable or disable Nunchaku for specific modules; currently supports Transformers and TE
- Nunchaku attention: overrides the current attention module with Nunchaku's custom fp16 attention mechanism
- Nunchaku offloading: overrides the current offloading method with Nunchaku's custom offloading method
At the moment, Nunchaku supports the following models:
- FLUX.1: both Dev and Schnell variants
- SANA: 1.0-1600M variant
- T5 XXL variant text-encoder: as used by SD35, FLUX.1 and HiDream models
Important
SD.Next will auto-download Nunchaku's pre-quantized modules as needed on first access
Nunchaku replaces the model's DiT module with a custom pre-quantized one,
so any model fine-tune will be ignored
Note
For FLUX.1, Nunchaku supports multiple base models:
- Black Forest Labs FLUX.1 Dev and Schnell
- Shuttle Jaguar finetune
Warning
Nunchaku is EXPERIMENTAL and many normal features are not supported yet
Nunchaku is compatible with some advanced features, such as:
- LoRA loading: however, Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
- Para-attention first-block-cache: enable in Settings -> Pipeline modifiers
Unsupported and/or known limitations:
- Batch size
- Model unload causes memory leaks
Warning
Manual build requires a CUDA dev installation with NVCC
URL: https://developer.nvidia.com/cuda-12-6-3-download-archive
URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source
Note
The build process will take a while, so be patient
cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6
python
>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']