Nunchaku

Vladimir Mandic edited this page Apr 24, 2025 · 7 revisions

Nunchaku is a high-performance inference engine from MIT-Han-Lab optimized for 4-bit neural networks. It uses the novel 4-bit SVDQuant quantization scheme via DeepCompressor

Nunchaku can speed up inference by 2-5x compared to standard 16-bit or 8-bit models!
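To build intuition for why SVDQuant-style 4-bit quantization can stay accurate, here is a minimal NumPy sketch of the underlying idea: split a weight matrix into a low-rank (higher-precision) branch plus a 4-bit quantized residual. This is an illustrative toy, not the actual DeepCompressor implementation; all names here are made up for the example.

```python
import numpy as np

# Toy sketch of the SVDQuant idea: keep a small low-rank branch in
# higher precision and quantize only the residual to 4 bits.
# NOT the DeepCompressor implementation -- concept illustration only.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)  # pretend weight matrix

# 1. Low-rank component via truncated SVD (rank r, kept in fp)
r = 8
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L = (U[:, :r] * S[:r]) @ Vt[:r, :]

# 2. Quantize the residual to 4-bit integers (symmetric, per-tensor)
R = W - L
scale = np.abs(R).max() / 7.0                       # INT4 range -8..7, use +/-7
Q = np.clip(np.round(R / scale), -8, 7).astype(np.int8)

# 3. Reconstruct: low-rank branch + dequantized 4-bit residual
W_hat = L + Q.astype(np.float32) * scale

# Compare against naive direct 4-bit quantization of W
scale_naive = np.abs(W).max() / 7.0
W_naive = np.clip(np.round(W / scale_naive), -8, 7) * scale_naive

print("naive 4-bit mean error:   ", np.abs(W - W_naive).mean())
print("low-rank+4-bit mean error:", np.abs(W - W_hat).mean())
```

The real method additionally migrates activation outliers into the weight before the SVD split, which is what makes the residual well-behaved enough for 4-bit storage.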

Important

Nunchaku is supported only on CUDA platforms using Turing/Ampere/Ada/Blackwell GPUs
Nunchaku requires Python 3.11 or 3.12

Note

On Blackwell GPUs, Nunchaku defaults to FP4 methods,
while on other GPU generations it uses INT4 methods
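The format choice above follows the GPU's CUDA compute capability. A minimal sketch of such a dispatch is below; the SM-to-generation mapping and the function name are assumptions for illustration, not Nunchaku's actual dispatch code.

```python
# Hedged sketch: pick a quantization format from the CUDA compute
# capability. The mapping below is an illustrative assumption,
# not Nunchaku's real dispatch logic.

def nunchaku_precision(major: int, minor: int) -> str:
    """Return 'fp4' on Blackwell (SM 10.x / 12.x), else 'int4'."""
    if major >= 10:          # Blackwell: sm_100, sm_120, ...
        return "fp4"
    return "int4"            # Turing (7.5), Ampere (8.x), Ada (8.9)

# In practice the capability would come from torch, e.g.:
#   import torch
#   major, minor = torch.cuda.get_device_capability(0)
print(nunchaku_precision(8, 9))   # Ada, e.g. RTX 4090 -> int4
print(nunchaku_precision(12, 0))  # Blackwell -> fp4
```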

Install

SD.Next will attempt to auto-install pre-built wheels when possible;
if you encounter issues, see the Manual build section below

Configure

To enable Nunchaku support, set appropriate quantization options in Settings -> Quantization

  • Enable for modules:
    Enable or disable Nunchaku for specific modules
    Currently supports the Transformer and TE (text encoder) modules
  • Nunchaku attention:
    Overrides current attention module with Nunchaku's custom fp16 attention mechanism
  • Nunchaku offloading:
    Overrides current offloading method with Nunchaku's custom offloading method

Support

At the moment, Nunchaku supports the following models:

  • FLUX.1 both Dev and Schnell variants
  • SANA 1.0-1600M variant
  • T5-XXL variant text encoder
    as used by SD35, FLUX.1, and HiDream models

Important

SD.Next will auto-download Nunchaku's pre-quantized modules as needed on first access
Nunchaku replaces the model's DiT module with a custom pre-quantized one,
so any model fine-tune will be ignored

Note

For FLUX.1, Nunchaku supports multiple base models:

  • Black Forest Labs FLUX.1 Dev and Schnell
  • Shuttle Jaguar finetune

Notes

Warning

Nunchaku is EXPERIMENTAL and many standard features are not supported yet

Nunchaku is compatible with some advanced features like:

  • LoRA loading
    however, Nunchaku uses a custom LoRA loader, so not all LoRAs may be supported
  • Para-attention first-block-cache
    enable in Settings -> Pipeline modifiers

Unsupported and/or known limitations:

  • Batch size
  • Model unload causes memory leaks

Manual build

Install CUDA

Warning

Requires CUDA dev installation with NVCC

URL: https://developer.nvidia.com/cuda-12-6-3-download-archive

Install docs

URL: https://github.com/mit-han-lab/nunchaku/blob/main/README.md#build-from-source

Quick-steps

Note

The build process will take a while, so be patient

cd sdnext
source venv/bin/activate
cd ..
git clone https://github.com/mit-han-lab/nunchaku
cd nunchaku
git submodule init
git submodule update
pip install torch torchvision torchaudio ninja wheel sentencepiece protobuf
python setup.py develop
Found nvcc version: 12.6.85
Detected SM targets: ['89']
running develop
...
Adding nunchaku 0.2.0+torch2.6 to easy-install.pth file
Installed /home/vlado/branches/nunchaku
Processing dependencies for nunchaku==0.2.0+torch2.6
...
Finished processing dependencies for nunchaku==0.2.0+torch2.6

python

>>> import sys
>>> import platform
>>> import torch
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=3, releaselevel='final', serial=0)
>>> platform.system()
'Linux'
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
>>> import nunchaku
>>> nunchaku.__path__
['/home/vlado/dev/nunchaku/nunchaku']