
Vespa🐝: Video Diffusion State Space Models

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for our paper on video diffusion state space models. Our model uses CLIP/T5 as the text encoder and a Mamba-based diffusion backbone. Its distinctive advantage lies in its reduced spatial complexity, which makes it exceptionally well suited to processing long videos or high-resolution images, eliminating the need for window operations.
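For intuition on the complexity claim, here is a minimal sketch (not the repo's actual block wiring; the shapes and d_model are hypothetical) that runs a single mamba_ssm Mamba block over a flattened video token sequence. The selective scan is linear in sequence length, which is why no window partitioning is required:

import torch
from mamba_ssm import Mamba

# Hypothetical shapes: a patchified latent video flattened to L tokens of width D.
B, L, D = 1, 4096, 192
tokens = torch.randn(B, L, D, device="cuda")  # mamba_ssm kernels require CUDA

block = Mamba(d_model=D).to("cuda")
out = block(tokens)  # selective scan: O(L) in sequence length, vs O(L^2) for full attention
print(out.shape)     # torch.Size([1, 4096, 192])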

The following samples are generated by the model with the prompt "sad".


1. Environments

  • Python 3.10

    • conda create -n your_env_name python=3.10
  • Requirements file

    • pip install -r requirements.txt
  • Install causal_conv1d and mamba (see the sanity check after this list)

    • pip install -e causal_conv1d
    • pip install -e mamba
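After installation, both compiled extensions can be sanity-checked with a one-liner (this only verifies that the kernels import; it makes no assumption about the repo beyond the two package names above):

python -c "import causal_conv1d, mamba_ssm; print('ok')"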

2. Training

We provide a training script for VeSpa in train.py. This script can be used to train video diffusion state space models.

To launch VeSpa-M/2 (64x64) training in the raw space with N GPUs on one node:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--model VeSpa-M/2 \
--model-type video \
--dataset-type ucf \
--data-path  /path/to/data \
--anna-path /path/to/annotations \
--image-size 64 \
--lr 1e-4
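The exact loss and conditioning interface live in train.py. For orientation only, the following is a minimal sketch of the standard DDPM noise-prediction objective, with a hypothetical model(x_t, t, text_emb) signature and a toy linear beta schedule; VeSpa's actual objective may differ:

import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, text_emb, num_timesteps=1000):
    # x0: clean video batch of shape (B, C, T, H, W); text_emb: CLIP/T5 features.
    t = torch.randint(0, num_timesteps, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    # Toy linear beta schedule; the repo's schedule may differ.
    betas = torch.linspace(1e-4, 2e-2, num_timesteps, device=x0.device)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise  # forward diffusion q(x_t | x_0)
    pred = model(x_t, t, text_emb)  # backbone predicts the injected noise
    return F.mse_loss(pred, noise)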

3. Evaluation

We include a sample.py script that samples from a trained VeSpa model. In addition, the test.py script supports evaluation of other metrics, e.g., FLOPs and model parameters.

python sample.py \
--model VeSpa-M/2 \
--ckpt /path/to/model \
--image-size 64 \
--prompt sad 
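As a minimal sketch of the metrics test.py reports (the repo may use different tooling), parameters can be summed directly and FLOPs estimated with fvcore; the nn.Linear below is only a stand-in for the loaded VeSpa model:

import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

model = nn.Linear(64, 64)  # stand-in module; substitute the loaded VeSpa model
n_params = sum(p.numel() for p in model.parameters())
flops = FlopCountAnalysis(model, torch.randn(1, 64)).total()
print(f"params: {n_params / 1e6:.3f}M  flops: {flops / 1e6:.3f}M")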

4. BibTeX

@article{FeiVespa2024,
  title={Video Diffusion State Space Models},
  author={Zhengcong Fei and Mingyuan Fan and Yujun Liu and Changqian Yu and Junshi Huang},
  year={2024},
  journal={arXiv preprint},
}

5. Acknowledgments

The codebase is based on the awesome DiS, DiT, mamba, U-ViT, and Vim repos.
