An enhanced implementation of "Packing Input Frame Context in Next-Frame Prediction Models for Video Generation", optimized for cloud deployment, with extended controls for precise video generation.
Links: Paper, Project Page
FramePack uses a next-frame prediction approach that generates videos progressively:
- Constant Memory Usage: Compresses input contexts to a fixed size, making VRAM usage invariant to video length
- Progressive Generation: Create minute-long videos even on laptop GPUs (minimum 6GB VRAM)
- Immediate Visual Feedback: See frames as they're generated rather than waiting for the entire video
- Section Controls: Precisely control different segments of your video with custom prompts and reference images
This fork extends the original FramePack with several major improvements:
- RunPod Integration: Fully optimized deployment to RunPod with a comprehensive deployment guide
- Secure SSH Access: Enhanced security through SSH port forwarding without exposing endpoints
- Multi-Section Video Generation: Define different prompts and reference images for each section of your video
- Progressive Transition System: Seamless transitions between different video segments
- Intuitive UI: User-friendly interface for defining section settings with visual feedback
- Improved Parameter Organization: Better organized settings with logical grouping
- Parameter Persistence: Save your favorite settings for future use
- Performance Optimizations: Options for TeaCache, FP8 quantization, and memory management
- LoRA Support: Easily apply LoRA models to customize generation style
- End Frame Control: Define both start and end frames for your video
- Camera Movement Controls: Adjust the level of camera motion in your generations
- Video Compression Settings: Fine-tune output quality with customizable compression
Requirements:
- An NVIDIA GPU in the RTX 30XX, 40XX, or 50XX series that supports fp16 and bf16. GTX 10XX/20XX GPUs have not been tested.
- Linux or Windows operating system.
- At least 6GB GPU memory.
To generate a 1-minute video (60 seconds) at 30 fps (1800 frames) with the 13B model, the minimum required GPU memory is 6 GB. (Yes, 6 GB is not a typo. Laptop GPUs are fine.)
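You can check how much VRAM your GPU reports before starting (assuming the NVIDIA driver is installed):
nvidia-smi --query-gpu=name,memory.total --format=csv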
Windows:
Installing on Windows requires a few extra steps, especially for SageAttention:
- Basic Installation:
We recommend having an independent Python 3.11 installation.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
- Installing SageAttention on Windows:
SageAttention requires Triton, which has traditionally been challenging to install on Windows. To install SageAttention:
SageAttention 2.1.1
a. Install Triton for Windows:
- Use the Windows-compatible fork of Triton:
pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.2.0-windows.post10/triton-3.2.0-cp311-cp311-win_amd64.whl
(Choose the correct wheel for your Python version)
b. Install SageAttention 2.1.1:
- Once Triton is installed, you can install SageAttention:
- For Python 3.11 with PyTorch 2.6.0 (CUDA 12.6), you can directly install the prebuilt wheel:
pip install https://github.com/woct0rdho/SageAttention/releases/download/v2.1.1-windows/sageattention-2.1.1+cu126torch2.6.0-cp311-cp311-win_amd64.whl
- Starting the Application:
python app.py
Note that it supports --share, --port, --server, and so on.
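For example, to pick a specific port and create a public Gradio share link (the port number here is only illustrative, and --server is assumed to take a bind address as in the upstream FramePack demo):
python app.py --server 0.0.0.0 --port 7860 --share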
The software prioritizes SageAttention when available and falls back to PyTorch's native attention otherwise. This optimization improves performance while maintaining quality.
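A quick way to confirm that SageAttention will be picked up is to check that the package imports in the same environment (if this fails, the app should fall back to PyTorch attention):
python -c "import sageattention; print('SageAttention import OK')"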
Linux:
We recommend having an independent Python 3.11 installation.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
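After installing, you can verify that the CUDA-enabled PyTorch build was picked up and that it can see your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"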
To start the GUI, run:
python app.py
Note that it supports --share, --port, --server, and so on.
You can install attention kernels for improved performance:
- SageAttention 2.1.1: For Linux with PyTorch 2.6.0, check the latest builds at: https://github.com/thuml/SageAttention/releases
For deploying to cloud platforms like RunPod, refer to the RunPod Deployment Guide which provides step-by-step instructions for:
- Setting up a secure RunPod environment
- Configuring SSH for secure access
- Using port forwarding for private access to the application (see the example after this list)
- Managing workspace directories and caching
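As a rough sketch of the port-forwarding step (the user, host, and SSH port below are placeholders; use the values RunPod assigns to your pod), forward the Gradio port and then open http://localhost:7860 locally:
ssh -L 7860:localhost:7860 -p <ssh-port> root@<pod-public-ip>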
The section controls feature allows for precise control over different segments of your generated video:
- Reverse Index Mapping: Since FramePack generates videos in reverse (end → start), the section numbering is mapped accordingly:
- UI Section 0: Start of video (Generated Last)
- UI Section 1: Middle of video (Generated Second)
- UI Section 2: End of video (Generated First)
- Section-Specific Prompts: Each section can have its own prompt that guides that segment's generation
- Valid keys for each section are mapped based on the generation order
- Prompts cascade forward: if a section has no specific prompt, it uses the previous section's prompt
- Section-Specific Images: Each section can have a distinct reference image
- Similar to prompts, images cascade forward through sections
- This allows for visual transitions between different scenes or subjects
- Progressive Generation: The system generates sections from the end of the video toward the beginning
- Each new section uses the growing history buffer of previously generated frames
- This ensures temporal consistency while allowing creative direction changes
Tips:
- Start with TeaCache On: Use TeaCache for faster iterations while exploring ideas
- Final Renders: Disable TeaCache for your final high-quality renders
- Section Controls: Remember that section 0 appears at the start of the video, but is generated last
- Smooth Transitions: For smoother transitions, use similar prompts or images between adjacent sections
- Memory Management: Adjust GPU memory preservation if you encounter out-of-memory errors
- Start Simple: Begin with shorter videos to understand the system before generating minute-long content
Many more examples are available on the Project Page.
To build the Docker image locally:
docker build -t wongfei2009/framepack:latest -f Dockerfile .
After building, you can push the image to Docker Hub:
docker push wongfei2009/framepack:latest
To run the FramePack container:
docker run -d --name framepack \
--gpus all \
-p 7860:7860 \
-v /path/to/your/models:/workspace/framepack/local_models \
-v /path/to/your/outputs:/workspace/framepack/outputs \
-v /path/to/your/cache:/workspace/framepack/huggingface_cache \
wongfei2009/framepack:latest
This will:
- Use all available GPUs
- Expose the Gradio interface on port 7860
- Mount three volumes for models, outputs, and cache
- Run the container in detached mode
You can then access the web interface at http://localhost:7860
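If the interface does not respond immediately, the container logs will show whether models are still being downloaded; the name below matches the --name flag used in the run command:
docker logs -f framepack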