Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
| | Predict | Transfer | Reason |
|---|---|---|---|
Type | World Generation | Multi-ControlNet | Reasoning VLM |
Function | Predict novel future frames given initial frames | Transfer existing control frames into photoreal frames within a video clip | Reason against frames within a video clip |
Use Cases | Data Generation & Policy Evaluation | Data Augmentation | Data Curation |
Inputs | Text, Image, Video | Multiple Video Modalities such as RGB, Depth, Segmentation, and more. | Video & Text |
Outputs | Video | Video | Text |
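
The comparison above can also be read as a small lookup table. The sketch below is only an illustration of that idea: it is not part of any Cosmos API, and the `ModelType` class, the `MODEL_TYPES` tuple, and both helper functions are hypothetical names introduced here. The input list for cosmos-transfer contains only the modalities the table names and is not exhaustive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelType:
    """One column of the comparison table (hypothetical helper, not a Cosmos API)."""
    name: str
    kind: str
    use_cases: tuple
    inputs: tuple
    outputs: tuple


# Values transcribed from the table; names are lowercased for easy matching.
MODEL_TYPES = (
    ModelType("cosmos-predict", "World Generation",
              ("data generation", "policy evaluation"),
              ("text", "image", "video"), ("video",)),
    # The table says "and more", so this input list is not exhaustive.
    ModelType("cosmos-transfer", "Multi-ControlNet",
              ("data augmentation",),
              ("rgb", "depth", "segmentation"), ("video",)),
    ModelType("cosmos-reason", "Reasoning VLM",
              ("data curation",),
              ("video", "text"), ("text",)),
)


def models_for_use_case(use_case):
    """Return the names of the model types whose use cases include use_case."""
    wanted = use_case.strip().lower()
    return [m.name for m in MODEL_TYPES if wanted in m.use_cases]


def models_producing(output):
    """Return the names of the model types that emit the given output modality."""
    wanted = output.strip().lower()
    return [m.name for m in MODEL_TYPES if wanted in m.outputs]


if __name__ == "__main__":
    print(models_for_use_case("Data Curation"))
    print(models_producing("video"))
```

Keeping the table as data makes it straightforward to ask which model type fits a given stage, for example which ones emit video for data generation or augmentation versus text for curation.
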
Our world foundation models are purpose-built to improve the performance of downstream models at various stages, as illustrated here in the flywheel.