SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

Hao Dong¹ Ismail Nejjar² Han Sun² Eleni Chatzi¹ Olga Fink²

¹ETH Zurich, ²EPFL

• NeurIPS 2023 •

Overview of SimMMDG. We split the features of each modality into modality-specific and modality-shared parts. For the modality-shared part, we use supervised contrastive learning to map the features with the same label to be as close as possible. For modality-specific features, we use a distance loss to encourage them to be far from modality-shared features, promoting diversity within each modality. Additionally, we introduce a cross-modal translation module that regularizes features and enhances generalization across missing modalities.

Update: We have a new survey paper on Multimodal Adaptation and Generalization

Code

The code was tested using Python 3.10.4, torch 1.11.0+cu113 and NVIDIA GeForce RTX 3090.

Environments:

mmcv-full 1.2.7
mmaction2 0.13.0

EPIC-Kitchens Dataset

Prepare

Download Pretrained Weights

Download Audio model link, rename it as vggsound_avgpool.pth.tar and place under the EPIC-rgb-flow-audio/pretrained_models directory
Download SlowFast model for RGB modality link and place under the EPIC-rgb-flow-audio/pretrained_models directory
Download SlowOnly model for Flow modality link and place under the EPIC-rgb-flow-audio/pretrained_models directory

Download EPIC-Kitchens Dataset

bash download_script.sh

Download Audio files EPIC-KITCHENS-audio.zip.

Unzip all files and the directory structure should be modified to match:

Click for details...

├── MM-SADA_Domain_Adaptation_Splits
├── rgb
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01.wav
|   |   |   ├── P08_01
|   |   |   |     ├── frame_0000000000.jpg
|   |   |   |     ├── ...
|   |   |   ├── P08_02.wav
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3


├── flow
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01 
|   |   |   |   ├── u
|   |   |   |   |   ├── frame_0000000000.jpg
|   |   |   |   |   ├── ...
|   |   |   |   ├── v
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3

Video and Audio

Click for details...

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 25 --datapath /path/to/EPIC-KITCHENS/

Video and Flow

Click for details...

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/

Flow and Audio

Click for details...

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/

Video and Flow and Audio

Click for details...

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 10 --trans_hidden_num 1024 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/EPIC-KITCHENS/

python train_video_flow_audio_EPIC_SimMMDG.py --use_video --use_flow --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --alpha_trans 1.0 --datapath /path/to/EPIC-KITCHENS/

HAC Dataset

This dataset can be downloaded at link.

Unzip all files and the directory structure should be modified to match:

Click for details...

HAC
├── human
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...

├── animal
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...

├── cartoon
|   ├── videos
|   |   ├── ...
|   ├── flow
|   |   ├── ...
|   ├── audio
|   |   ├── ...

Download the pretrained weights similar to EPIC-Kitchens Dataset and put under the HAC-rgb-flow-audio/pretrained_models directory.

Video and Audio

Click for details...

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/

Video and Flow

Click for details...

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/

Flow and Audio

Click for details...

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 20 --datapath /path/to/HAC/

Video and Flow and Audio

Click for details...

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'animal' 'cartoon' -t 'human' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'human' 'cartoon' -t 'animal' --lr 1e-4 --bsz 16 --nepochs 10 --datapath /path/to/HAC/

python train_video_flow_audio_HAC_SimMMDG.py --use_video --use_flow --use_audio -s 'human' 'animal' -t 'cartoon' --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/

Contact

If you have any questions, please send an email to [email protected]

Citation

If you find our work useful in your research please consider citing our paper:

@inproceedings{dong2023SimMMDG,
    title={Sim{MMDG}: A Simple and Effective Framework for Multi-modal Domain Generalization},
    author={Dong, Hao and Nejjar, Ismail and Sun, Han and Chatzi, Eleni and Fink, Olga},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    year={2023}
}

Related Projects

MOOSA: Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision

AEO: Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

Survey: Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Acknowledgement

Many thanks to the open-source project DomainAdaptation.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
EPIC-rgb-flow-audio		EPIC-rgb-flow-audio
HAC-rgb-flow-audio		HAC-rgb-flow-audio
dataset		dataset
imgs		imgs
README.md		README.md
download_script.sh		download_script.sh
generate_sound_files.py		generate_sound_files.py
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

• NeurIPS 2023 •

Code

EPIC-Kitchens Dataset

Prepare

Download Pretrained Weights

Download EPIC-Kitchens Dataset

Video and Audio

Video and Flow

Flow and Audio

Video and Flow and Audio

HAC Dataset

Video and Audio

Video and Flow

Flow and Audio

Video and Flow and Audio

Contact

Citation

Related Projects

Acknowledgement

About

Releases

Packages

Languages

donghao51/SimMMDG

Folders and files

Latest commit

History

Repository files navigation

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

• NeurIPS 2023 •

Code

EPIC-Kitchens Dataset

Prepare

Download Pretrained Weights

Download EPIC-Kitchens Dataset

Video and Audio

Video and Flow

Flow and Audio

Video and Flow and Audio

HAC Dataset

Video and Audio

Video and Flow

Flow and Audio

Video and Flow and Audio

Contact

Citation

Related Projects

Acknowledgement

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages