This repo contains the codebase of a research project on adapting vision-language models such as CLIP to downstream datasets via prompt learning:

- PRE: Vision-Language Prompt Learning with Reparameterization Encoder, 2023.

We introduce Prompt Learning with Reparameterization Encoder (PRE), a simple and efficient method that improves the generalization of learnable prompts to unseen classes while maintaining the capacity to learn base classes. Instead of directly optimizing the prompts, PRE employs a prompt encoder to reparameterize the input prompt embeddings, which encourages the exploration of task-specific knowledge from few-shot samples. Extensive evaluation shows that PRE is efficient, achieving better generalization performance at a modest training cost.

Average accuracy (%) on base and new classes, their harmonic mean (H), and per-image training time:

Method | Prompts | Base | New | H | Training time
---|---|---|---|---|---
CLIP | hand-crafted | 68.81 | 74.43 | 71.42 | -
CoOp | textual | 83.32 | 66.92 | 73.34 | 6 ms/image
ProGrad | textual | 82.96 | 70.30 | 75.58 | 22 ms/image
CoCoOp | textual+visual | 80.89 | 70.99 | 74.47 | 160 ms/image
PRE | textual | 82.14 | 71.88 | 76.27 | 6.3 ms/image
This code is built on top of the awesome toolbox Dassl.pytorch, so you need to install the `dassl` environment first. Simply follow the instructions described here to install `dassl` as well as PyTorch. After that, run `pip install -r requirements.txt` under `PRE/` to install a few more packages required by CLIP (do this with the `dassl` environment activated). Then, you are ready to go.
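For orientation, a typical setup might look like the sketch below. The clone URL and conda commands mirror the Dassl.pytorch README and are assumptions rather than part of this repo, so treat that README as the authoritative source if the steps differ.

```bash
# Sketch of a typical setup (assumed; the Dassl.pytorch README is authoritative).
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch

# Create and activate a conda environment, then install Dassl's dependencies
# (including PyTorch) and register the package in development mode.
conda create -y -n dassl python=3.8
conda activate dassl
pip install -r requirements.txt
python setup.py develop

# With the dassl environment still active, install the extra packages
# required by CLIP from this repo.
cd ../PRE
pip install -r requirements.txt
```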
Follow DATASETS.md to install the datasets.
Click a paper below to see the detailed instructions on how to run the code for CoOp and CoCoOp to reproduce their results.
- Learning to Prompt for Vision-Language Models
- Conditional Prompt Learning for Vision-Language Models
Follow PRE.md to see the detailed instructions on how to run the PRE method to reproduce the results.
- The raw numerical results for PRE can be found at this Google Drive link.
- The pre-trained weights of PRE (M=4) on Caltech101, OxfordPets, OxfordFlowers, DTD, EuroSAT, FGVC-Aircraft, and Stanford_Cars based on ViT-B/16 can be downloaded altogether from the /output folder of this GitHub project. The weights can be used to reproduce the results in Table 2 of PRE's paper (i.e., the results on all evaluated datasets). To load the weights and run the evaluation code, specify `--model-dir` and `--load-epoch`, then run the `base2new_test.sh` file in the /scripts folder, as sketched below.
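For illustration, an evaluation run might look like the following sketch. Only the `base2new_test.sh` script and the `--model-dir`/`--load-epoch` options are documented above; the dataset name, seed, checkpoint path, and epoch value are hypothetical placeholders, so check PRE.md and the script itself for the exact interface.

```bash
# Hypothetical invocation: only base2new_test.sh, --model-dir, and --load-epoch
# are documented; the dataset/seed arguments, weight path, and epoch are placeholders.
bash scripts/base2new_test.sh caltech101 1 \
    --model-dir output/caltech101 \
    --load-epoch 10
```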