20 references in total
- [1] HE K M, ZHANG X Y, REN S Q, et al., Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, (2016)
- [2] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al., An image is worth 16×16 words: transformers for image recognition at scale, 9th International Conference on Learning Representations, pp. 1-12, (2021)
- [3] RADFORD A, KIM J W, HALLACY C, et al., Learning transferable visual models from natural language supervision, Proceedings of the 38th International Conference on Machine Learning, pp. 8748-8763, (2021)
- [4] LU H Y, FEI N Y, HUO Y Q, et al., COTS: collaborative two-stream vision-language pre-training model for cross-modal retrieval, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15671-15680, (2022)
- [5] SCHICK T, SCHÜTZE H., Exploiting cloze-questions for few-shot text classification and natural language inference, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 255-269, (2021)
- [6] SHIN T, RAZEGHI Y, LOGAN R L, et al., AutoPrompt: eliciting knowledge from language models with automatically generated prompts, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4222-4235, (2020)
- [7] ZHOU K Y, YANG J K, LOY C C, et al., Learning to prompt for vision-language models, International Journal of Computer Vision, 130, 9, pp. 2337-2348, (2022)
- [8] LESTER B, AL-RFOU R, CONSTANT N., The power of scale for parameter-efficient prompt tuning, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3045-3059, (2021)
- [9] VASWANI A, SHAZEER N, PARMAR N, et al., Attention is all you need, Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000-6010, (2017)
- [10] JIA M L, TANG L M, CHEN B C, et al., Visual prompt tuning, European Conference on Computer Vision, pp. 709-727, (2022)