82 entries in total
- [11] From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10867 - 10877
- [12] Guo ZR, 2024, arXiv, DOI arXiv:2407.05374
- [13] Han W, 2021, PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2021, P6, DOI 10.1145/3462244.3479919
- [14] Hu Yifan, 2024, Multimedia Tools Appl., P1
- [15] Huan Ruohong, 2023, IEEE Trans. Multimed.
- [16] Incomplete Cross-modal Retrieval with Dual-Aligned Variational Autoencoders [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3283 - 3291
- [17] Gradient-based learning applied to document recognition [J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
- [18] Stacked Cross Attention for Image-Text Matching [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 212 - 228
- [19] Multimodal Prompting with Missing Modalities for Visual Recognition [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14943 - 14952
- [20] Lester B, 2022, arXiv, DOI arXiv:2208.05577