44 entries in total
- [1] Ba J. L., 2016, arXiv, DOI 10.48550/ARXIV.1607.06450
- [2] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 12652-12660
- [3] Cho KYHY, 2014, arXiv, DOI arXiv:1409.1259
- [4] Collobert R., 2011, BIGLEARN NIPS WORKSH
- [5] Linking Image and Text with 2-Way Nets [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1855-1865
- [6] Faghri F, 2018, arXiv, DOI [arXiv:1707.05612, 10.48550/ARXIV.1707.05612]
- [7] Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 5185-5193
- [8] Fast R-CNN [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1440-1448
- [9] Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7181-7189
- [10] Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778