26 entries in total
[1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[3] Faghri F., ARXIV PREPRINT ARXIV
[4] Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[5] Learning Semantic Concepts and Order for Image and Sentence Matching [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6163-6171
[6] Instance-aware Image and Sentence Matching with Selective Multimodal LSTM [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 7254-7262
[7] Revisiting Visual Question Answering Baselines [J]. COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912: 727-739
[8] Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932
[9] Stacked Cross Attention for Image-Text Matching [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208: 212-228
[10] Visual Semantic Reasoning for Image-Text Matching [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4653-4661