36 entries in total
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[2]
Chen TL, 2020, Arxiv, DOI arXiv:2002.08510
[3]
"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention [J]. COMPUTER VISION - ECCV 2018, PT X, 2018, 11214: 527-543
[4]
Beyond triplet loss: a deep quadruplet network for person re-identification [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1320-1329
[5]
Linking Image and Text with 2-Way Nets [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1855-1865
[6]
Faghri F, 2018, Arxiv, DOI [arXiv:1707.05612, DOI 10.48550/ARXIV.1707.05612]
[7]
Frome A., 2013, Advances in neural information processing systems, V26, P2121
[8]
Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[9]
ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 5773-5782
[10]
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 7254-7262