共 65 条
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[2]
Bahdanau D, 2016, Arxiv, DOI [arXiv:1409.0473, 10.48550/arXiv.1409.0473,1409.0473, DOI 10.48550/ARXIV.1409.0473,1409.0473]
[3]
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:14920-14929
[5]
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:12652-12660
[6]
Faghri F., 2018, P BRIT MACH VIS C BM, P1
[7]
Stacked Latent Attention for Multimodal Reasoning
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:1072-1080
[8]
Frome A., 2013, P NIPS, V26
[9]
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:7181-7189
[10]
Deep Residual Learning for Image Recognition
[J].
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2016,
:770-778