65 entries in total
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[2]
SPICE: Semantic Propositional Image Caption Evaluation [J]. COMPUTER VISION - ECCV 2016, PT V, 2016, 9909: 382-398
[3]
[Anonymous], 2018, P EUR C COMP VIS ECC, DOI 10.3892/MMR.2018.9013
[4]
[Anonymous], 2020, ARXIV200414255, DOI 10.1145/3397271.3401093
[5]
Picture it in your mind: generating high level visual representations from textual descriptions [J]. INFORMATION RETRIEVAL JOURNAL, 2018, 21 (2-3): 208-229
[6]
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 12652-12660
[7]
Chen Y, 2019, arXiv
[8]
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 8299-8308
[9]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[10]
Linking Image and Text with 2-Way Nets [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1855-1865