58 entries in total
[1]
VQA: Visual Question Answering [C]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[3]
Less Is More: Picking Informative Frames for Video Captioning [C]. COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217: 367-384
[4]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[5]
Garcia N, 2018, BMVC
[6]
Garcia N, 2020, AAAI CONF ARTIF INTE, V34, P10826
[7]
Unpaired Image Captioning via Scene Graph Alignments [C]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 10322-10331
[8]
Deep Residual Learning for Image Recognition [C]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[9]
Hewlett D, 2017, Proceedings of EMNLP, P2011, DOI 10.18653/v1/d17-1214
[10]
Hu MH, 2019, 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), P2285