172 entries in total
[1]
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
[J].
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019),
2019,
:12479-12488
[2]
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:4971-4980
[3]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[4]
[Anonymous], 2011, NEURAL INFORM PROCES
[5]
[Anonymous], 2014, T ASSOC COMPUT LING
[6]
[Anonymous], 2011, ACL
[7]
[Anonymous], 2006, 22 INT C DAT ENG WOR, DOI 10.1109/ICDEW.2006.145
[8]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[9]
Arik SÖ, 2017, ADV NEUR IN, V30
[10]
Arik SO, 2017, PR MACH LEARN RES, V70