62 entries in total
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[2]
SPICE: Semantic Propositional Image Caption Evaluation [J]. COMPUTER VISION - ECCV 2016, PT V, 2016, 9909: 382-398
[3]
Ankur B., 2018, ARXIV PREPRINT ARXIV
[4]
[Anonymous], 2017, IEEE T NEURAL NETW L, DOI 10.1109/TNNLS.2016.2636185
[5]
[Anonymous], 2017, ARXIV170403162
[6]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[7]
Ba Jimmy Lei, 2016, Layer normalization
[8]
Banerjee S., 2005, IEEMMT, P65
[9]
Battaglia P. W., 2018, ARXIV
[10]
Ben-Younes H, 2019, AAAI CONF ARTIF INTE, P8102