36 entries in total
[21]
Lei Jie, 2019, TVQA+: Spatio-temporal grounding for video question answering
[23]
Lu JS, 2016, ADV NEUR IN, V29
[24]
Ma SM, 2019, AAAI CONF ARTIF INTE, P6810
[25]
BLEU: a method for automatic evaluation of machine translation [J]. 40th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2002: 311-318
[26]
Sutskever I, 2014, ADV NEUR IN, V27
[27]
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis [J]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 1207-1216
[28]
Vedantam R, 2015, PROC CVPR IEEE, P4566, DOI 10.1109/CVPR.2015.7299087
[29]
Vinyals O, 2015, PROC CVPR IEEE, P3156, DOI 10.1109/CVPR.2015.7298935