47 entries in total
[41] Sequence to Sequence - Video to Text [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4534-4542
[42] Reconstruction Network for Video Captioning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7622-7631
[43] M3: Multimodal Memory Modelling for Video Captioning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7512-7520
[44] Video Captioning via Hierarchical Reinforcement Learning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4213-4222
[45] Interpretable Video Captioning via Trajectory Structured Localization [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6829-6837
[46] MSR-VTT: A Large Video Description Dataset for Bridging Video and Language [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 5288-5296
[47] Yao T., 2018, P EUR C COMP VIS ECC, P684