58 references in total
[21]
End-to-End Video Captioning with Multitask Reinforcement Learning [J]. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019: 339-348.
[24]
Lin K, 2021, AAAI CONF ARTIF INTE, V35, P2047
[25]
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs [J]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 7001-7011.
[26]
Liu FL, 2021, FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, P281
[28]
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [J]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10867-10876.
[29]
BLEU: a method for automatic evaluation of machine translation [J]. 40th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2002: 311-318.
[30]
Radford A, 2019, Technical report, V1, P9