52 entries in total
[1]
[Anonymous], CVPR, DOI 10.1007/S00467-024-06571-7
[2]
[Anonymous], 2011, P 49 ANN M ASS COMPU
[3]
Banerjee Satanjeev, 2005, P ACL WORKSHOP INTRI
[4]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733
[5]
Chen M., TVT: Two-view transformer network for video captioning, P847
[6]
Chen X., 2015, arXiv
[7]
Less Is More: Picking Informative Frames for Video Captioning [J]. COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217: 367-384
[8]
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[9]
Learning Spatiotemporal Features with 3D Convolutional Networks [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4489-4497
[10]
Fan L., 2019, arXiv