共 39 条
- [1] Ba Jimmy Lei, 2016, LAYER NORMALIZATION, DOI 10.48550/arXiv.1607.06450
- [2] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4724 - 4733
- [3] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
- [4] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
- [5] Learning Spatiotemporal Features with 3D Convolutional Networks [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4489 - 4497
- [6] Escorcia Victor, 2019, ARXIV190712763
- [7] SlowFast Networks for Video Recognition [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6201 - 6210
- [8] TALL: Temporal Activity Localization via Language Query [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5277 - 5285
- [9] Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
- [10] Localizing Moments in Video with Natural Language [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5804 - 5813