32 entries in total
- [1] Ballas N., 2015, arXiv
- [2] Bertasius G, 2021, arXiv, DOI arXiv:2102.05095
- [3] Learning Spatiotemporal Features with 3D Convolutional Networks [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4489-4497
- [4] Fan QF, 2019, arXiv, DOI arXiv:1912.00869
- [5] X3D: Expanding Architectures for Efficient Video Recognition [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 200-210
- [6] SlowFast Networks for Video Recognition [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6201-6210
- [7] Convolutional Two-Stream Network Fusion for Video Action Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 1933-1941
- [9] The "something something" video database for learning and evaluating visual common sense [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5843-5851
- [10] Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1007/978-3-642-24797-2, 10.1162/neco.1997.9.1.1]