59 entries in total
[1]
[Anonymous], 2016, arXiv
[2]
ViViT: A Video Vision Transformer [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6816-6826
[3]
Bertasius G, 2021, PR MACH LEARN RES, V139
[4]
Bottou Leon, 2012, NEURAL NETWORKS TRIC, P421
[5]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733
[6]
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 3557-3567
[7]
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 786-797
[8]
Chen R.J., 2022, P IEEE C COMPUTER VI
[10]
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848