共 27 条
[1]
ViViT: A Video Vision Transformer
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:6816-6826
[2]
Bertasius G, 2021, PR MACH LEARN RES, V139
[3]
Carreira J., 2018, arXiv
[4]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:4724-4733
[5]
PoTion: Pose MoTion Representation for Action Recognition
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:7024-7033
[6]
Multiscale Vision Transformers
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:6804-6815
[7]
The "something something" video database for learning and evaluating visual common sense
[J].
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2017,
:5843-5851
[8]
Single-Stage is Enough: Multi-Person Absolute 3D Pose Estimation
[J].
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2022,
:13076-13085