共 61 条
[1]
ViViT: A Video Vision Transformer
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:6816-6826
[2]
Brown TB, 2020, ADV NEUR IN, V33
[3]
Emerging Properties in Self-Supervised Vision Transformers
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:9630-9640
[4]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:4724-4733
[6]
UNITER: UNiversal Image-TExt Representation Learning
[J].
COMPUTER VISION - ECCV 2020, PT XXX,
2020, 12375
:104-120
[7]
Chen YZ, 2023, AAAI CONF ARTIF INTE, P396
[8]
Dass SDS, 2024, Arxiv, DOI arXiv:2404.06243
[9]
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
[J].
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR,
2023,
:2341-2352
[10]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171