54 entries in total
[1]
Abnar S, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P4190
[2]
Abu-El-Haija S., 2016, arXiv
[3]
ViViT: A Video Vision Transformer. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, P6816-6826
[4]
Ba J. L., 2016, arXiv, DOI 10.48550/arXiv.1607.06450
[6]
Bertasius G, 2021, PR MACH LEARN RES, V139
[7]
Bulat A, 2021, ADV NEUR IN
[8]
Campbell Dylan, 2021, ADV NEUR IN, V34
[9]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, P4724-4733
[10]
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, P347-356