59 entries in total
[1]
Argyriou A., 2006, Advances in neural information processing systems, V19
[2]
ViViT: A Video Vision Transformer [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6816-6826
[3]
Asai Akari, 2022, ARXIV220511961
[4]
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 2874-2883
[5]
Ben-Zaken E, 2022, PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, P1
[6]
Brown Tom B., 2020, Language Models Are Few-Shot Learners, DOI 10.5555/3495724.3495883
[7]
Carion N, 2020, Img Proc Comp Vis Re, V12346, P213, DOI 10.1007/978-3-030-58452-8_13
[8]
Emerging Properties in Self-Supervised Vision Transformers [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 9630-9640
[9]
Pre-Trained Image Processing Transformer [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 12294-12305
[10]
Chen Shoufa, 2022, ARXIV220513535