69 references in total
[51]
Vaswani A, 2017, ADV NEUR IN, V30
[52]
Wang AL, 2019, Arxiv, DOI arXiv:1804.07461
[53]
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14549-14560
[54]
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14408-14419
[55]
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 548-558
[56]
Wasim ST, 2023, Arxiv, DOI arXiv:2304.03307
[57]
Wei C, 2021, Arxiv
[58]
Xie C, 2022, ICLR
[59]
Yang ZL, 2019, XLNet: Generalized Autoregressive Pretraining for Language Understanding
[60]
Yao HJ, 2023, Arxiv, DOI arXiv:2311.15769