54 entries in total
[23] Lei J, 2020, arXiv:2005.05402
[26] SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 17928-17937
[27] Video Swin Transformer [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 3192-3201
[28] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9992-10002
[30] Mingxing Wang, 2020, 2020 IEEE 3rd International Conference on Computer and Communication Engineering Technology (CCET), p. 10, DOI 10.1109/CCET50901.2020.9213129