44 entries in total
[1] [Anonymous], 2023, IEEE J. Sel. Areas Commun., V41, P214
[2] End-to-End Object Detection with Transformers [J]. Computer Vision - ECCV 2020, Pt I, 2020, 12346: 213-229
[3] Devlin J, 2019, arXiv:1810.04805, DOI 10.48550/arXiv.1810.04805
[4] Dosovitskiy A, 2021, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929
[5] MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [J]. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 1122-1131
[6] Masked Autoencoders Are Scalable Vision Learners [J]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 15979-15988
[8] UniT: Multimodal Multitask Learning with a Unified Transformer [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 1419-1429
[9] Huang D., 2022, arXiv
[10] Deep Learning-Based Image Semantic Coding for Semantic Communications [J]. 2021 IEEE Global Communications Conference (GLOBECOM), 2021,