共 46 条
[1]
Cheng JY, 2021, 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), P2447
[2]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[3]
SlowFast Networks for Video Recognition
[J].
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019),
2019,
:6201-6210
[4]
Frid-Adar M, 2018, I S BIOMED IMAGING, P289, DOI 10.1109/ISBI.2018.8363576
[5]
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis
[J].
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA,
2020,
:1122-1131
[6]
Deep Residual Learning for Image Recognition
[J].
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2016,
:770-778
[7]
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
[J].
COMPUTER VISION - ECCV 2020, PT XXX,
2020, 12375
:121-137
[9]
Liu YH, 2019, Arxiv, DOI arXiv:1907.11692
[10]
Liu Z, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P2247