34 references in total
[1]
Akbari H, 2021, ADV NEUR IN
[3]
End-to-End Object Detection with Transformers [C]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229.
[4]
Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features [C]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016: 494-500.
[5]
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning [C]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3884-3892.
[6]
Dai WL, 2020, Arxiv, DOI arXiv:2009.09629
[7]
Dai WL, 2021, 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), P5305
[8]
Dosovitskiy A, 2021, Arxiv, DOI arXiv:2010.11929
[9]
Gong Y, 2022, AAAI CONF ARTIF INTE, P10699
[10]
Han W, 2021, 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), P9180