58 entries in total
[31] TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog [J]. INTERSPEECH 2020, 2020: 3501-3505
[32] Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [J]. COMPUTER VISION - ECCV 2020, PT XXX, 2020, 12375: 121-137
[33] Lu JS, 2019, ADV NEUR IN, V32
[34] Lu JS, 2016, ADV NEUR IN, V29
[35] Luo Huaishao, 2020, ARXIV200206353
[36] A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 7359-7368
[37] BLEU: a method for automatic evaluation of machine translation [J]. 40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2002: 311-318
[38] Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2641-2649
[39] Raffel C., 2019, ABS191010683 ARXIV
[40] Design a robust controller for active queue management in large delay networks [J]. ISCC2004: NINTH INTERNATIONAL SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2004: 748-754