17 entries in total
[1]
Chen Xiaolin, 2023, ACM T INFORM SYST, V42, P1
[2]
Visual Dialog [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1080-1089
[3]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[4]
Feng JZ, 2023, PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, P7348
[5]
Jia C, 2021, PR MACH LEARN RES, V139
[7]
Stacked Cross Attention for Image-Text Matching [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208: 212-228
[8]
Knowledge-aware Multimodal Dialogue Systems [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 801-809
[9]
Lin H, 2020, LANGUAGE MODELS ARE, V33, P1877, DOI 10.48550/ARXIV.2005.14165
[10]
Microsoft COCO: Common Objects in Context [J]. COMPUTER VISION - ECCV 2014, PT V, 2014, 8693: 740-755