共 25 条
[1]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[2]
Learning the Best Pooling Strategy for Visual Semantic Embedding
[J].
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021,
2021,
:15784-15793
[3]
INTRA-MODAL CONSTRAINT LOSS FOR IMAGE-TEXT RETRIEVAL
[J].
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP,
2022,
:4023-4027
[4]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[6]
Diao HW, 2021, AAAI CONF ARTIF INTE, V35, P1218
[7]
Faghri Fartash, 2018, BRIT MACH VIS C
[8]
He K, 2016, PROC CVPR IEEE, P770, DOI [10.1109/CVPR.2016.90, DOI 10.1109/CVPR.2016.90]
[9]
Ji Z, 2021, PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, P765
[10]
Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932