37 entries in total
[1]
[Anonymous], 2021, PMLR
[2]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[3]
MUTAN: Multimodal Tucker Fusion for Visual Question Answering [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2631-2639
[4]
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection [J]. PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE GRAPHS, IJCKG 2022, 2022: 20-29
[5]
Demszky Dorottya, 2018, Transforming question answering datasets into natural language inference datasets
[6]
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5079-5088
[7]
Gardères F, 2020, FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, P489
[8]
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 2061-2069
[9]
Hudson Drew A, 2019, P IEEECVF C COMPUTER, P6700
[10]
Karpukhin V, 2020, PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), P6769