71 entries in total
[12]
Dai Wenliang, 2023, arXiv
[13]
Davis B., 2022, COMPUTER VISION ECCV, P280, DOI 10.1007/978-3-031-25069-9_19
[14]
Dosovitskiy A., 2021, arXiv
[15]
Driess Danny, 2023, PaLM-E: An embodied multimodal language model
[16]
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6325-6334
[17]
VizWiz Grand Challenge: Answering Visual Questions from Blind People [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 3608-3617
[18]
Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[19]
Hong T, 2022, AAAI CONF ARTIF INTE, P10767
[20]
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 4083-4091