共 52 条
[1]
[Anonymous], 2014, Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)
[2]
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
[J].
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2017,
:2631-2639
[3]
Boski M, 2017, 2017 10TH INTERNATIONAL WORKSHOP ON MULTIDIMENSIONAL (ND) SYSTEMS (NDS)
[5]
Visual Question Reasoning on General Dependency Tree
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:7249-7257
[6]
See-Through-Text Grouping for Referring Image Segmentation
[J].
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019),
2019,
:7453-7462
[7]
Language-Based Image Editing with Recurrent Attentive Models
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:8721-8729
[8]
Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[9]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[10]
Dosovitskiy A., 2021, arXiv