51 entries in total
[1]
Agrawal Aishwarya, 2016, ARXIV
[2]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[3]
Bojanowski P., 2017, Trans. Assoc. Comput. Linguistics, V5, P135, DOI 10.1162/tacl_a_00051
[4]
Bolt R. A., 1980, Computer Graphics, V14, P262, DOI 10.1145/965105.807503
[5]
Bougares Fethi, 2018, P 3 C MACH TRANSL SH, P304
[6]
Butterworth G, 2003, POINTING: WHERE LANGUAGE, CULTURE, AND COGNITION MEET, P9
[7]
Calli B, 2015, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS (ICAR), P510, DOI 10.1109/ICAR.2015.7251504
[8]
Chen Y., 2021, IEEE INT C COMPUTER
[9]
Chen Yen-Chun, 2020, ECCV