Joint embedding VQA model based on dynamic word vector

Cited by: 133
Authors
Ma, Zhiyang [1]
Zheng, Wenfeng [1]
Chen, Xiaobing [1]
Yin, Lirong [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Automat, Chengdu, Peoples R China
[2] Louisiana State Univ, Dept Geog & Anthropol, Baton Rouge, LA 70803 USA
Keywords
Faster R-CNN; ELMo; MA; VQA;
DOI
10.7717/peerj-cs.353
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing joint embedding Visual Question Answering (VQA) models differ in how they combine image characterization, text characterization, and feature fusion, but all of them use static word vectors for text characterization. In real language use, however, the same word can carry different meanings in different contexts and can serve as different grammatical components. Static word vectors cannot express these differences effectively, which can introduce semantic and grammatical deviations. To address this problem, this article constructs a joint embedding model based on dynamic word vectors, the none KB-Specific network (N-KBSN) model, in contrast to the commonly used VQA models based on static word vectors. The N-KBSN model consists of three main parts: a question text and image feature extraction module, a self-attention and guided-attention module, and a feature fusion and classifier module. Its key components are image characterization based on Faster R-CNN, text characterization based on ELMo, and feature enhancement based on a multi-head attention mechanism. The experimental results show that N-KBSN outperforms both the 2017-winner (GloVe) model and the 2019-winner (GloVe) model, and that introducing dynamic word vectors improves the overall accuracy.
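The self-attention and guided-attention step named in the abstract can be illustrated with a minimal sketch of scaled dot-product multi-head attention. This is not the authors' implementation. The feature sizes, the random projection weights, the helper names (`matmul`, `random_matrix`, `multi_head_attention`), and the choice to let the question features guide the image features are all assumptions made for illustration. Random vectors stand in for the ELMo word features and the Faster R-CNN region features.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=1.0):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(queries, keys_values, weights, num_heads):
    """Scaled dot-product attention with num_heads heads.

    queries:     n_q  x d_model features that attend
    keys_values: n_kv x d_model features that are attended to
    weights:     four d_model x d_model projections (W_q, W_k, W_v, W_o)
    """
    w_q, w_k, w_v, w_o = weights
    q = matmul(queries, w_q)
    k = matmul(keys_values, w_k)
    v = matmul(keys_values, w_v)
    d_head = len(w_q) // num_heads
    merged = [[] for _ in queries]
    attention = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_t = [list(col) for col in zip(*k_h)]  # transpose keys
        scores = matmul(q_h, k_t)               # n_q x n_kv
        probs = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        attention.append(probs)
        # Concatenate this head's output onto each query row.
        for out_row, head_row in zip(merged, matmul(probs, v_h)):
            out_row.extend(head_row)
    return matmul(merged, w_o), attention


rng = random.Random(0)
d_model, num_heads = 8, 2  # assumed toy sizes


def init_weights():
    return [random_matrix(rng, d_model, d_model, d_model ** -0.5) for _ in range(4)]


question = random_matrix(rng, 5, d_model)  # 5 word features (placeholder for ELMo)
image = random_matrix(rng, 3, d_model)     # 3 region features (placeholder for Faster R-CNN)

# Self-attention: the question attends to itself.
question_sa, sa_probs = multi_head_attention(question, question, init_weights(), num_heads)
# Guided attention (assumed direction): image regions attend to the question.
image_ga, ga_probs = multi_head_attention(image, question_sa, init_weights(), num_heads)
```

In this sketch, self-attention and guided attention share one function and differ only in which features supply the queries and which supply the keys and values. The two outputs would then be passed to a fusion and classification stage, which is omitted here.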
Pages: 20