Knowledge base graph embedding module design for Visual question answering model

Times cited: 190

Authors:

Zheng, Wenfeng [1]

Yin, Lirong [2]

Chen, Xiaobing [1]

Ma, Zhiyang [1]

Liu, Shan [1]

Yang, Bo [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Automat, Chengdu 610054, Peoples R China

[2] Louisiana State Univ, Dept Geog & Anthropol, Baton Rouge, LA 70803 USA

Source:

PATTERN RECOGNITION | 2021 / Vol. 120

Keywords:

Faster R-CNN; DBpedia Spotlight; knowledge base; VQA; IMPACT

DOI:

10.1016/j.patcog.2021.108153

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, a knowledge base graph embedding module is constructed to extend the versatility of knowledge-based VQA (Visual Question Answering) models. The module extracts core entities from images and text, maps them to knowledge base entities, extracts the sub-graphs closely related to these core entities, and converts each sub-graph into a low-dimensional vector (sub-graph embedding). To obtain good sub-graph embeddings, we first extracted two semantically rich experimental knowledge bases from DBpedia: DBV and DBA. On these two knowledge bases, we evaluated several established knowledge base embedding models on link prediction, including SE (Structured Embedding), SME (Semantic Matching Energy function), and TransE. The results show that the entities of DBV have clear correspondences with one another, which enables excellent node embeddings, and that TransE achieves good knowledge base embedding; we therefore built the knowledge base graph embedding module on TransE. We then constructed a VQA model (KBSN) based on this knowledge base graph embedding. Experimental results on the VQA2.0 and KB-VQA datasets show that the knowledge base graph embedding module improves accuracy. (c) 2021 Elsevier Ltd. All rights reserved.
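The pipeline summarized in the abstract (core entities, a closely related sub-graph, TransE embedding, link prediction, a single sub-graph vector) can be illustrated with a small, stdlib-only Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy triples are hypothetical stand-ins for DBpedia facts (not DBV or DBA data), the core entities are given directly instead of being detected with Faster R-CNN or DBpedia Spotlight, "closely related" is approximated as a k-hop neighborhood, TransE is trained with the L1 distance, a margin ranking loss and plain SGD, and the sub-graph vector is a simple mean of entity vectors. SE and SME are not shown, and all function names are hypothetical.

```python
import math
import random
from collections import deque

# Hypothetical toy triples standing in for DBpedia facts (not DBV/DBA data).
TRIPLES = [
    ("Dog", "subClassOf", "Mammal"),
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Dog", "hasPart", "Tail"),
    ("Cat", "hasPart", "Tail"),
    ("Frisbee", "usedBy", "Dog"),
    ("Frisbee", "subClassOf", "Toy"),
]


def extract_subgraph(triples, core_entities, hops=1):
    """Keep triples whose endpoints lie within `hops` undirected steps of a core entity."""
    adjacency = {}
    for h, _, t in triples:
        adjacency.setdefault(h, set()).add(t)
        adjacency.setdefault(t, set()).add(h)
    dist = {e: 0 for e in core_entities if e in adjacency}
    queue = deque(dist)
    while queue:  # breadth-first search from all core entities at once
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return [tr for tr in triples if tr[0] in dist and tr[2] in dist]


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def transe_score(ent, rel, h, r, t):
    """TransE dissimilarity ||h + r - t||_1; lower means more plausible."""
    return sum(abs(a + b - c) for a, b, c in zip(ent[h], rel[r], ent[t]))


def train_transe(triples, dim=16, margin=1.0, lr=0.01, epochs=300, seed=0):
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    bound = 6 / math.sqrt(dim)

    def init():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    ent = {e: init() for e in entities}
    rel = {r: normalize(init()) for r in relations}
    for _ in range(epochs):
        for e in entities:
            ent[e] = normalize(ent[e])  # entity vectors are renormalized every epoch
        for h, r, t in triples:
            # Negative sample: replace the head or the tail with a random entity.
            if rng.random() < 0.5:
                h2, t2 = rng.choice(entities), t
            else:
                h2, t2 = h, rng.choice(entities)
            if margin + transe_score(ent, rel, h, r, t) - transe_score(ent, rel, h2, r, t2) <= 0:
                continue  # margin ranking (hinge) loss is already zero
            # Subgradient of the L1 distance is sign(h + r - t) per dimension.
            g_pos = [math.copysign(1.0, a + b - c) for a, b, c in zip(ent[h], rel[r], ent[t])]
            g_neg = [math.copysign(1.0, a + b - c) for a, b, c in zip(ent[h2], rel[r], ent[t2])]
            for i in range(dim):
                ent[h][i] -= lr * g_pos[i]
                ent[t][i] += lr * g_pos[i]
                rel[r][i] -= lr * (g_pos[i] - g_neg[i])
                ent[h2][i] += lr * g_neg[i]
                ent[t2][i] -= lr * g_neg[i]
    return ent, rel


def tail_rank(ent, rel, h, r, t):
    """Raw link-prediction rank of the true tail among all entities (1 = best)."""
    true_score = transe_score(ent, rel, h, r, t)
    return 1 + sum(1 for e in ent if transe_score(ent, rel, h, r, e) < true_score)


def embed_subgraph(ent, subgraph):
    """Mean-pool the sub-graph's entity vectors into one fixed-length vector."""
    nodes = sorted({e for h, _, t in subgraph for e in (h, t)})
    dim = len(ent[nodes[0]])
    return [sum(ent[n][i] for n in nodes) / len(nodes) for i in range(dim)]


if __name__ == "__main__":
    sub = extract_subgraph(TRIPLES, ["Dog", "Frisbee"], hops=1)
    ent, rel = train_transe(sub)
    print(len(sub), "triples; rank of Mammal:", tail_rank(ent, rel, "Dog", "subClassOf", "Mammal"))
    print("sub-graph vector length:", len(embed_subgraph(ent, sub)))
```

Mean pooling is only one simple way to turn a variable-size sub-graph into a fixed-length vector; the abstract does not say how KBSN aggregates entity embeddings.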

References

Pages: 10

50 entries in total

[1] [Anonymous], 2015, ARXIV PREPRINT ARXIV

[2] DBpedia: A nucleus for a web of open data [J].

Auer, Soeren; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard; Ives, Zachary.

SEMANTIC WEB, PROCEEDINGS, 2007, 4825: 722-+

[3] DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression [J].

Bai, Zongwen; Li, Ying; Wozniak, Marcin; Zhou, Meili; Li, Di.

PATTERN RECOGNITION, 2021, 110

[4] Bollacker KD, 2008, P 2008 ACM SIGMOD IN

[5] Bordes A, 2011, P 25 AAAI C ART INT, P301, DOI 10.1609/AAAI.V25I1.7917

[6] Bordes A., 2013, NEURAL INFORM PROCES

[7] A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation [J].

Bordes, Antoine; Glorot, Xavier; Weston, Jason; Bengio, Yoshua.

MACHINE LEARNING, 2014, 94 (02): 233-259

[8] Chen K., 2015, ABC-CNN: An attention based convolutional neural network for visual question answering, DOI 10.1155/2015/956757

[9] Temporal evolution characteristics of PM2.5 concentration based on continuous wavelet transform [J].

Chen, Xiaobing; Yin, Lirong; Fan, Yulin; Song, Lihong; Ji, Tingting; Liu, Yan; Tian, Jiawei; Zheng, Wenfeng.

SCIENCE OF THE TOTAL ENVIRONMENT, 2020, 699

[10] Daiber J., 2013, Proceedings of the 9th International Conference on Semantic Systems, P121, DOI 10.1145/2506182.2506198
