Multimodal Named Entity Recognition with Image Attributes and Image Knowledge

Cited by: 30
Authors
Chen, Dawei [1]
Li, Zhixu [1,2]
Gu, Binbin [4]
Chen, Zhigang [3]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] IFLYTEK Res, Suzhou, Peoples R China
[3] iFLYTEK, State Key Lab Cognit Intelligence, Hefei, Peoples R China
[4] Univ Calif Irvine, Irvine, CA USA
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II | 2021 / Vol. 12682
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Named entity recognition; Multimodal learning; Social media; Knowledge graph;
DOI
10.1007/978-3-030-73197-7_12
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal named entity extraction is an emerging task that uses both textual and visual information to detect named entities and identify their entity types. Existing efforts are often flawed in two respects. First, they tend to ignore the natural prejudice of visual guidance brought by the image. Second, they do not further explore the knowledge contained in the image. In this paper, we propose a novel neural network model that introduces both image attributes and image knowledge to help improve named entity extraction. The image attributes are high-level abstract information about an image that can be labelled by a pre-trained model based on ImageNet, while the image knowledge can be obtained from a general encyclopedic knowledge graph with multimodal information, such as DBPedia or Yago. Our empirical study on a real-world data collection demonstrates the effectiveness of our approach compared with several state-of-the-art approaches.
Pages: 186-201
Number of pages: 16