UAMNer: uncertainty-aware multimodal named entity recognition in social media posts

Cited by: 18
Authors
Liu, Luping [1 ]
Wang, Meiling [1 ]
Zhang, Mozhi [2 ]
Qing, Linbo [1 ]
He, Xiaohai [1 ]
Affiliations
[1] Sichuan Univ, Coll Elect & Informat Engn, Chengdu 610064, Sichuan, Peoples R China
[2] Univ Maryland, Comp Sci Dept, College Pk, MD 20742 USA
Funding
National Natural Science Foundation of China;
Keywords
Multimodal named entity recognition; Multimodal social media; Bayesian neural network; Multimodal transformer; NEURAL-NETWORKS;
DOI
10.1007/s10489-021-02546-5
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Named Entity Recognition (NER) on social media is challenging because posts are typically short and noisy. Recent work has explored ways to incorporate visual information from the accompanying image to improve NER on social media, with considerable success. However, existing methods ignore a common scenario on social media: the image sometimes does not match the posted text, so irrelevant images can introduce noise into these models. This paper puts forward UAMNer, a novel uncertainty-aware framework for multimodal NER on social media, which combines visual features with text only when the textual information is insufficient, thereby suppressing noise from irrelevant images. Specifically, we propose a two-stage label-refinement framework for multimodal NER in social media posts. Given a multimodal post, we first use a Bayesian neural network to produce candidate labels from the text alone. If the candidate labels have high uncertainty, we then use a multimodal transformer to refine them with both textual and visual features. Experiments on two public datasets, Twitter-2015 and Twitter-2017, show that the proposed method achieves better performance than state-of-the-art methods.
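The two-stage gate described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predictive_entropy` averages T stochastic (e.g. MC-dropout) label distributions per token and scores their uncertainty, and `needs_multimodal_refinement` decides whether stage two (the multimodal transformer) should run. The tag set and the entropy threshold are hypothetical placeholders, not values from the paper.

```python
import math

def predictive_entropy(dists):
    """Average T stochastic (MC-dropout) label distributions for one
    token, then return the entropy of the mean distribution.
    High entropy marks an uncertain candidate label."""
    T, K = len(dists), len(dists[0])
    mean = [sum(d[k] for d in dists) / T for k in range(K)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)

def needs_multimodal_refinement(token_dists, threshold):
    """Stage-1 gate: send the post to the multimodal refinement stage
    only if some token's predictive entropy exceeds the threshold
    (the threshold is an illustrative hyper-parameter)."""
    return any(predictive_entropy(d) > threshold for d in token_dists)

# Illustrative per-token samples over a 4-tag set, e.g. {O, B-PER, I-PER, B-LOC}:
confident = [[0.97, 0.01, 0.01, 0.01]] * 5   # 5 forward passes, all agree
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 5   # passes give a flat distribution
```

With a threshold of 1.0 nat, a post whose tokens all look like `confident` keeps its text-only labels, while one containing an `uncertain` token (entropy ln 4 ≈ 1.39) is routed to the multimodal transformer.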
Pages: 4109-4125
Page count: 17