Multi-scale Visual Semantic Enhancement for Multimodal Named Entity Recognition Method

Cited by: 0
Authors
Wang H.-R. [1, 2]
Xu X. [1]
Wang T. [1]
Chen F.-P. [1]
Affiliations
[1] School of Computer Science and Engineering, North Minzu University, Yinchuan
[2] The Key Laboratory of Images, Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2024 / Vol. 50 / No. 06
Keywords
multi-task learning; multimodal fusion; multimodal named entity recognition (MNER); Transformer
DOI
10.16383/j.aas.c230573
Abstract
Research on multimodal named entity recognition (MNER) faces two issues: semantic loss in image features and weak semantic constraints on multimodal representations. To address them, a multi-scale visual semantic enhancement method for multimodal named entity recognition (MSVSE) is proposed. Image semantics are first supplemented by extracting multiple kinds of visual features. A multimodal feature fusion module then models the semantic interaction and feature fusion between the text features and each kind of visual feature, and outputs multimodal text representations enhanced with multi-scale visual semantics. A visual entity classifier decodes the multi-scale visual semantic features to learn the semantic consistency among the different visual features. A multi-task decoder mines the fine-grained semantic representations in both the multimodal text representation and the text features, and decodes them jointly to mitigate semantic bias, thereby further improving the accuracy of named entity recognition. To verify the effectiveness of the method, experiments were carried out on Twitter-2015 and Twitter-2017, and MSVSE was compared with 10 other methods. The average F1 scores of MSVSE on the two datasets improved. © 2024 Science Press. All rights reserved.
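The fusion step described in the abstract can be pictured with a minimal sketch. This is not the MSVSE fusion module itself: it assumes a generic single-head scaled dot-product cross-attention with no learned projections, averages the attended context over the visual feature sets, and adds it back to the text features. All function names, the averaging rule, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text, visual):
    """For each text token vector, return an attention-weighted sum of visual vectors."""
    dim = len(text[0])
    attended = []
    for token in text:
        # Scaled dot-product score between this token and every visual vector.
        scores = [sum(t * v for t, v in zip(token, vec)) / math.sqrt(dim)
                  for vec in visual]
        weights = softmax(scores)
        attended.append([sum(w * vec[i] for w, vec in zip(weights, visual))
                         for i in range(dim)])
    return attended

def fuse_multi_scale(text, visual_scales):
    """Add the mean visual context over all scales to each token (residual fusion).

    Averaging over scales is an assumption of this sketch, not the paper's rule.
    """
    contexts = [cross_attention(text, visual) for visual in visual_scales]
    fused = []
    for idx, token in enumerate(text):
        mean_ctx = [sum(ctx[idx][i] for ctx in contexts) / len(contexts)
                    for i in range(len(token))]
        fused.append([t + c for t, c in zip(token, mean_ctx)])
    return fused

# Toy inputs (hypothetical): two tokens, one global image vector, two region vectors.
text = [[1.0, 0.0], [0.0, 1.0]]
global_feats = [[0.5, 0.5]]
region_feats = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_multi_scale(text, [global_feats, region_feats])
```

In this toy setup each token attends more strongly to the region vector it aligns with, so its fused representation is pulled toward that region while the global vector contributes equally to every token.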
Pages: 1234-1245
Number of pages: 11