Named entity recognition: a semi-supervised learning approach

Cited by: 5
Authors
Sintayehu H. [1]
Lehal G.S. [1]
Affiliation
[1] Department of Computer Science, Punjabi University, Patiala
Keywords
Expectation–maximization; Feature set; Label propagation; Machine learning; Named entity recognition; Semi-supervised learning
DOI
10.1007/s41870-020-00470-4
Abstract
Developing named entity recognition (NER) applications for languages that are under-resourced in NLP is usually obstructed by the lack of named-entity-tagged datasets, which degrades performance. Amharic is such a case: annotated training data for NER is costly to obtain, although an enormous amount of untagged data is easily accessible. Fortunately, NER performance can potentially be boosted by combining a small amount of labeled data with a large collection of unlabeled data. On this premise, this paper investigates a graph-based label propagation algorithm, a simple iterative semi-supervised algorithm that propagates labels through the dataset, for the Amharic NER problem. It also aims at a rigorous comparison with a semi-supervised expectation–maximization approach. The experiments show that label propagation based NER outperforms expectation–maximization when only a few labeled training examples are available. Because the expectation–maximization algorithm needs a moderate number of labeled examples to learn from, very few labeled examples are not enough to estimate adequate parameters for recognizing named entities, and consequently it does not perform as well as the label propagation algorithm. © 2020, Bharati Vidyapeeth's Institute of Computer Applications and Management.
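To make the idea of propagating labels through a dataset concrete, the following is a minimal sketch of a simplified graph-based label propagation loop. It is not the authors' implementation. The function name `label_propagation`, the RBF edge weights, the `sigma` parameter, the use of `-1` to mark unlabeled points, and the two-cluster toy data are all assumptions made for illustration; the feature set the paper uses for Amharic tokens is not given in this record.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, max_iter=1000, tol=1e-6):
    """Simplified label propagation on a fully connected RBF graph.

    X: (n, d) feature matrix (hypothetical features, not the paper's).
    y: (n,) integer labels, with -1 marking unlabeled points (assumption).
    Returns one predicted label per point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    classes = np.unique(y[labeled])

    # Edge weights: nearby points get larger weights (RBF kernel).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Row-normalize so each row is a distribution over neighbors.
    T = W / W.sum(axis=1, keepdims=True)

    # Labeled rows start one-hot; unlabeled rows start uniform.
    Y = np.full((len(y), len(classes)), 1.0 / len(classes))
    Y[labeled] = (y[labeled][:, None] == classes[None, :]).astype(float)
    Y_clamp = Y[labeled].copy()

    for _ in range(max_iter):
        Y_new = T @ Y              # each point absorbs its neighbors' labels
        Y_new[labeled] = Y_clamp   # clamp the known labels every iteration
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return classes[Y.argmax(axis=1)]


# Toy data: two well-separated clusters, one labeled seed in each.
X_demo = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
          [5.0, 5.0], [5.4, 4.8], [4.7, 5.3]]
y_demo = [0, -1, -1, 1, -1, -1]
print(label_propagation(X_demo, y_demo).tolist())
```

Clamping the labeled rows on every iteration is what lets a handful of seeds dominate the result, which matches the abstract's setting of very few labeled examples alongside many unlabeled ones.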
Pages: 1659-1665
Page count: 6