Semi-Supervised Learning for Named Entity Recognition Using Weakly Labeled Training Data

Citations: 0
Authors
Zafarian, Atefeh [1 ]
Rokni, Ali [1 ]
Khadivi, Shahram [1 ]
Ghiasifard, Sonia [1 ]
Affiliation
[1] Amirkabir Univ Technol, Dept Comp Engn & IT, HLT Lab, Tehran, Iran
Source
2015 INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP) | 2015
Keywords
Named entity recognition; Bilingual parallel corpora; Graph-based semi-supervised learning
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The shortage of annotated training data remains a major challenge for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER). NER requires a large amount of training data with a high degree of human supervision, yet sufficient labeled data is not available for every language. In this paper, we use an unlabeled bilingual corpus to extract useful features by transferring information from a resource-rich language to a resource-poor language and, using these features and a small amount of training data, build a supervised NER model. We then apply a graph-based semi-supervised learning method that trains a CRF-based supervised classifier on the labeled data and uses high-confidence predictions on the unlabeled data to expand the training set, improving the efficiency of the NER model with the new training set.
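The expand-the-training-set step described in the abstract can be illustrated with a generic self-training loop. This is only a minimal sketch under stated assumptions, not the authors' method: the `TinyNB` classifier is a hypothetical naive Bayes stand-in for the CRF, the graph-based component and the bilingual feature transfer are not reproduced, and the feature names, `threshold`, and `max_rounds` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Hypothetical stand-in token classifier (naive Bayes, NOT a CRF)."""

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.prior = Counter(y)              # raw label counts
        self.counts = defaultdict(Counter)   # label -> feature counts
        self.vocab = set()
        for feats, label in zip(X, y):
            for f in feats:
                self.counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict_proba(self, feats):
        scores = {}
        for label in self.labels:
            # Add-one smoothing; the +1 reserves mass for unseen features.
            total = sum(self.counts[label].values()) + len(self.vocab) + 1
            # Unnormalized prior is fine: the constant cancels below.
            s = math.log(self.prior[label])
            for f in feats:
                s += math.log((self.counts[label][f] + 1) / total)
            scores[label] = s
        top = max(scores.values())
        exp = {l: math.exp(s - top) for l, s in scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}


def fit_on(pairs):
    """Train the stand-in model on (features, label) pairs."""
    return TinyNB().fit([x for x, _ in pairs], [y for _, y in pairs])


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the training set with high-confidence predictions.

    `threshold` and `max_rounds` are illustrative values only.
    """
    train = list(labeled)
    pool = list(unlabeled)
    model = fit_on(train)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for feats in pool:
            proba = model.predict_proba(feats)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((feats, label))
            else:
                remaining.append(feats)
        if not confident:
            break  # nothing new to add, so stop early
        train.extend(confident)
        pool = remaining
        model = fit_on(train)  # retrain on the expanded set
    return model, train


# Toy data with invented feature names, for illustration only.
labeled = [
    (["cap", "prev=in"], "LOC"),
    (["cap", "prev=mr"], "PER"),
    (["lower"], "O"),
]
unlabeled = [["cap", "prev=in"], ["lower"], ["cap"]]

model, grown = self_train(labeled, unlabeled, threshold=0.5)
print(len(labeled), "->", len(grown), "training examples")
```

A low threshold admits more pseudo-labeled examples but also more label noise; a very high threshold may add nothing at all, in which case the loop stops after the first pass.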
Pages: 129-135
Page count: 7