Semi-supervised deep learning based named entity recognition model to parse education section of resumes

Times cited: 18
Authors
Gaur, Bodhvi [1 ,2 ]
Saluja, Gurpreet Singh [1 ]
Sivakumar, Hamsa Bharathi [1 ]
Singh, Sanjay [1 ]
Affiliations
[1] Manipal Inst Technol, Dept Informat & Commun Technol, MAHE, Manipal 576104, India
[2] Johns Hopkins Univ, Dept Comp Sci, 3400 North Charles St, Baltimore, MD 21218 USA
Keywords
Named entity recognition (NER); Semi-supervised learning; Deep learning models; Natural language processing; Resume information extraction;
DOI
10.1007/s00521-020-05351-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A job seeker's resume contains several sections, including educational qualifications, which capture the knowledge and skills relevant to the job. Machine processing of resume education sections remains a difficult task. In this paper, we attempt to identify the names of educational institutions and degrees in a resume's education section. Neural network-based named entity recognition techniques usually require a significant amount of annotated data, so we use a semi-supervised approach to overcome the lack of a large annotated dataset. We train a deep neural network model on an initial (seed) set of resume education sections. This model predicts entities in unlabeled education sections, and its predictions are rectified using a correction module. The education sections containing the rectified entities are then added to the seed set, and the updated seed set is used for retraining, leading to better accuracy than the previously trained model. In this way, the approach achieves high overall accuracy without the need for a large annotated dataset. Our model achieves an accuracy of 92.06% on the named entity recognition task.
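The train, predict, correct, augment, and retrain cycle described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the gazetteer, the memorising `train` function (a stand-in for the deep neural network), and the fuzzy-matching `correct` function (a stand-in for the correction module).

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical toy data: a labeled example is a list of (token, label) pairs.
SEED = [[("BSc", "DEGREE"), ("from", "O"), ("Manipal", "INST")]]
UNLABELED = [["MSc", "from", "Manipel"]]
# Hypothetical gazetteer used by the stand-in correction module.
GAZETTEER = {"Manipal": "INST", "BSc": "DEGREE", "MSc": "DEGREE"}


def train(examples):
    """Stand-in for the deep NER model: keep the most common label per token."""
    counts = defaultdict(Counter)
    for sentence in examples:
        for token, label in sentence:
            counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def predict(model, tokens):
    """Tag each token; tokens never seen in training default to 'O'."""
    return [(token, model.get(token, "O")) for token in tokens]


def correct(tagged, threshold=0.8):
    """Stand-in correction module: relabel tokens that fuzzily match the gazetteer."""
    fixed = []
    for token, label in tagged:
        for entry, entry_label in GAZETTEER.items():
            ratio = SequenceMatcher(None, token.lower(), entry.lower()).ratio()
            if ratio >= threshold:
                label = entry_label
                break
        fixed.append((token, label))
    return fixed


def self_train(seed, unlabeled_batches):
    """Train on the seed set, then grow it with corrected predictions per batch."""
    labeled = list(seed)
    model = train(labeled)
    for batch in unlabeled_batches:
        predicted = [predict(model, tokens) for tokens in batch]
        labeled.extend(correct(tagged) for tagged in predicted)
        model = train(labeled)  # retrain on the augmented seed set
    return model


model = self_train(SEED, [UNLABELED])
print(predict(model, ["MSc", "from", "Manipel"]))
```

In this sketch each batch of unlabeled sections is used once, so the seed set grows by one batch of corrected examples per retraining round. How the paper batches its data and what its correction module actually does are not stated in the abstract.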
Pages: 5705-5718
Number of pages: 14