Using a domain expert in semi-supervised learning

Cited by: 1
Authors
Affiliations
[1] School of Computer Science and Engineering, The University of New South Wales, Sydney
Source
Finlayson, Angela (angf@cse.unsw.edu.au) | Springer Verlag, vol. 8863 (2014)
Keywords
Supervised learning
DOI
10.1007/978-3-319-13332-4_9
Abstract
Semi-supervised learning requires some labeled data, which it uses together with a large amount of unlabeled data to learn a model for a domain. Because the labeled data should be representative of the range of unlabeled data available, the aim of this research is to identify which data should be labeled. In the approach developed, a domain expert starts labeling unlabeled data and also writes rules to classify such data. The labeled data also serve as training data for machine learning. If the expert's rules and the rules developed by machine learning agree on a label for an unseen datum, the label is accepted and the case is automatically added to the training data. Otherwise the expert checks the case and, if the label from the rules is wrong, provides the correct label and a rule that classifies the case correctly. Further data are then processed in the same way. Results on a number of datasets, with a simulated expert standing in for the domain expert, suggest that this method produces more accurate knowledge bases than other semi-supervised methods using similar amounts of labeled data, and that the resulting knowledge bases are as accurate as those obtained with all the data labeled. © Springer International Publishing Switzerland 2014.
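The labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-nearest-neighbour learner, the threshold `oracle` that plays the simulated expert, and the `make_rule` helper are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
from typing import Callable, List, Optional, Tuple

Case = Tuple[float, ...]
Rule = Tuple[Callable[[Case], bool], str]  # (condition, label)


def rules_predict(rules: List[Rule], x: Case) -> Optional[str]:
    """Return the label of the most recently added rule that fires, if any."""
    for condition, label in reversed(rules):
        if condition(x):
            return label
    return None


def learner_predict(training: List[Tuple[Case, str]], x: Case) -> Optional[str]:
    """Stand-in learner (1-nearest neighbour); an assumption, not the paper's learner."""
    if not training:
        return None
    nearest = min(
        training,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    return nearest[1]


def process_stream(cases, oracle, make_rule):
    """Process cases one by one, consulting the expert only on disagreement."""
    rules: List[Rule] = []
    training: List[Tuple[Case, str]] = []
    expert_checks = 0
    for x in cases:
        rule_label = rules_predict(rules, x)
        learned_label = learner_predict(training, x)
        if rule_label is not None and rule_label == learned_label:
            # Expert rules and learner agree: accept the label automatically.
            training.append((x, rule_label))
            continue
        # Otherwise the (simulated) expert checks the case.
        expert_checks += 1
        true_label = oracle(x)
        if rule_label != true_label:
            # The rule label is wrong or missing: the expert adds a rule.
            rules.append(make_rule(x, true_label))
        training.append((x, true_label))
    return rules, training, expert_checks


# Hypothetical simulated expert: a fixed threshold on the first feature.
def oracle(x: Case) -> str:
    return "pos" if x[0] >= 5 else "neg"


# Hypothetical rule builder: the rule fires for cases close to the corrected one.
def make_rule(x: Case, label: str) -> Rule:
    return (lambda c, centre=x: abs(c[0] - centre[0]) <= 1.5, label)


cases = [(1.0,), (9.0,), (2.0,), (8.0,), (1.5,), (8.5,)]
rules, training, expert_checks = process_stream(cases, oracle, make_rule)
print(expert_checks, len(training), len(rules))
```

In this toy run the expert is consulted only for the first two cases; the remaining four are labeled by agreement between the rules and the learner, which is the saving in labeling effort the abstract refers to.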
Pages: 99-111
Page count: 12