Classification optimization for training a large dataset with Naive Bayes

Cited by: 8
Authors
Thi Thanh Sang Nguyen [1]
Pham Minh Thu Do [1]
Affiliations
[1] Int Univ Vietnam Natl Univ, Sch Comp Sci & Engn, Ho Chi Minh City, Vietnam
Keywords
Data mining; Naive Bayes; Word embedding; Feature selection
DOI
10.1007/s10878-020-00578-0
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Book classification is widely used in digital libraries, and book rating prediction is crucial for serving readers better. Commonly used techniques include decision trees, Naive Bayes (NB), and neural networks. Moreover, mining book data depends on feature selection, data pre-processing, and data preparation. This paper proposes solutions for optimizing knowledge representation and feature selection to enhance book classification, and identifies appropriate classification algorithms. Several experiments were conducted, and NB was found to provide the best prediction results. With appropriate strategies for feature selection, data type selection, and data transformation, the accuracy and performance of NB can be improved to the point where it outperforms other classification algorithms.
Pages: 141-169
Number of pages: 29