Quick Induction of NNTrees for Text Categorization Based on Discriminative Multiple Centroid Approach

Cited by: 0
Authors
Hayashi, Hirotomo [1]
Zhao, Qiangfu [1]
Affiliation
[1] Univ Aizu, Dept Comp & Informat Syst, Aizu Wakamatsu, Fukushima, Japan
Source
2010 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010) | 2010
Keywords
Pattern recognition; decision tree; neural network; dimensionality reduction; text categorization
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The neural network tree (NNTree) is a hybrid model for machine learning. We have previously proposed an efficient algorithm for inducing NNTrees and verified through experiments that NNTrees are efficient and effective for solving various pattern recognition problems. However, for problems such as text categorization, inducing NNTrees can be computationally very expensive. To address this, we have tried inducing NNTrees after dimensionality reduction. Specifically, we have studied approaches based on linear discriminant analysis (LDA), principal component analysis (PCA), and the direct centroid (DC). Results show that DC is simple but not effective, while LDA performs better but finding its transformation matrix is computationally very costly. To solve the problem more efficiently, this paper proposes the discriminant multiple centroid (DMC) approach. DMC is a two-stage approach: all data are first mapped to a lower-dimensional space based on the centroids, and LDA is then conducted in the mapped space. Experimental results on three public text datasets show that in all cases DMC is much faster than LDA without significant performance degradation.
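The two-stage idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the multiple centroids are obtained or how data are mapped onto them, so the chunk-averaging in `class_centroids`, the dot-product mapping, the regularization term `reg`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def class_centroids(X, y, per_class=2):
    """Return a (num_centroids, n_features) centroid matrix.

    Assumption: each class is split into `per_class` contiguous chunks
    and every chunk is averaged. The paper may choose centroids differently.
    """
    centroids = []
    for label in np.unique(y):
        members = X[y == label]
        for chunk in np.array_split(members, per_class):
            if len(chunk) > 0:
                centroids.append(chunk.mean(axis=0))
    return np.vstack(centroids)


def lda_projection(Z, y, reg=1e-6):
    """Classical LDA: leading eigenvectors of Sw^{-1} Sb (at most C-1 of them)."""
    labels = np.unique(y)
    dim = Z.shape[1]
    mean_all = Z.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for label in labels:
        Zc = Z[y == label]
        mean_c = Zc.mean(axis=0)
        Sw += (Zc - mean_c).T @ (Zc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    # Small ridge term (an assumption) keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(dim), Sb))
    order = np.argsort(-evals.real)[: len(labels) - 1]
    return evecs[:, order].real


def dmc_fit(X, y, per_class=2):
    """Stage 1: map onto centroids. Stage 2: LDA in the mapped space."""
    C = class_centroids(X, y, per_class)
    Z = X @ C.T  # assumed mapping: inner product with each centroid
    W = lda_projection(Z, y)
    return C, W


def dmc_transform(X, C, W):
    return (X @ C.T) @ W


# Synthetic stand-in for term-frequency vectors: 3 classes, 50 features.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(90, 50)))
y = np.repeat(np.arange(3), 30)
for k in range(3):
    X[k * 30:(k + 1) * 30, k * 5:(k + 1) * 5] += 3.0  # class-specific terms

C, W = dmc_fit(X, y, per_class=2)
Z_low = dmc_transform(X, C, W)
print(C.shape, Z_low.shape)
```

In this sketch the scatter matrices are only as large as the number of centroids (6 by 6 here) rather than the number of original features (50 by 50), which is the kind of saving the abstract attributes to running LDA in the mapped space.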
Pages: 8