A confidence-based hierarchical feature clustering algorithm for text classification

Cited by: 0
Authors
Jiang, Jung-Yi [1]
Yin, Kai-Tai [1]
Lee, Shie-Jue [1]
Affiliation
[1] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung, Taiwan
Source
2007 INTERNATIONAL CONFERENCE ON INTELLIGENT PERVASIVE COMPUTING, PROCEEDINGS | 2007
DOI
10.1109/IPC.2007.35
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
In this paper we propose a novel feature reduction approach that groups words hierarchically into clusters, which can then be used as new features for document classification. Initially, each word constitutes a cluster. We calculate the mutual confidence between any two different words. The pair of clusters containing the two words with the highest mutual confidence is combined into a new cluster. This merging process is iterated until all the mutual confidences between the unprocessed pairs of words are smaller than a predefined threshold or only one cluster exists. In this way, a hierarchy of word clusters is obtained. The user can choose the clusters at a certain level of the hierarchy to be used as new features for document classification. Experimental results have shown that our method can perform better than other methods.
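The merging procedure described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not define mutual confidence, so the sketch assumes it is the smaller of the two directional confidences P(b|a) and P(a|b), estimated from document co-occurrence counts. The names `mutual_confidence` and `cluster_words`, and the representation of documents as sets of words, are hypothetical.

```python
from itertools import combinations


def mutual_confidence(docs, a, b):
    """ASSUMED definition (not from the paper): min of P(b|a) and P(a|b),
    estimated from the number of documents containing each word."""
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_a == 0 or n_b == 0:
        return 0.0
    return min(n_ab / n_a, n_ab / n_b)


def cluster_words(docs, threshold):
    """Merge word clusters by descending mutual confidence.

    docs: list of sets of words. Returns (clusters, history), where history
    lists each merge as (score, merged_cluster) and so records the hierarchy.
    """
    vocab = sorted(set().union(*docs))
    # Initially, each word constitutes its own cluster.
    cluster_of = {w: frozenset([w]) for w in vocab}
    # Score every pair of distinct words, highest mutual confidence first.
    pairs = sorted(
        ((mutual_confidence(docs, a, b), a, b) for a, b in combinations(vocab, 2)),
        key=lambda t: -t[0],
    )
    history = []
    for score, a, b in pairs:
        # Stop when remaining scores fall below the threshold
        # or only one cluster is left.
        if score < threshold or len(set(cluster_of.values())) == 1:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # the two words already share a cluster
        merged = ca | cb
        for w in merged:
            cluster_of[w] = merged
        history.append((score, merged))
    return set(cluster_of.values()), history


docs = [
    {"ball", "game", "team"},
    {"ball", "game"},
    {"stock", "market"},
    {"stock", "market", "game"},
]
clusters, history = cluster_words(docs, threshold=0.5)
```

Under these assumptions, each entry in `history` corresponds to one level of the hierarchy, so truncating the merge sequence at a chosen point gives the clusters "from a certain level" that would serve as the new features.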
Pages: 161-164
Page count: 4