Hierarchical Document Classification Using Automatically Generated Hierarchy

Cited by: 0
Authors
Li, Tao [1]
Zhu, Shenghuo [1]
Affiliations
[1] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
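Source
PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2005
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract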
Interest in automated text categorization has grown rapidly with the exponential growth of information and the ever-increasing need for organization. An underlying hierarchical structure captures the dependence relationships among categories and is a valuable source of information for categorization. Although considerable research has addressed hierarchical document categorization, little has been done on the automatic generation of topic hierarchies. In this paper, we propose using linear discriminant projection to generate more meaningful intermediate levels of hierarchy in large flat sets of classes. The linear discriminant projection approach first projects all documents onto a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect of using the generated hierarchical structure for text classification. Our experiments show that the generated hierarchies improve classification performance in most cases. A preliminary short version of the paper appeared in [8].
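The second stage described in the abstract, grouping categories into a hierarchy once documents sit in a low-dimensional space, could be sketched roughly as follows. This is only an illustration under stated assumptions: the projection is taken as already done upstream, the greedy centroid-based merging rule is one plausible choice rather than the procedure used in the paper, and the function names, category names, and coordinates are all invented for the example.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def build_hierarchy(projected_docs):
    """Greedy agglomerative clustering of categories (hypothetical helper).

    projected_docs maps a category name to the coordinates of its documents
    in the low-dimensional space (the projection is assumed to be done
    elsewhere). Returns a nested tuple: leaves are category names and each
    inner node is a pair of subtrees.
    """
    # Each cluster is (subtree, list of member points).
    clusters = [(name, list(pts)) for name, pts in projected_docs.items()]
    while len(clusters) > 1:
        # Pick the two clusters whose centroids are closest (assumed rule).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(
                centroid(clusters[p[0]][1]), centroid(clusters[p[1]][1])
            ),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Invented toy data: four flat categories, two documents each, in 2-D.
toy = {
    "hockey": [(0.0, 0.1), (0.2, 0.0)],
    "baseball": [(0.3, 0.2), (0.1, 0.3)],
    "crypto": [(5.0, 5.1), (5.2, 4.9)],
    "graphics": [(4.8, 5.3), (5.1, 5.2)],
}
tree = build_hierarchy(toy)
print(tree)  # (('hockey', 'baseball'), ('crypto', 'graphics'))
```

In this toy run the two sports categories and the two computing categories each form an intermediate node before being joined at the root, which is the kind of intermediate level the abstract refers to. A hierarchical classifier could then be trained one node at a time along such a tree.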
Pages: 521-525
Page count: 5
Related papers
11 entries in total
[1] [Anonymous], 1997, ICML
[2] D'Alessio S., 2000, RIAO 00
[3] Deerwester S., 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[5] Fukunaga K., 1990, INTRO STAT PATTERN R
[6] Howland P., Park H. Generalizing discriminant analysis using the generalized singular value decomposition [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (08): 995-1006
[7] Jain K, 1988, Algorithms for clustering data
[8] Li T., 2003, P 12 INT C INFORM KN, P317
[9] Li Tao, 2003, ACM SIGIR, P421
[10] Sun A.X., Lim E.P. Hierarchical text classification and evaluation [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001: 521-528