Experiments with hierarchical text classification

Cited by: 0
Authors
Granitzer, M [1]
Auer, P [1]
Affiliation
[1] Know Ctr, Div Knowledge Discovery, A-8010 Graz, Austria
Source
PROCEEDINGS OF THE NINTH IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING | 2005
Keywords
machine learning; supervised learning; hierarchical text classification; boosting; ranking performance
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper applies Boosting to hierarchical text classification, where the hierarchical structure is given as a directed acyclic graph, and compares the results to Support Vector Machines. Hierarchical classification is performed top-down: at each node, a flat classifier decides whether a document should be propagated further. BoosTexter, CentroidBooster, and Support Vector Machines are used as flat classifiers, where CentroidBooster is an AdaBoost.MH-based alternative similar to BoosTexter. Experiments on the Reuters Corpus Volume 1 and the OHSUMED data set show that the F1-measure increases if the hierarchical structure of a data set is taken into account. Regarding time complexity, we show that, depending on the structure of a hierarchy, learning and classification time can be reduced. Besides these hard classification approaches, we also investigate the ranking performance of hierarchical classifiers. Ranking, which can be achieved by providing a meaningful score for each classification decision, is important in most practical settings. We investigate an approach that uses a sigmoid function to calculate a meaningful score, where parameter estimation is based on error bounds from computational learning theory.
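The top-down scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`classify_top_down`, `sigmoid_score`, `keyword_clf`), the toy keyword classifiers, the example hierarchy, and the 0.5 threshold are all assumptions made for illustration. In particular, the sigmoid parameters `a` and `b` are placeholder defaults here, whereas the paper estimates them from error bounds.

```python
import math
from typing import Callable, Dict, List

# Assumption: a flat classifier maps a document to a real-valued margin,
# where a larger margin means stronger evidence for the node's category.
FlatClassifier = Callable[[str], float]


def sigmoid_score(margin: float, a: float = 1.0, b: float = 0.0) -> float:
    """Map a raw margin to a score in (0, 1).

    a and b are placeholder values, not the paper's estimated parameters.
    """
    return 1.0 / (1.0 + math.exp(-(a * margin + b)))


def classify_top_down(
    doc: str,
    roots: List[str],
    children: Dict[str, List[str]],
    classifiers: Dict[str, FlatClassifier],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Propagate a document down a DAG of categories.

    A node's children are only examined if the node's own flat
    classifier accepts the document (score >= threshold).
    """
    accepted: Dict[str, float] = {}
    visited = set()
    frontier = list(roots)
    while frontier:
        node = frontier.pop()
        if node in visited:
            # In a DAG a node can be reached through several parents.
            continue
        visited.add(node)
        score = sigmoid_score(classifiers[node](doc))
        if score >= threshold:
            accepted[node] = score
            frontier.extend(children.get(node, []))
    return accepted


def keyword_clf(word: str) -> FlatClassifier:
    """Toy stand-in for a trained flat classifier (hypothetical)."""
    return lambda doc: 1.0 if word in doc.lower() else -1.0


# Hypothetical hierarchy: "equities" has two parents, so this is a DAG.
children = {
    "economy": ["markets", "trade"],
    "markets": ["equities"],
    "trade": ["equities"],
}
classifiers = {
    "economy": keyword_clf("econom"),
    "markets": keyword_clf("market"),
    "trade": keyword_clf("tariff"),
    "equities": keyword_clf("stock"),
}

result = classify_top_down(
    "Stock market rally lifts economy", ["economy"], children, classifiers
)
print(sorted(result))
```

Because a rejected node stops propagation, its whole subgraph is skipped, which is one way the hierarchy can reduce classification time. The scores kept in `result` could also be sorted to produce a ranking of categories.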
Pages: 177-182
Page count: 6