Inductive Model Generation for Text Categorization using a Bipartite Heterogeneous Network

Cited by: 10
Authors
Rossi, Rafael Geraldeli [1]
Faleiros, Thiago de Paulo [1]
Lopes, Alneu de Andrade [1]
Rezende, Solange Oliveira [1]
Affiliations
[1] Univ Sao Paulo, Sao Carlos, SP, Brazil
Source
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) | 2012
Keywords
Heterogeneous Network; Text Categorization;
DOI
10.1109/ICDM.2012.130
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Algorithms designed to categorize numeric data are usually applied to text categorization after a preprocessing phase that assigns weights to textual terms, which are treated as attributes. However, characteristics of textual data such as sparsity and high dimensionality make some of these algorithms inefficient for text categorization and can impair the quality of general-purpose classifiers. Here, we propose a text classifier based on a bipartite heterogeneous network that represents textual document collections. The algorithm induces a classification model by assigning weights to the objects that represent the terms of the collection. The induced weights correspond to the influence of the terms on the classification of the documents in which they appear. The least-mean-square algorithm is used in the inductive process. An empirical evaluation on a large number of textual document collections shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.
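The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' IMBHN implementation: it only assumes that each document is linked to its terms by frequency-weighted edges, that every term holds one weight per class, and that the weights are adjusted with the classic Widrow-Hoff least-mean-square rule. The function names (train_lms, classify), the learning rate, the epoch count, and the toy data are all hypothetical.

```python
def train_lms(docs, labels, n_terms, n_classes, eta=0.1, epochs=50):
    """Induce per-class term weights with the least-mean-square rule.

    docs   : list of {term_index: frequency} dicts, i.e. the document-term
             edges of a bipartite network (assumed representation)
    labels : list of class indices, one per document
    """
    # One weight per (term, class) pair, initialised to zero.
    w = [[0.0] * n_classes for _ in range(n_terms)]
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            # Document output for each class: weighted sum over its terms.
            out = [sum(f * w[t][c] for t, f in doc.items())
                   for c in range(n_classes)]
            for c in range(n_classes):
                # Error against a one-hot target for the true class.
                err = (1.0 if c == label else 0.0) - out[c]
                # LMS update: move each connected term weight along the error.
                for t, f in doc.items():
                    w[t][c] += eta * f * err
    return w


def classify(doc, w, n_classes):
    """Assign the class whose summed term weights are largest."""
    scores = [sum(f * w[t][c] for t, f in doc.items())
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


# Toy collection (hypothetical): terms 0-1 occur in class 0, terms 2-3 in class 1.
docs = [{0: 1, 1: 1}, {0: 1}, {2: 1, 3: 1}, {3: 1}]
labels = [0, 0, 1, 1]
w = train_lms(docs, labels, n_terms=4, n_classes=2)

# Classify two unseen documents using only the induced term weights.
predictions = [classify({0: 2}, w, 2), classify({3: 3}, w, 2)]
```

Because the induced model is just a table of term weights, an unseen document can be classified without revisiting the training documents, which is what makes the approach inductive in the sense the abstract describes.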
Pages: 1086-1091
Page count: 6