Coupled Term-Term Relation Analysis for Document Clustering

Cited by: 129
Authors
Cheng, Xin [1]
Miao, Duoqian [1]
Wang, Can [2]
Cao, Longbing [2]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Univ Technol Sydney, Adv Analyt Inst, Sydney, NSW 2007, Australia
Source
2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2013
DOI
10.1109/IJCNN.2013.6706853
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional document clustering approaches are usually based on the Bag of Words model, which is limited by its assumption that terms are independent. Recent strategies capture the relation between terms through statistical analysis, but they estimate it purely from the terms' co-occurrence across documents. The implicit interactions through other link terms are overlooked, so the information discovered is incomplete. This paper proposes a coupled term-term relation model for document representation that considers both the intra-relation (i.e., co-occurrence of terms) and the inter-relation (i.e., dependency of terms via link terms) between a pair of terms. The coupled relation for each pair of terms is then used to map a document onto a new feature space that includes more semantic information. Extensive experiments verify that document clustering incorporating the proposed relation achieves a significant performance improvement over state-of-the-art techniques.
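The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (hypothetically) that the intra-relation is the cosine similarity between term columns of a document-term matrix, that the inter-relation of two terms is the strongest path through any third link term, and that the two are blended with a made-up weight `alpha`. The function names `coupled_term_relation` and `map_documents` are illustrative.

```python
import numpy as np


def coupled_term_relation(X, alpha=0.5):
    """Return a terms x terms relation matrix from a documents x terms matrix X.

    All formulas here are assumptions made for illustration; the source
    abstract does not specify them.
    """
    X = np.asarray(X, dtype=float)
    n_terms = X.shape[1]

    # Intra-relation (assumed): cosine similarity of term columns,
    # i.e. a normalized measure of direct co-occurrence.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unused terms
    Xn = X / norms
    intra = Xn.T @ Xn
    np.fill_diagonal(intra, 1.0)

    # Inter-relation (assumed): for terms i and j, look at every other
    # term k as a link term and keep the strongest weakest-link path.
    inter = np.zeros_like(intra)
    for i in range(n_terms):
        for j in range(n_terms):
            if i == j:
                continue
            links = [k for k in range(n_terms) if k != i and k != j]
            if links:
                inter[i, j] = max(min(intra[i, k], intra[k, j]) for k in links)

    # Coupled relation (assumed): convex combination of the two parts.
    coupled = alpha * intra + (1.0 - alpha) * inter
    np.fill_diagonal(coupled, 1.0)
    return coupled


def map_documents(X, coupled):
    """Project documents onto the feature space induced by the relation matrix."""
    return np.asarray(X, dtype=float) @ coupled


# Toy corpus: term 0 and term 2 never share a document, but both
# co-occur with term 1, which acts as their link term.
X_toy = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
C_toy = coupled_term_relation(X_toy)
D_toy = map_documents(X_toy, C_toy)
```

In this toy corpus, terms 0 and 2 have zero direct co-occurrence, yet they receive a positive coupled weight through term 1, so the first document gains a nonzero value for term 2 after mapping. The mapped matrix could then be passed to any standard clustering routine.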
Pages: 8