A Text Document Clustering Method Based on Topical Concept

Cited by: 0
Authors
Ding, Yi [1]
Fu, Xian [1]
Affiliation
[1] Hubei Normal Univ, Coll Comp Sci & Technol, Huangshi, Peoples R China
Source
ADVANCES IN ELECTRONIC COMMERCE, WEB APPLICATION AND COMMUNICATION, VOL 1 | 2012 / Vol. 148
Keywords
document clustering; clusters indexing; topical concept
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Document clustering is now widely used in text mining, information retrieval systems, and related applications. Conventional document clustering methods rely on the classical vector-space model, using keywords as features. However, these methods ignore the semantic relations among keywords and do not address the particular problems of document clustering: the high dimensionality of the data and high computational complexity. To solve these problems, this paper proposes a topical-concept-based method for Chinese document clustering called Document Features Indexing Clustering (DFIC), which can identify topics accurately and cluster documents according to these topics. In DFIC, "topic elements" are defined and extracted for indexing base clusters, and document features are investigated and exploited. Experimental results show that DFIC achieves higher precision (92.76%) than some widely used traditional clustering methods.
Pages: 547-552
Page count: 6