Enhanced Distributed Document Clustering Algorithm Using Different Similarity Measures

Citations: 0
Authors
Narayanan, Neethi [1]
Judith, J. E. [1]
Jayakumari, J. [1]
Affiliation
[1] Noorul Islam Ctr Higher Educ Kumaracoil, Kumaracoil, Tamil Nadu, India
Source
2013 IEEE CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT 2013) | 2013
Keywords
Distributed document clustering; Similarity measures; Cosine similarity; Jaccard coefficient; Pearson coefficient
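DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract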
Many distributed environments, such as the Internet, intranets, local area networks, and wireless networks, contain multiple distributed data sources. To analyze and monitor these sources, specialized data mining technologies for distributed applications are required. A variety of distributed document clustering algorithms exist for this purpose. This paper presents an Enhanced Distributed Algorithm (EDA) for document clustering and analyzes its performance using different similarity measures: cosine similarity, the Jaccard coefficient, and the Pearson coefficient. The tests were performed on standard document corpora such as 20NG (News Group), Reuters, Web. The performance of the proposed EDA algorithm is also evaluated using different performance factors to determine its accuracy and clustering quality.
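The abstract names three similarity measures without defining them. As a rough illustration only, the sketch below computes each one on two term-frequency vectors of equal length. It is not the paper's implementation: the function names, the use of the extended (Tanimoto) form of the Jaccard coefficient for weighted vectors, and the convention of returning 0.0 for degenerate inputs are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Extended (Tanimoto) Jaccard coefficient for weighted vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # assumed convention when both vectors are all zeros
    return dot / denom


def pearson_coefficient(a, b):
    """Pearson correlation between the components of two term vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumed convention for a constant vector
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a shared four-term vocabulary.
    doc1 = [2, 1, 0, 1]
    doc2 = [1, 1, 1, 0]
    print("cosine :", cosine_similarity(doc1, doc2))
    print("jaccard:", jaccard_coefficient(doc1, doc2))
    print("pearson:", pearson_coefficient(doc1, doc2))
```

Cosine and Jaccard range over [0, 1] for non-negative term weights, while Pearson ranges over [-1, 1], so scores from different measures are not directly comparable without rescaling.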
Pages: 545-550
Number of pages: 6