Document Clustering Using Hybrid XOR Similarity Function for Efficient Software Component Reuse

Times cited: 8
Authors
Radhakrishna, Vangipuram [1]
Srinivas, C. [2]
Rao, C. V. Guru [3]
Affiliations
[1] VNR Vignana Jyothi Institute of Engineering and Technology, Department of IT, Hyderabad, Andhra Pradesh, India
[2] Kakatiya Institute of Technology, Warangal, Andhra Pradesh, India
[3] SR Engineering College, Warangal, Andhra Pradesh, India
Source
First International Conference on Information Technology and Quantitative Management | 2013 / Vol. 17
Keywords
hybrid XOR; clustering; frequent itemsets; cluster; discovery
DOI
10.1016/j.procs.2013.05.017
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, a generalized approach is proposed for clustering a given set of documents, text files, or software components for reuse. It is based on a new similarity function, called the hybrid XOR function, defined to measure the degree of similarity between two document sets or two software components. We construct a matrix, called the similarity matrix, of order (n-1) by n for n document sets or software components by applying the hybrid XOR function to each pair of document sets. We define and design a clustering algorithm that takes the similarity matrix as input and outputs a set of clusters formed dynamically, in contrast to other clustering algorithms that predefine the number of clusters and finally fit each document into one of those clusters or classes. The approach uses only simple computations. (C) 2013 The Authors. Published by Elsevier B.V. Selection and peer-review under responsibility of the organizers of the 2013 International Conference on Information Technology and Quantitative Management
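The abstract does not define the hybrid XOR function or the exact clustering rule, so the following Python sketch only illustrates the overall pipeline under stated assumptions. It substitutes a plain symmetric-difference (XOR) similarity, 1 - |A xor B| / |A union B| (mathematically equivalent to Jaccard similarity), for the paper's hybrid XOR function; it stores each pair once in an (n-1) by n matrix, which is one possible reading of the matrix order given above; and it groups documents whose similarity meets a hypothetical `threshold` using union-find, so the number of clusters is not fixed in advance. The names `xor_similarity`, `similarity_matrix`, `cluster`, and `threshold` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Set


def xor_similarity(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: 1 - |A xor B| / |A union B| (equivalent to Jaccard).

    ASSUMPTION: this is NOT the paper's hybrid XOR function, whose exact
    definition is not given in the abstract.
    """
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return 1.0 - len(a ^ b) / len(union)


def similarity_matrix(docs: List[Set[str]]) -> List[List[float]]:
    """Build an (n-1) x n matrix; row i, column j > i holds sim(doc_i, doc_j).

    Entries with j <= i stay 0.0 because each pair is stored only once.
    ASSUMPTION: this layout is one reading of "order n-1 by n".
    """
    n = len(docs)
    m = [[0.0] * n for _ in range(n - 1)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            m[i][j] = xor_similarity(docs[i], docs[j])
    return m


def cluster(matrix: List[List[float]], n: int, threshold: float) -> List[List[int]]:
    """Merge documents whose pairwise similarity >= threshold (hypothetical rule).

    Union-find lets clusters emerge from the data instead of a preset count.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, row in enumerate(matrix):
        for j in range(i + 1, n):
            if row[j] >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for doc in range(n):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy "components" represented as sets of terms (made-up example data).
    docs = [
        {"parse", "xml", "tree"},
        {"parse", "xml", "dom"},
        {"sort", "merge", "array"},
        {"sort", "array", "heap"},
    ]
    print(cluster(similarity_matrix(docs), len(docs), 0.5))  # [[0, 1], [2, 3]]
```

Raising the threshold (for example to 0.6 on the toy data) leaves every document in its own cluster, which shows how the cluster count follows from the similarity values rather than being chosen beforehand.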
Pages: 121-128
Number of pages: 8