Data clustering using efficient similarity measures

Cited by: 19
Authors
Bisandu, Desmond Bala [1 ]
Prasad, Rajesh [2 ]
Liman, Musa Muhammad [3 ]
Affiliations
[1] Univ Jos, Dept Comp Sci, PMB 2084, Jos 930001, Plateau State, Nigeria
[2] African Univ Sci & Technol, Dept Comp Sci, PMB 681 Garki, Abuja Fct, Nigeria
[3] Univ Putra Malaysia, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
Keywords
Similarity measure; Document clustering; Text document; Euclidean distance and Edit distance; SELECTION;
DOI
10.1080/09720510.2019.1565443
CLC number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract
The need to apply the various similarity measures for clustering appropriately has grown over the years as the volume of data keeps increasing. Deciding which similarity measure is best, and for what kind of dataset, has been a cumbersome task in data mining, data science, and related fields, as well as for organizations that depend heavily on knowledge extracted from huge datasets to make crucial decisions. Because various datasets exhibit some common features, a clearer understanding of the various similarity measures for clustering different datasets is needed. This paper presents a critical review of various similarity measures applied in text and data clustering. A theoretical comparison is made to check the suitability of the measures for different kinds of datasets.
Pages: 901-922
Page count: 22