Multi-view document clustering based on geometrical similarity measurement

Cited by: 29
Authors
Diallo, Bassoma [1]
Hu, Jie [1]
Li, Tianrui [1]
Khan, Ghufran Ahmad [1]
Hussein, Ahmed Saad [1, 2]
Affiliations
[1] Southwest Jiaotong Univ, Inst Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Univ Informat Technol & Commun, Baghdad 00964, Iraq
Funding
U.S. National Science Foundation
Keywords
Multi-view clustering; Ensemble clustering; Similarity measurement; Document clustering; ALGORITHM; SPARSE;
DOI
10.1007/s13042-021-01295-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Numerous works have applied multi-view clustering algorithms to document clustering. A challenging problem in document clustering is the choice of similarity metric. Existing multi-view document clustering methods mostly rely on two measures: the cosine similarity (CS) and the Euclidean distance (ED). The former does not consider the magnitude difference (MD) between two vectors. The latter cannot capture the divergence between two vectors that share the same ED. In this paper, we propose five new similarity metrics. They overcome the drawbacks of CS and ED by measuring the divergence between documents with the same ED while taking their magnitudes into account. Furthermore, we propose a multi-view document clustering scheme based on the proposed similarity metrics. First, CS, ED, the triangle's area similarity and sector's area similarity metric, and our five similarity metrics are applied to every view of a dataset to generate the corresponding similarity matrices. Next, we run clustering algorithms on these similarity matrices to evaluate single-view performance. Then, we aggregate the similarity matrices into a unified similarity matrix and apply the spectral clustering algorithm to it to generate the final clusters. The experimental results show that the proposed similarity functions measure the similarity between documents more accurately than the existing metrics, and that the proposed clustering scheme considerably outperforms state-of-the-art algorithms.
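The two limitations named in the abstract can be illustrated with a small sketch. It is not the paper's method: the five proposed metrics are not specified in this record, so only CS and ED appear. The helper names (`ed_similarity`, `similarity_matrix`, `average_matrices`), the conversion of ED to a similarity via 1 / (1 + d), and the element-wise mean used to aggregate matrices are all assumptions made for illustration. Vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; vector magnitudes cancel out."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b; blind to direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ed_similarity(a, b):
    """Assumed conversion of a distance into a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def similarity_matrix(docs, sim):
    """Pairwise similarity matrix for one view under the metric `sim`."""
    return [[sim(p, q) for q in docs] for p in docs]


def average_matrices(matrices):
    """Element-wise mean of same-sized matrices (assumed aggregation rule)."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Limitation of CS: same direction, very different magnitudes, CS is still 1.
short_doc, long_doc = (1.0, 0.0), (3.0, 0.0)
cs_same_direction = cosine_similarity(short_doc, long_doc)

# Limitation of ED: both points lie at distance 1 from q,
# yet one is collinear with q and the other is rotated by 45 degrees.
q = (1.0, 0.0)
same_dir, rotated = (2.0, 0.0), (1.0, 1.0)
ed_same_dir = euclidean_distance(q, same_dir)
ed_rotated = euclidean_distance(q, rotated)

# Toy single view: one matrix per metric, then a unified matrix.
view = [q, same_dir, rotated]
unified = average_matrices([
    similarity_matrix(view, cosine_similarity),
    similarity_matrix(view, ed_similarity),
])
```

A unified matrix of this kind could then be passed to a spectral clustering routine that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`. How the authors actually aggregate the matrices is not stated in the abstract.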
Pages: 663-675
Number of pages: 13