Questions clustering using canopy-K-means and hierarchical-K-means clustering

Cited by: 5

Authors:

Alian M. [1]

Al-Naymat G. [2]

Affiliations:

[1] Basic Sciences Department, Faculty of Science, The Hashemite University, P.O. Box 330127, Zarqa

[2] Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman

Source:

International Journal of Information Technology | 2022, Vol. 14, No. 7

Keywords:

Canopy clustering; Hierarchical clustering; K-means clustering; Questions clustering

DOI:

10.1007/s41870-022-01012-w

Abstract:

In question datasets, many questions are duplicates of one another, because the flexibility of natural language allows the same question to be written in different forms. Extracting related questions manually is time-consuming, so computational methods are needed to group similar questions into clusters based on their semantic similarity. The task remains challenging because the information contained in a single question may be insufficient to cluster it reliably. In this research, canopy clustering is employed as a preliminary step for K-means clustering, and the result is compared with the hierarchical clustering approach. The Quora questions dataset is used in the experiments to identify similar question pairs. In terms of F1 score and the Rand statistic, the results show that the Hierarchical-K-means approach yields better clustering validity measures than the Canopy-K-means approach. In addition to identifying matches, the canopy approach surfaces the top related questions that share the same intent, grouped in the same cluster across several canopies. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
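The idea of using canopy clustering as a preliminary step for K-means, and then scoring the result with the Rand statistic, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2D points stand in for question embeddings, Euclidean distance stands in for semantic similarity, and the thresholds `t1` and `t2` as well as all function names are hypothetical choices made only for this example.

```python
import math
from itertools import combinations


def canopy_clustering(points, t1, t2):
    """Build canopies with a loose threshold t1 and a tight threshold t2 (t1 > t2 > 0)."""
    assert t1 > t2 > 0
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every remaining point within the loose threshold joins this canopy.
        members = [p for p in remaining if math.dist(center, p) < t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return canopies


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, initial_centers, iterations=20):
    """Lloyd's K-means started from the given centers (here: canopy centers)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if cluster:
                # Mean of the cluster, computed coordinate by coordinate.
                new_centers.append(tuple(sum(xs) / len(cluster) for xs in zip(*cluster)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy data: two well-separated groups standing in for two question intents.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
truth = [0, 0, 0, 1, 1, 1]

canopies = canopy_clustering(points, t1=3.0, t2=1.5)
labels, centers = kmeans(points, [center for center, _ in canopies])
score = rand_index(truth, labels)
print(len(canopies), labels, score)
```

In this sketch the number of canopies determines K, so the thresholds take the place of choosing K by hand; with real question data the distance would be computed on text representations rather than 2D coordinates.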

Pages: 3793-3802

Page count: 9

