Questions clustering using canopy-K-means and hierarchical-K-means clustering

Cited by: 5

Authors:

Alian M. [1]

Al-Naymat G. [2]

Affiliations:

[1] Basic Sciences Department, Faculty of Science, The Hashemite University, P.O. Box 330127, Zarqa

[2] Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman

Source:

International Journal of Information Technology | 2022, Vol. 14, No. 7

Keywords:

Canopy clustering; Hierarchical clustering; K-means clustering; Questions clustering

DOI:

10.1007/s41870-022-01012-w

Abstract:

In question datasets, many questions are duplicates of one another, because the flexibility of natural language allows the same question to be written in different forms. Extracting related questions manually is time-consuming, so computational methods are needed to group similar questions into clusters based on their semantic similarity. The task remains challenging, however, because the information contained in a single question may be insufficient to cluster it effectively. In this research, canopy clustering is employed as a preliminary step for K-means clustering, and the result is compared with the hierarchical clustering approach. The Quora questions dataset is used in the experiments to identify similar question pairs. In terms of F1 score and the Rand statistic, the results demonstrate that the Hierarchical-K-means approach yields better cluster validity measures than the Canopy-K-means approach. In addition to identifying matches, the Canopy approach provides the top related questions that share the same intent, placed in the same cluster across several canopies. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
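As a rough illustration of the pipeline the abstract describes, the sketch below uses canopy clustering to choose the number of clusters and the initial centers for K-means, then scores the result with a pair-counting Rand statistic and F1. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D toy points stand in for question embeddings, Euclidean distance replaces whatever semantic similarity the paper uses, the thresholds `t1` and `t2` are arbitrary, and the pair-counting definition of F1 is assumed because the abstract does not specify one. All function names are hypothetical.

```python
import math
import random
from itertools import combinations

def canopy_clustering(points, t1, t2, seed=0):
    """Build canopies with a loose threshold t1 and a tight threshold t2 (t1 > t2)."""
    assert t1 > t2 > 0
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        # Every remaining point within t1 joins this canopy; canopies may overlap.
        members = [center] + [p for p in remaining if math.dist(center, p) < t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return canopies

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's K-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            assigned = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(x) / len(assigned) for x in zip(*assigned)) if assigned else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def pair_scores(true_labels, pred_labels):
    """Pair-counting Rand statistic and F1 (assumed definitions)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return rand, f1

# Toy data: two well-separated groups standing in for question embeddings.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
true_labels = [0, 0, 0, 1, 1, 1]

canopies = canopy_clustering(points, t1=3.0, t2=1.5)
labels, centers = kmeans(points, [center for center, _ in canopies])
rand, f1 = pair_scores(true_labels, labels)
print(len(canopies), rand, f1)
```

The canopy pass is cheap and only needs a coarse distance, which is why it is commonly used to fix K and the starting centers before the more expensive K-means refinement. Because points between `t2` and `t1` stay eligible for later canopies, one item can belong to several canopies, which is the overlap the abstract refers to.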


Pages: 3793-3802

Number of pages: 9

50 items in total

[1] Bayesian hierarchical K-means clustering
Liu, Yue
Li, Bufang
INTELLIGENT DATA ANALYSIS, 2020, 24 (05) : 977 - 992
[2] An Empirical comparison of Clustering using Hierarchical methods and K-means
Praveen, P.
Rama, B.
PROCEEDINGS OF THE 2016 IEEE 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL & ELECTRONICS, INFORMATION, COMMUNICATION & BIO INFORMATICS (IEEE AEEICB-2016), 2016, : 445 - 449
[3] Canopy-K-means Combined Collaborative Filtering Using RMSE-minimization
Kuan, Sao-I
Kim, Jongmin
Kwon, Oh-Heum
Song, Ha-Joo
2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022), 2022, : 31 - 34
[4] Hierarchical hesitant fuzzy K-means clustering algorithm
Chen Na
Xu Ze-shui
Xia Mei-mei
APPLIED MATHEMATICS-A JOURNAL OF CHINESE UNIVERSITIES SERIES B, 2014, 29 (01) : 1 - 17
[5] Hierarchical hesitant fuzzy K-means clustering algorithm
CHEN Na
XU Ze-shui
XIA Mei-mei
Applied Mathematics:A Journal of Chinese Universities, 2014, (01) : 1 - 17
[6] Hierarchical hesitant fuzzy K-means clustering algorithm
Na Chen
Ze-shui Xu
Mei-mei Xia
Applied Mathematics-A Journal of Chinese Universities, 2014, 29 : 1 - 17
[7] Efficient Image Retrieval Using Hierarchical K-Means Clustering
Park, Dayoung
Hwang, Youngbae
SENSORS, 2024, 24 (08)
[8] Clustering of Image Data Using K-Means and Fuzzy K-Means
Rahmani, Md. Khalid Imam
Pal, Naina
Arora, Kamiya
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
[9] Clustering and Classification of Cotton Lint Using Principle Component Analysis, Agglomerative Hierarchical Clustering, and K-Means Clustering
Kamalha, Edwin
Kiberu, Jovan
Nibikora, Ildephonse
Mwasiagi, Josphat Igadwa
Omollo, Edison
JOURNAL OF NATURAL FIBERS, 2018, 15 (03) : 425 - 435
[10] Improving the Walktrap Algorithm Using K-Means Clustering
Brusco, Michael
Steinley, Douglas
Watts, Ashley L.
MULTIVARIATE BEHAVIORAL RESEARCH, 2024, 59 (02) : 266 - 288
