Questions clustering using canopy-K-means and hierarchical-K-means clustering

Cited by: 5
Authors
Alian M. [1]
Al-Naymat G. [2]
Affiliations
[1] Basic Sciences Department, Faculty of Science, The Hashemite University, P.O. Box 330127, Zarqa
[2] Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman
Keywords
Canopy clustering; Hierarchical clustering; K-means clustering; Questions clustering
DOI
10.1007/s41870-022-01012-w
Abstract
In question datasets, many questions are duplicates: because natural language is flexible, the same question can be written in several different forms. Extracting the relevant questions manually is time-consuming, so computational methods are needed to group similar questions into clusters based on their semantic similarity. This remains a challenging task, because the information contained in a single question may be insufficient to cluster it reliably. In this research, canopy clustering is employed as a preliminary step for K-means clustering, and this Canopy-K-means approach is then compared with the hierarchical clustering (Hierarchical-K-means) approach. The Quora questions dataset is used in the experiments to identify question pairs that are similar. In terms of F1 score and the Rand statistic, the results show that the Hierarchical-K-means approach yields better cluster validity measures than the Canopy-K-means approach. In addition to identifying matches, the Canopy approach provides the top related questions that share the same intent, grouped in the same cluster across several canopies. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
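The abstract compares two pipelines: canopy clustering used to seed K-means, and hierarchical clustering used to seed K-means, scored with the Rand statistic and F1. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes each question has already been turned into a numeric vector (the text representation is not described here), Euclidean distance, hand-picked canopy thresholds `t1` and `t2`, average linkage for the hierarchical step, and the pair-counting definitions of Rand and F1. All function names are hypothetical.

```python
import math
from itertools import combinations


def dist(a, b):
    # Euclidean distance between two equal-length vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    # Canopy clustering with a loose threshold t1 and a tight threshold t2 (t1 > t2 > 0).
    # Returns (center_index, member_indices) pairs; canopies may overlap.
    assert t1 > t2 > 0
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        c = remaining[0]  # first candidate; the classic description picks one at random
        members = [i for i in remaining if dist(points[c], points[i]) < t1]
        canopies.append((c, members))
        # Points inside the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining if dist(points[c], points[i]) >= t2]
    return canopies


def agglomerative_centroids(points, k):
    # Average-linkage agglomerative clustering down to k clusters; returns their centroids.
    clusters = [[i] for i in range(len(points))]

    def avg_link(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: avg_link(clusters[xy[0]], clusters[xy[1]]))
        merged = clusters.pop(y)  # y > x, so index x is still valid after the pop
        clusters[x].extend(merged)
    return [[sum(col) / len(c) for col in zip(*(points[i] for i in c))] for c in clusters]


def kmeans(points, centers, max_iter=100):
    # Lloyd's K-means starting from the given centers; returns one cluster label per point.
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pair_scores(pred, truth):
    # Pair-counting Rand statistic and F1: every pair of items is a TP/FP/FN/TN decision.
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return rand, f1


# Toy usage: six "questions" already embedded as 2-D vectors, with two true intents.
points = [[0, 0], [0.1, 0], [0, 0.2], [5, 5], [5.1, 5], [5, 5.2]]
truth = [0, 0, 0, 1, 1, 1]

canopies = canopy(points, t1=3.0, t2=1.0)
canopy_labels = kmeans(points, [points[c] for c, _ in canopies])
hier_labels = kmeans(points, agglomerative_centroids(points, k=2))
print(canopy_labels, pair_scores(canopy_labels, truth))  # [0, 0, 0, 1, 1, 1] (1.0, 1.0)
print(hier_labels, pair_scores(hier_labels, truth))
```

One design note: canopy clustering fixes both the number of clusters (one per canopy) and the starting centers, whereas the hierarchical variant needs `k` up front; on real question vectors the thresholds and `k` would have to be tuned.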
Pages: 3793-3802
Number of pages: 9