Questions clustering using canopy-K-means and hierarchical-K-means clustering

Cited by: 5
Authors
Alian M. [1]
Al-Naymat G. [2]
Affiliations
[1] Basic Sciences Department, Faculty of Science, The Hashemite University, P.O. Box 330127, Zarqa
[2] Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman
Keywords
Canopy clustering; Hierarchical clustering; K-means clustering; Questions clustering
DOI
10.1007/s41870-022-01012-w
Abstract
In question datasets, many questions are duplicates of one another, because the flexibility of natural language allows the same question to be written in different forms. Extracting related questions manually is time-consuming, so computational methods are needed to group similar questions into clusters based on their semantic similarity. The task remains challenging because the information contained in a single question may be insufficient to cluster the questions effectively. In this research, canopy clustering is employed as a preliminary step before K-means clustering, and the result is compared with the Hierarchical Clustering approach. The Quora questions dataset is used in the experiments to identify question pairs that are similar. In terms of F1 score and the Rand statistic, the results demonstrate that the Hierarchical-K-means approach yields better cluster validity measures than the Canopy-K-means approach. In addition to identifying matches, the Canopy approach returns the top related questions that have the same intent in the same cluster, across several canopies. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
Pages: 3793-3802
Number of pages: 9
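
The abstract describes canopy clustering as a preliminary step before K-means. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes questions have already been converted to numeric vectors (toy 2-D points stand in for them here), uses Euclidean distance, and seeds K-means with the canopy centers. The function names `canopy_clustering` and `kmeans`, the thresholds `t1` and `t2`, and the sample data are all hypothetical.

```python
import math


def canopy_clustering(points, t1, t2):
    """Build canopies with a loose threshold t1 and a tight threshold t2 (t1 > t2).

    Returns a list of (center_index, member_indices). A point may belong to
    several canopies, but it stops being a candidate center once it lies
    within t2 of a chosen center.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        members = [i for i in range(len(points))
                   if math.dist(points[center], points[i]) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining
                     if math.dist(points[center], points[i]) > t2]
    return canopies


def kmeans(points, initial_centroids, max_iterations=20):
    """Plain Lloyd-style K-means starting from the given centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its cluster.
        updated = []
        for k in range(len(centroids)):
            cluster = [p for p, label in zip(points, labels) if label == k]
            if cluster:
                updated.append(tuple(sum(axis) / len(cluster)
                                     for axis in zip(*cluster)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Toy stand-ins for question embeddings (assumed data, two obvious groups).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

canopies = canopy_clustering(points, t1=3.0, t2=1.5)
seeds = [points[center] for center, _ in canopies]
labels, centroids = kmeans(points, seeds)
print(len(canopies), labels)
```

In this sketch the number of canopies determines K, so the choice of `t1` and `t2` replaces the choice of K; whether the paper uses canopies in exactly this way is not stated in the abstract.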