Topic Word Set-Based Text Clustering

Authors
Ghazifard, Amir Mehdi [1]
Shams, Mohammadreza [2]
Shamaee, Zeinab [3]
Affiliations
[1] Univ Isfahan, E Learning Dept, Esfahan, Iran
[2] Univ Tehran, ECE Dept, Tehran 14174, Iran
[3] Isfahan Univ Technol, ECE Dept, Esfahan, Iran
Keywords
e-commerce; clustering; classification; term correlation graph; topic word set
DOI
Not available
CLC number (Chinese Library Classification)
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Clustering is the task of grouping related and similar data without any prior knowledge of the labels. In some real-world applications, we face huge amounts of unstructured textual data with no organization. In these situations, clustering is a basic operation that must be performed to support subsequent e-commerce tasks. Clustering can be used to enhance various e-commerce applications, such as recommender systems, customer relationship management systems, or personal assistant agents. In this paper, we propose a new method for text clustering that constructs a term correlation graph, extracts topic word sets from it, and finally assigns each document to its related topic with the help of a classification algorithm such as SVM. This method provides a natural and understandable description of each cluster through its topic word set. It also allows the cluster of a document to be decided only when needed and in parallel, which significantly reduces offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
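The abstract does not specify how the term correlation graph is weighted, how topic word sets are extracted, or which features the classifier uses. The sketch below is therefore only a minimal illustration of the three-stage pipeline under loudly stated assumptions. Documents are whitespace-tokenized. Two terms are linked when they co-occur in at least `min_cooccur` documents, and each connected component of the resulting graph is treated as one topic word set. The paper assigns documents with a classifier such as SVM; here a plain word-overlap score stands in for that step. All function names and parameters are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Hypothetical term correlation graph: link two terms if they
    co-occur in at least `min_cooccur` documents (an assumption)."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # naive tokenization
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_word_sets(graph):
    """Assumed extraction rule: each connected component is a topic word set."""
    seen, topics = set(), []
    for start in sorted(graph):  # sorted for deterministic topic order
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        topics.append(component)
    return topics


def assign(doc, topics):
    """Stand-in for the SVM step: return the index of the topic set
    sharing the most words with `doc`, or -1 if none overlap."""
    words = set(doc.lower().split())
    scores = [len(words & topic) for topic in topics]
    best = max(range(len(topics)), key=lambda i: scores[i], default=-1)
    return best if best >= 0 and scores[best] > 0 else -1
```

Because `assign` only reads the precomputed topic sets, each document can be labelled on demand and independently of the others. For example, this could be done with a process pool, which mirrors the on-demand, parallel assignment the abstract describes. A trained classifier would replace the overlap score in a faithful implementation.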
Pages: 11