Weighted k-Means Algorithm Based Text Clustering

Cited by: 4
Authors
Chen, Xiuguo [1 ]
Yin, Wensheng [1 ]
Tu, Pinghui [2 ]
Zhang, Hengxi [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan 430074, Peoples R China
[2] Wuhan Commun Basic Construct Project Qual Supervi, Wuhan 430015, Peoples R China
Source
IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS | 2009
Keywords
text clustering; k-means clustering; weighting; text mining;
DOI
10.1109/IEEC.2009.17
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper proposes a weighted k-means clustering algorithm, based on the k-means algorithm (MacQueen, 1967; Anderberg, 1973), for clustering texts. First, the weighted k-means algorithm changes how text objects are described: it converts categorical attributes to numeric ones so that the dissimilarity of text objects can be measured by Euclidean distance. Then it uses a weight vector to reduce the effect of irrelevant attributes and to reflect the semantic information of text objects. In an experiment, the weighted k-means algorithm is shown to be more effective than the k-means algorithm for clustering texts.
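The abstract does not give the distance formula or say how the weight vector is obtained, so the following is only a minimal sketch of one common reading: k-means in which each numeric attribute j contributes w_j * (x_j - c_j)^2 to the squared Euclidean distance. The function names, the fixed caller-supplied weights, and the toy data are assumptions for illustration, not the authors' implementation. The points are assumed to be already converted to numeric attributes.

```python
import math


def weighted_distance(x, c, w):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - c_j)^2)."""
    return math.sqrt(sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w)))


def weighted_kmeans(points, centers, weights, max_iter=100):
    """Lloyd-style k-means using a fixed per-attribute weight vector.

    `centers` holds the initial centers (hypothetical choice: supplied
    by the caller so that the run is deterministic).
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the weighted distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: weighted_distance(p, centers[k], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: per-attribute mean of each cluster's members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: attribute 0 separates the groups, attribute 1 is irrelevant noise.
points = [(0.0, 0.0), (0.2, 10.0), (5.0, 0.1), (5.2, 9.9)]
init = [points[0], points[1]]

plain_labels, _ = weighted_kmeans(points, init, weights=(1.0, 1.0))
weighted_labels, _ = weighted_kmeans(points, init, weights=(1.0, 0.01))
print(plain_labels)     # grouped by the noisy attribute
print(weighted_labels)  # grouped by the relevant attribute
```

With equal weights the run settles on a split along the noisy second attribute; shrinking that attribute's weight lets the same initial centers recover the split along the first attribute, which is the kind of effect the abstract attributes to the weight vector.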
Pages: 51 / +
Page count: 2
Related papers
50 in total
  • [1] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 301 - 307
  • [2] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93
  • [3] Design and application of a text clustering algorithm based on parallelized k-means clustering
    Wang H.
    Zhou C.
    Li L.
    Revue d'Intelligence Artificielle, 2019, 33 (06) : 453 - 460
  • [4] Weighted K-means Clustering Analysis Based on Improved Genetic Algorithm
    Zhang, Tongjie
    Cao, Yan
    Mu, Xiangwei
    SENSORS, MECHATRONICS AND AUTOMATION, 2014, 511-512 : 904 - 908
  • [5] Distributed Algorithm for Text Documents Clustering Based on k-Means Approach
    Sarnovsky, Martin
    Carnoka, Noema
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2015, PT II, 2016, 430 : 165 - 174
  • [6] An improved K-Means text clustering algorithm based on Local Search
    Liu, Xiangwei
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 11578 - 11581
  • [7] Similarity matrix-based K-means algorithm for text clustering
    曹奇敏
    郭巧
    吴向华
    Journal of Beijing Institute of Technology, 2015, 24 (04) : 566 - 572
  • [8] A K-means Text Clustering Algorithm Based on Subject Feature Vector
    Duo, Ji
    Zhang, Peng
    Hao, Liu
    JOURNAL OF WEB ENGINEERING, 2021, 20 (06): : 1935 - 1946
  • [9] Improved K-Means algorithm in text semantic clustering
    Ma, Junhong
    Open Cybernetics and Systemics Journal, 2014, 8 : 530 - 534
  • [10] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Locchi, Luca
    COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118