Research of feature selection for text clustering based on cloud model

Cited by: 4
Authors
Zhao, Junmin [1]
Zhang, Kai [1]
Wan, Jian [2]
Affiliations
[1] Henan University of Urban Construction, Institute of Computer Science and Engineering, Pingdingshan
[2] ZhengZhou ShiYi Technology Co. Ltd, Zhengzhou
Keywords
Cloud model; Feature selection; K-means algorithm; TF-IDF
DOI
10.4304/jsw.8.12.3246-3252
Abstract
Text clustering is a form of unsupervised machine learning, so the discriminability of class attributes cannot be measured during clustering. Moreover, traditional text feature selection methods cannot effectively handle high-dimensional data. To overcome these weaknesses in existing feature selection, this paper proposes a new method that introduces cloud model theory into feature selection and constructs a cloud filter for document clustering. The distribution of document words is modeled at the micro level, and the digital characteristics of the cloud model are used to compute the separability between feature words more effectively. Experimental results with the K-means algorithm show that the method remarkably improves the accuracy of text clustering. © 2013 Academy Publisher.
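The abstract does not say how the cloud model's digital characteristics are obtained from the documents. As a minimal sketch, assuming words are weighted with plain TF-IDF (tf × log(N/df)) and that a word's weights across all documents (0 where the word is absent) serve as its cloud drops, the standard backward cloud generator without certainty degrees estimates the expectation Ex, entropy En and hyper-entropy He of each word. The function names `tf_idf`, `backward_cloud` and `word_clouds` are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return weights


def backward_cloud(samples):
    """Backward cloud generator without certainty degrees: estimate (Ex, En, He)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two cloud drops")
    ex = sum(samples) / n
    # Entropy from the first absolute central moment of the drops.
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    # Unbiased sample variance (n - 1 denominator).
    s2 = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # s2 - en**2 can be negative on small samples; clamping to 0 is an assumption.
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


def word_clouds(docs):
    """Map every vocabulary word to its (Ex, En, He) over the document collection."""
    weights = tf_idf(docs)
    vocab = sorted({w for doc in docs for w in doc})
    # A word missing from a document contributes a drop of 0.0 (assumption).
    return {w: backward_cloud([d.get(w, 0.0) for d in weights]) for w in vocab}


if __name__ == "__main__":
    docs = [
        ["cloud", "model", "text"],
        ["text", "cluster", "kmeans"],
        ["cloud", "feature", "text"],
    ]
    for word, (ex, en, he) in word_clouds(docs).items():
        print(f"{word:8s} Ex={ex:.4f} En={en:.4f} He={he:.4f}")
```

Here a word such as "text", which occurs in every document, gets an IDF of 0 and therefore a degenerate cloud (0, 0, 0), which is one intuition for why cloud characteristics can expose non-discriminative words. The paper's actual separability measure and the threshold used by its cloud filter are not given in the abstract, so any ranking or cut-off applied to these (Ex, En, He) triples would be a further assumption.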
Pages: 3246-3252
Page count: 6
Related papers (50 in total)
  • [1] Text Categorization Based on Clustering Feature Selection
    Zhou, Xiaofei
    Hu, Yue
    Guo, Li
    2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 398 - 405
  • [2] Multinomial mixture model with feature selection for text clustering
    Li, Minqiang
    Zhang, Liang
    KNOWLEDGE-BASED SYSTEMS, 2008, 21 (07) : 704 - 708
  • [3] A Turkish Text Classification Based Feature Selection and Density Peaks Clustering
    Zorarpaci, Ezgi
    2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023
  • [4] A New Feature Selection Method for Text Clustering
    Xu, Junling
    Wuhan University Journal of Natural Sciences, 2007, (05) : 912 - 916
  • [5] A Clustering Based Feature Selection Method Using Feature Information Distance for Text Data
    Chao, Shilong
    Cai, Jie
    Yang, Sheng
    Wang, Shulin
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT I, 2016, 9771 : 122 - 132
  • [6] Link based BPSO for feature selection in big data text clustering
    Kushwaha, Neetu
    Pant, Millie
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 82 : 190 - 199
  • [7] Research on Feature Selection Methods Based on Feature Clustering and Information Theory
    Wang, Wenhui
    Zhou, Changyin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024, 2024, 14874 : 71 - 82
  • [8] Text clustering with feature selection by using statistical data
    Li, Yanjun
    Luo, Congnan
    Chung, Soon M.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (05) : 641 - 652
  • [9] Research on Text Feature Clustering Based on Improved Parallel Genetic Algorithm
    Jiang, Mingyang
    Fan, Xiaojing
    Pei, Zhili
    Zhang, Zhifeng
    PROCEEDINGS OF 2018 TENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2018 : 235 - 238
  • [10] Research on Text Feature Selection Algorithm Based on Information Gain and Feature Relation Tree
    Zhang, Hong
    Ren, Yong-gong
    Yang, Xue
    2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013 : 446 - 449