Research of feature selection for text clustering based on cloud model

Cited by: 4
Authors
Zhao, Junmin [1]
Zhang, Kai [1]
Wan, Jian [2]
Affiliations
[1] Henan University of Urban Construction, Institute of Computer Science and Engineering, Pingdingshan
[2] ZhengZhou ShiYi Technology Co. Ltd, Zhengzhou
Keywords
Cloud model; Feature selection; K-means algorithm; TF-IDF
DOI
10.4304/jsw.8.12.3246-3252
Abstract
Text clustering is an unsupervised machine learning task, so the discriminability of class attributes cannot be measured during clustering, and traditional text feature selection methods cannot effectively solve the high-dimensionality problem. To overcome these weaknesses in existing feature selection, this paper proposes a new method that introduces cloud model theory into feature selection and constructs a cloud filter for clustering documents. The distribution of document words is modeled at a microscopic level. By employing the digital characteristics of the cloud model, the separability between feature words can be better computed. Experimental results with the K-means algorithm show that the method remarkably improves the accuracy of text clustering. © 2013 Academy Publisher.
Pages: 3246-3252
Page count: 6