Research of feature selection for text clustering based on cloud model

Cited by: 4
Authors
Zhao, Junmin [1]
Zhang, Kai [1]
Wan, Jian [2]
Affiliations
[1] Henan University of Urban Construction, Institute of Computer Science and Engineering, Pingdingshan
[2] ZhengZhou ShiYi Technology Co. Ltd, Zhengzhou
Keywords
Cloud model; Feature selection; K-means algorithm; TF-IDF
DOI
10.4304/jsw.8.12.3246-3252
Abstract
Text clustering is an unsupervised machine learning task, so the discriminability of class attributes cannot be measured during clustering. Traditional text feature selection methods also fail to solve the high-dimensionality problem effectively. To overcome these weaknesses, this paper proposes a new method that introduces cloud model theory into feature selection and constructs a cloud filter for the documents to be clustered. The distribution of document words is modeled at a microscopic level. Using the digital characteristics of the cloud model, the separability between feature words can be computed more reliably. Experimental results with the K-means algorithm show that the method markedly improves the accuracy of text clustering. © 2013 Academy Publisher.
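The abstract does not say how the digital characteristics of the cloud model are computed. As a rough illustration only, the sketch below uses the commonly cited backward cloud generator (the variant without certainty degrees) to estimate the three characteristics, expectation Ex, entropy En, and hyper-entropy He, from a list of numbers. The function name `backward_cloud`, the example weights, and the absolute-value guard are assumptions made for this sketch and are not taken from the paper.

```python
import math


def backward_cloud(samples):
    """Estimate the cloud model characteristics (Ex, En, He) from samples.

    Hypothetical helper: a standard backward cloud generator, not
    necessarily the procedure used by the authors.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Ex: the sample mean.
    ex = sum(samples) / n
    # En: scaled first-order absolute central moment.
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    # Unbiased sample variance.
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # He: abs() is an assumption that guards against a negative
    # difference, which can occur on small samples.
    he = math.sqrt(abs(var - en ** 2))
    return ex, en, he


# Illustrative input: made-up weights of one word across five documents.
ex, en, he = backward_cloud([1.0, 2.0, 3.0, 4.0, 5.0])
```

In a feature selection setting, such triples could be computed per word from its weights (for example TF-IDF values) across documents and then compared between words; how the paper turns them into a separability score is not stated in the abstract.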
Pages: 3246-3252
Number of pages: 6