Research on Distributed Text Clustering Based on Frequent Itemset

Cited by: 0
Authors
Yang, Wenchuan [1]
Wu, Qiwei [1]
Cheng, Zishuai [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Network Secur, Beijing 100876, Peoples R China
Source
PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017) | 2017
Funding
National Natural Science Foundation of China;
Keywords
Text clustering; Frequent Itemset; Correlation analysis; Hadoop;
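DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;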
Abstract
Text clustering, a significant field in natural language processing, is a key technology for processing and organizing massive text data. In the era of big data, however, the sheer volume of data poses great challenges to both the speed and the accuracy of text clustering. This paper focuses on the speed and precision of text clustering, combining genetic algorithms, feedback, and distributed computing. A distributed text clustering method based on frequent itemsets is proposed. The experimental results show that it finds the globally optimal cluster centers more efficiently and makes the clustering more accurate.
Pages: 5700-5705
Number of pages: 6