CMPK: a high accuracy microblog user classification method for professional analysis

Times cited: 0
Authors
Peng, Ying [1]
Wang, Haiquan [1]
Affiliation
[1] Beihang University, School of Computer Science and Engineering, Beijing, People's Republic of China
Source
2013 INTERNATIONAL CONFERENCE ON CLOUD AND SERVICE COMPUTING (CSC 2013) | 2013
Keywords
text mining; user classification; vector space model; K-Nearest Neighbor algorithm
DOI
10.1109/CSC.2013.28
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Analyzing and mining the massive data recorded on microblogs to discover the characteristics and patterns of individual, group, and interactive behaviors is currently a research hotspot in massive data mining and behavioral analysis. However, existing research has largely neglected the influence of social attributes, such as a user's occupation, on that user's behavior and social relations. To address this issue, this paper proposes CMPK (Classification Method based on Professional lexicon and K-nearest neighbor algorithm), a high-accuracy microblog user classification method for professional analysis. CMPK combines a vector space model built on a professional lexicon with the KNN (K-Nearest Neighbor) classification algorithm to determine which industry a microblog user belongs to, based on the various kinds of information the user posts online. Experiments show that CMPK achieves an accuracy rate of nearly 90%.
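The abstract describes CMPK only at a high level: a vector space model built on a professional lexicon, followed by KNN classification of users into industries. The Python sketch below illustrates that general pipeline under stated assumptions; it is not the authors' implementation. The lexicon contents, the industry labels, raw term-frequency weighting, cosine similarity, majority voting, and the names `PROFESSIONAL_LEXICON`, `to_vector`, `cosine`, `knn_classify`, and `TRAINING` are all hypothetical. Input text is assumed to be already tokenized (Chinese microblog text would first need word segmentation).

```python
import math
from collections import Counter

# HYPOTHETICAL professional lexicon: industry -> characteristic terms.
# The real CMPK lexicon is not given in the abstract; these entries are illustrative only.
PROFESSIONAL_LEXICON = {
    "IT": {"programming", "server", "algorithm", "database"},
    "finance": {"stock", "fund", "investment", "bank"},
    "medicine": {"patient", "hospital", "surgery", "clinic"},
}

# Dimensions of the vector space: every lexicon term, in a fixed sorted order.
VOCAB = sorted(set().union(*PROFESSIONAL_LEXICON.values()))


def to_vector(tokens):
    """Map a user's pre-tokenized text to a term-frequency vector over VOCAB.

    Tokens outside the lexicon simply have no dimension and are dropped;
    this is one assumed way of combining the vector space model with a lexicon.
    """
    counts = Counter(tokens)
    return [counts[term] for term in VOCAB]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(tokens, training, k=3):
    """Predict an industry by majority vote among the k most similar labeled users.

    training: list of (tokens, industry) pairs for users whose industry is known.
    """
    query = to_vector(tokens)
    # sorted() is stable, so users with equal similarity keep their training order.
    ranked = sorted(
        ((cosine(query, to_vector(t)), label) for t, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, most_common keeps first-seen order, i.e. the nearer neighbor's label.
    return votes.most_common(1)[0][0]


# HYPOTHETICAL labeled users (made-up posts, already split into tokens).
TRAINING = [
    ("my server crashed while tuning the database".split(), "IT"),
    ("new algorithm for database programming".split(), "IT"),
    ("the fund and stock market today".split(), "finance"),
    ("bank raises investment rates".split(), "finance"),
    ("long day at the hospital with a patient".split(), "medicine"),
]

if __name__ == "__main__":
    print(knn_classify("debugging the database server all night".split(), TRAINING))
    print(knn_classify("which fund should i pick for investment".split(), TRAINING))
```

Restricting the vector space to lexicon terms keeps vectors short and ignores vocabulary unrelated to any profession. A practical system would plausibly use a weighting scheme such as TF-IDF and a far larger labeled training set than this toy example.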
Pages: 134-139
Number of pages: 6