The Distributed Representation for Societal Risk Classification toward BBS Posts

Cited by: 3
Authors
Chen Jindong [1, 2]
Tang Xijin [1]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Inst Syst Sci, Beijing 100190, Peoples R China
[2] China Acad Aerosp Syst Sci & Engn, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Distributed representation; KNN; paragraph vector model; societal risk classification; Tianya forum;
DOI
10.1007/s11424-016-5099-z
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
The risk classification of BBS posts is important for evaluating the societal risk level within a given period. Using posts collected from the Tianya forum as the data source, the authors adopt societal risk indicators from socio psychology and conduct document-level, multi-class societal risk classification of BBS posts. To capture the semantics and word order of documents effectively, a shallow neural network model, Paragraph Vector, is applied to produce distributed vector representations of the posts. Based on these document vectors, the authors apply the KNN classification method to identify the societal risk category of each post. The experimental results show that, in document-level societal risk classification, Paragraph Vector trains much faster than Bag-of-Words and improves F-measure by at least 10%. Furthermore, Paragraph Vector also outperforms edit distance and a Lucene-based search method. The present work is the first attempt to combine a document embedding method with socio psychology research results in the area of public opinion.
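The classification step described in the abstract can be illustrated with a minimal sketch. It assumes the document vectors have already been produced by a paragraph vector model; the toy vectors, the label names `risk_a` and `risk_b`, the choice of cosine similarity, and `k=3` are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_vectors, train_labels, k=3):
    """Return the majority label among the k training vectors most similar to query."""
    scored = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical 3-dimensional document vectors standing in for paragraph vectors.
train_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
    [0.2, 0.7, 0.1],
]
# Placeholder category names, not the paper's risk categories.
train_labels = ["risk_a", "risk_a", "risk_b", "risk_b", "risk_b"]

predicted = knn_classify([0.85, 0.15, 0.05], train_vectors, train_labels, k=3)
```

In practice the vectors would have far more dimensions, and the similarity measure and the value of k would need to be tuned on labeled posts.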
Pages: 627-644
Page count: 18