A term correlation based semi-supervised microblog clustering with dual constraints

Cited by: 3
Authors
Ma, Huifang [1, 2]
Zhang, Di [1]
Jia, Meihuizi [1]
Lin, Xianghong [1]
Affiliations
[1] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Gansu, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semi-supervised clustering; Microblogs; Dual constraints; Term correlation matrix; Nonnegative matrix factorization
DOI
10.1007/s13042-017-0750-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Microblog clustering is important in many web applications. However, microblogs are short and do not provide sufficient word occurrences, which prevents traditional text clustering approaches from being employed to their full potential. To address this problem, we propose a novel semi-supervised learning scheme that exploits semantic information to compensate for the limited message length. The key idea is to exploit term correlation data, which captures semantic information for term weighting and provides greater context for microblogs. We then formulate the microblog clustering problem as a semi-supervised non-negative matrix factorization co-clustering framework, which takes advantage of both prior domain knowledge of data points (microblogs), in the form of pair-wise constraints, and category knowledge of features (terms). Our approach not only greatly reduces the labor-intensive labeling process, but also exploits hidden information in the microblogs themselves. Extensive experiments are conducted on two real-world microblog datasets. The results demonstrate the effectiveness of the proposed approach, which achieves promising performance compared with state-of-the-art methods.
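The general idea in the abstract can be illustrated with a rough sketch. The code below is not the authors' algorithm: it omits the pair-wise constraints and the term category knowledge entirely, uses cosine similarity between term columns as a stand-in for the term correlation matrix, and runs plain (unsupervised) NMF with the standard multiplicative updates. The function names `term_correlation` and `nmf`, and the toy matrix, are hypothetical and chosen only for illustration.

```python
import numpy as np

def term_correlation(X):
    """Cosine similarity between term columns of a docs-x-terms matrix.
    Assumption: a simple stand-in for the paper's term correlation matrix."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unused terms
    Xn = X / norms
    return Xn.T @ Xn                 # terms x terms, symmetric, non-negative

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Plain NMF (A ~ W @ H) with multiplicative updates; no constraints."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 4 short messages over 5 terms (hypothetical counts).
X = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

C = term_correlation(X)              # semantic context shared between terms
X_enriched = X @ C                   # spread each message's weight to correlated terms
W, H = nmf(X_enriched, k=2)
labels = W.argmax(axis=1)            # cluster assignment per message
```

Multiplying by the correlation matrix gives each short message non-zero weight on terms it does not contain but that co-occur with its own terms, which is one simple way to compensate for sparse word occurrences. Because NMF starts from a random initialization, the resulting cluster assignment is not guaranteed to match the intuitive grouping.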
Pages: 679-692
Page count: 14