Extracting common emotions from blogs based on fine-grained sentiment clustering

Cited by: 29
Authors
Feng, Shi [1]
Wang, Daling [1]
Yu, Ge [1]
Gao, Wei [2]
Wong, Kam-Fai [2]
Affiliations
[1] Northeastern Univ, Inst Comp Software & Theory, Shenyang, Peoples R China
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Opinion mining; Sentiment analysis; PLSA; WEB
DOI
10.1007/s10115-010-0325-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, blogs have emerged as the major platform for people to express their feelings and sentiments in the age of Web 2.0. Common emotions, which reflect people's collective and overall sentiments, are becoming a major concern for governments, businesses and individual users. Unlike previous literature on sentiment classification and summarization, the major issue in common emotion extraction is to find people's collective sentiments and their corresponding distributions on the Web. Most existing blog clustering methods take into account keywords, stories or timelines but neglect the embedded sentiments, which are considered very important features of blogs. In this paper, a novel method based on Probabilistic Latent Semantic Analysis (PLSA) is presented to model the hidden sentiment factors, and an emotion-oriented clustering approach is proposed to find common emotions according to the fine-grained sentiment similarity between blogs. Extensive experiments are conducted on real-world datasets covering different topics. The results show that our approach can partition blogs into sentiment-coherent clusters and that the extracted common emotion words afford good navigation guidelines for the embedded sentiments in each cluster.
Pages: 281-302
Number of pages: 22