Using text mining and sentiment analysis for online forums hotspot detection and forecast

Cited by: 292
Authors
Li, Nan [2]
Wu, Desheng Dash [1, 3]
Affiliations
[1] Univ Toronto, RiskLab, Toronto, ON M5S 1A1, Canada
[2] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
[3] Reykjavik Univ, Reykjavik, Iceland
Keywords
Text mining; Sentiment analysis; Cluster analysis; Online sports forums; Dynamic interacting network analysis; Hotspot detection; Machine learning; Support vector machine; SEQUENCE MOTIFS; CLASSIFICATION;
DOI
10.1016/j.dss.2009.09.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text sentiment analysis, also referred to as emotional polarity computation, has become a flourishing frontier in the text mining community. This paper studies hotspot detection and forecasting for online forums using sentiment analysis and text mining approaches. First, we create an algorithm that automatically analyzes the emotional polarity of a text and assigns a value to each piece of text. Second, this algorithm is combined with K-means clustering and a support vector machine (SVM) to develop an unsupervised text mining approach. We use the proposed approach to group the forums into clusters, with the center of each cluster representing a hotspot forum within the current time span. The data sets used in our empirical studies were acquired and formatted from Sina sports forums and span 31 different topic forums and 220,053 posts. Experimental results demonstrate that SVM forecasting achieves results highly consistent with K-means clustering. The top 10 hotspot forums listed by SVM forecasting match 80% of the K-means clustering results. Both SVM and K-means yield the same top 4 hotspot forums of the year. (c) 2009 Elsevier B.V. All rights reserved.
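The clustering part of the pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy lexicon `POLARITY`, the whitespace tokenizer, the two forum features (post count and mean absolute polarity), and the helper names `polarity_score`, `forum_features`, and `kmeans`. The SVM forecasting step is omitted.

```python
import random

# Hypothetical toy lexicon; the paper's actual polarity resources are not given in this record.
POLARITY = {"great": 1.0, "win": 1.0, "love": 1.0,
            "bad": -1.0, "lose": -1.0, "awful": -1.0}


def polarity_score(text):
    """Sum lexicon weights over whitespace tokens (a crude stand-in for the paper's algorithm)."""
    return sum(POLARITY.get(tok.strip(".,!?").lower(), 0.0) for tok in text.split())


def forum_features(posts):
    """Represent a forum as (post count, mean absolute polarity). Assumed features."""
    scores = [polarity_score(p) for p in posts]
    mean_abs = sum(abs(s) for s in scores) / len(scores) if scores else 0.0
    return (float(len(posts)), mean_abs)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on tuples of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


# Usage on made-up forums: cluster forums by activity and sentiment intensity.
forums = {
    "football": ["Great win!", "Love this team", "Awful referee", "Bad call, we lose"],
    "chess": ["Draw again"],
    "basketball": ["Great game", "Bad defense", "Love it!"],
}
names = list(forums)
points = [forum_features(forums[n]) for n in names]
centers, labels = kmeans(points, k=2)
```

In a fuller version, the cluster labels from one time window could serve as training targets for a classifier that forecasts the next window, which is roughly the role the abstract assigns to the SVM.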
Pages: 354-368
Number of pages: 15