Deriving Topics in Twitter by Exploiting Tweet Interactions

Cited by: 9
Authors
Nugroho, Robertus [1]
Yang, Jian [1]
Zhong, Youliang [1]
Paris, Cecile [2]
Nepal, Surya [2]
Affiliations
[1] Macquarie Univ, Dept Comp, Sydney, NSW, Australia
[2] CSIRO, Canberra, ACT, Australia
Source
2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015 | 2015
Keywords
Topic Derivation; Twitter; Interactions of Tweets; Joint Matrix Factorization
DOI
10.1109/BigDataCongress.2015.22
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Twitter, as a big-data social network, has become one of the most important sources for capturing up-to-date events happening in the world. Topic derivation from Twitter is important for various applications such as situation awareness, market analysis, content filtering, and recommendations. However, tweets are short messages, which makes topic derivation challenging. Current methods employ various semantic features of tweet content but mostly overlook the interactions among tweets. In this paper, we propose a novel topic derivation method that takes into account the interactions among tweets, defined as the reciprocal activities related to the people who send the tweets, as well as actions and tweet contents. In particular, topics are derived by performing a two-step matrix factorization jointly over the interactions and semantic features of the tweets. We have conducted a number of experiments on tweets collected over a period of time, showing that the proposed method consistently outperforms other advanced topic derivation methods in the literature. Our experiments also reveal that the interactions among tweets significantly relieve the sparsity problem caused by the short-text nature of Twitter.
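The abstract does not give the matrices or update rules, so the following is only a minimal sketch of what a two-step factorization over interactions and content could look like, not the authors' algorithm. It assumes a tweet-by-tweet interaction matrix `A` and a tweet-by-term matrix `V`, and uses standard multiplicative-update non-negative matrix factorization; the names `nmf`, `derive_topics`, `A`, and `V`, and the toy data, are all hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, W=None, eps=1e-9):
    """Approximate X (n x m) as W @ H with non-negative factors.

    Uses multiplicative updates for the squared-error objective.
    If W is supplied, it is held fixed and only H is learned.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Update H; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def derive_topics(A, V, k):
    """Hypothetical two-step scheme (an assumption, not the paper's method)."""
    # Step 1: factorize tweet-tweet interactions to get tweet-topic weights.
    W, _ = nmf(A, k)
    # Step 2: keep the tweet-topic weights fixed and factorize the
    # tweet-term matrix to get topic-term weights.
    _, H = nmf(V, k, W=W)
    return W, H


# Toy data (made up): 4 tweets, 4 terms, 2 topics.
A = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
V = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])

W, H = derive_topics(A, V, k=2)
print("tweet-topic weights:\n", W)
print("topic-term weights:\n", H)
```

In this sketch the interaction matrix shapes the tweet-topic assignment first, so that sparse term counts in `V` only have to determine the topic-term weights; whether the paper couples the two steps in this way is not stated in the abstract.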
Pages: 87-94
Number of pages: 8
Related papers
24 in total
  • [1] Albakour M., 2013, P 22 ACM INT C C INF, P419
  • [2] [Anonymous], 2006, CRC Standard Curves and Surfaces with Mathematica
  • [3] [Anonymous], 2015, Retriev Technologies
  • [4] Multifaceted Visualisation of Annotated Social Media Data
    Bista, Sanat Kumar
    Nepal, Surya
    Paris, Cecile
    [J]. 2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 699 - 706
  • [5] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [6] UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
    Choo, Jaegul
    Lee, Changhyun
    Reddy, Chandan K.
    Park, Haesun
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2013, 19 (12) : 1992 - 2001
  • [7] de Moor Aldo., 2010, P 6 INT C SEMANTIC S, P29
  • [8] Du S. S., MAXIOS LARGE SCALE N
  • [9] Vector Space Models of Word Meaning and Phrase Meaning: A Survey
    Erk, Katrin
    [J]. LANGUAGE AND LINGUISTICS COMPASS, 2012, 6 (10) : 635 - 653
  • [10] Probabilistic latent semantic indexing
    Hofmann, T
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 50 - 57