Deriving Topics in Twitter by Exploiting Tweet Interactions

Cited by: 9

Authors:

Nugroho, Robertus [1]

Yang, Jian [1]

Zhong, Youliang [1]

Paris, Cecile [2]

Nepal, Surya [2]

Affiliations:

[1] Macquarie Univ, Dept Comp, Sydney, NSW, Australia

[2] CSIRO, Canberra, ACT, Australia

Source:

2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015 | 2015

Keywords:

Topic Derivation; Twitter; Interactions of Tweets; Joint Matrix Factorization

DOI:

10.1109/BigDataCongress.2015.22

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

Twitter, as a big-data social network, has become one of the most important sources for capturing up-to-date events happening in the world. Topic derivation from Twitter is important for various applications such as situation awareness, market analysis, content filtering, and recommendations. However, tweets are short messages, which makes topic derivation challenging. Current methods employ various semantic features of tweet content but mostly overlook the interactions among tweets. In this paper, we propose a novel topic derivation method that takes into account the interactions among tweets, defined as the reciprocal activities related to the people who send the tweets, as well as actions and tweet contents. In particular, topics are derived by performing a two-step matrix factorization jointly over the interactions and semantic features of the tweets. We have conducted a number of experiments on tweets collected over a period of time, showing that the proposed method consistently outperforms other advanced topic derivation methods in the literature. Our experiments also reveal that the interactions among tweets significantly relieve the sparsity problem caused by the short-text nature of Twitter.
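The abstract does not give the factorization itself, so the following is only a minimal illustrative sketch of one way a two-step factorization over interactions and content could look. It is not the authors' algorithm. It assumes NumPy is available, uses standard Lee-Seung multiplicative-update NMF, and reads "two-step" as: first factorize a tweet-to-tweet interaction matrix to obtain tweet-topic weights, then hold those weights fixed while fitting topic-term weights to a tweet-term matrix. The function names `nmf`, `fit_terms_given_topics`, and `derive_topics`, and the toy matrices `A` and `V`, are all hypothetical.

```python
import numpy as np


def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Standard NMF via multiplicative updates, so that A ≈ W @ H."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def fit_terms_given_topics(V, W, iters=500, seed=0, eps=1e-9):
    """Hold the tweet-topic matrix W fixed and learn H so that V ≈ W @ H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def derive_topics(A, V, k):
    """Assumed two-step scheme (illustrative, not taken from the paper):
    step 1 learns tweet-topic weights W from the interaction matrix A,
    step 2 learns topic-term weights H from the tweet-term matrix V."""
    W, _ = nmf(A, k)
    H = fit_terms_given_topics(V, W)
    return W, H


# Toy data (hypothetical): tweets 0 and 1 interact, tweets 2 and 3 interact.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Sparse tweet-term counts: each tweet contains only one or two terms.
V = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

W, H = derive_topics(A, V, k=2)
print("topic per tweet:", W.argmax(axis=1))
print("topic-term weights:\n", H.round(2))
```

In this toy setting the interaction matrix groups the tweets even though each tweet's own term vector is very sparse, which is the intuition the abstract gives for why interactions help with short texts.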

Pages: 87-94

Page count: 8
