TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding

Cited by: 19
Authors
Li, Quanzhi [1]
Shah, Sameena [1]
Liu, Xiaomo [1]
Nourbakhsh, Armineh [1]
Fang, Rui [1]
Affiliations
[1] Thomson Reuters, Research & Development, 3 Times Square, New York, NY 10036, USA
Source
CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2016
Keywords
Tweet topic classification; entity knowledge base; topic enhanced word embedding; Twitter
DOI
10.1145/2983323.2983325
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Classifying tweets into topic categories is important for many applications, since tweets cover a wide variety of topics and users are interested in only certain topical areas. Many tweet classification approaches fail to achieve high accuracy because of data sparseness. A tweet, as a special type of short text, carries metadata in addition to its text that can be used to enrich its context, such as user names, mentions, hashtags and embedded links. In this demonstration, we present TweetSift, an efficient and effective real-time tweet topic classifier. TweetSift exploits external tweet-specific entity knowledge to provide more topical context for a tweet, and integrates this knowledge with topic enhanced word embeddings for topic classification. The demonstration will show how TweetSift works and how it is integrated into our social media event detection system.
Pages: 2429-2432
Number of pages: 4