Discourse-aware rumour stance classification in social media using sequential classifiers

Cited by: 91
Authors
Zubiaga, Arkaitz [1]
Kochkina, Elena [1, 2]
Liakata, Maria [1, 2]
Procter, Rob [1, 2]
Lukasik, Michal [3]
Bontcheva, Kalina [3]
Cohn, Trevor [4]
Augenstein, Isabelle [5]
Affiliations
[1] Univ Warwick, Dept Comp Sci, Gibbet Hill Rd, Coventry CV4 7AL, W Midlands, England
[2] Alan Turing Inst, 96 Euston Rd, London NW1 2DB, England
[3] Univ Sheffield, Dept Comp Sci, Regent Court 211, Sheffield S1 4DP, S Yorkshire, England
[4] Univ Melbourne, Comp & Informat Syst, Melbourne, Vic 3010, Australia
[5] Univ Copenhagen, Dept Comp Sci, Sigurdsgade 41, DK-2200 Copenhagen N, Denmark
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Stance classification; Social media; Breaking news; Veracity classification; Sentiment analysis; Real-time; Twitter
DOI
10.1016/j.ipm.2017.11.009
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is attracting increasing interest from researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse features inherent in social media interactions or 'conversational threads'. Testing the effectiveness of four sequential classifiers, namely Hawkes Processes, Linear-Chain Conditional Random Fields (Linear CRF), Tree-Structured Conditional Random Fields (Tree CRF) and Long Short-Term Memory networks (LSTM), on eight datasets associated with breaking news stories, and looking at different types of local and contextual features, our work sheds new light on the development of accurate stance classifiers. We show that sequential classifiers that exploit discourse properties in social media conversations while using only local features outperform non-sequential classifiers. Furthermore, we show that LSTM using a reduced set of features can outperform the other sequential classifiers; this performance is consistent across datasets and across types of stances. Finally, our work analyses the different features under study, identifying those that best help characterise and distinguish between stances; for example, supporting tweets are more likely to be accompanied by evidence than denying tweets. We also set forth a number of directions for future research.
Pages: 273-290
Number of pages: 18