Real-time processing of social media with SENTINEL: A syndromic surveillance system incorporating deep learning for health classification

Cited by: 62
Authors
Serban, Ovidiu [1]
Thapen, Nicholas [1]
Maginnis, Brendan [1]
Hankin, Chris [1]
Foot, Virginia [2]
Affiliations
[1] Imperial Coll London, Inst Secur Sci & Technol, South Kensington Campus, London SW7 2AZ, England
[2] DSTL, Salisbury SP4 0JQ, Wilts, England
Keywords
Real-time processing; Classification; Clustering; Event detection
DOI
10.1016/j.ipm.2018.04.011
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Interest in real-time syndromic surveillance based on social media data has greatly increased in recent years. The ability to detect disease outbreaks earlier than traditional methods would be highly useful for public health officials. This paper describes a software system which is built upon recent developments in machine learning and data processing to achieve this goal. The system is built from reusable modules integrated into data processing pipelines that are easily deployable and configurable. It applies deep learning to the problem of classifying health-related tweets and is able to do so with high accuracy. It has the capability to detect illness outbreaks from Twitter data and then to build up and display information about these outbreaks, including relevant news articles, to provide situational awareness. It also provides nowcasting functionality of current disease levels from previous clinical data combined with Twitter data. The preliminary results are promising, with the system being able to detect outbreaks of influenza-like illness symptoms which could then be confirmed by existing official sources. The Nowcasting module shows that using social media data can improve prediction for multiple diseases over simply using traditional data sources.
Pages: 1166-1184
Page count: 19