Event detection in finance using hierarchical clustering algorithms on news and tweets

Cited by: 13
Authors
Carta, Salvatore [1]
Consoli, Sergio [2]
Piras, Luca [1]
Podda, Alessandro Sebastian [1]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, Cagliari, Italy
[2] European Commiss, Joint Res Ctr DG JRC, Ispra, Varese, Italy
Keywords
Natural language processing; Event detection; News analysis; Social media; Finance; Hierarchical clustering; Stocktwits; Text mining; Big data; TWITTER; FRAMEWORK; INFORMATION; EXTRACTION; BURSTY; MODEL;
DOI
10.7717/peerj-cs.438
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time, domain-specific, clustering-based event-detection approach that integrates textual information coming, on the one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to assess the relevance of each event in public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to day d-1. Although the approach can be extended to a variety of domains (e.g., politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones' Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor's 500 index time-series.
The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, starting from the Brexit Referendum and extending to the outbreak of the Covid-19 pandemic in early 2020.
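The pipeline outlined in the abstract (cluster related news stories, attach microblog messages to the clusters, flag heavily discussed clusters as hot events) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bag-of-words vectors, cosine distance, average linkage, the function names, the toy texts, and the thresholds `0.6`, `0.9`, and `0.5` are all assumptions made for illustration. The lexicon built from news and stock data up to day d-1 is not modeled here.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Term-frequency vector over lowercased whitespace tokens (assumed representation)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cut-off).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def centroid(vectors, members):
    """Sum of the member vectors; the scale does not affect cosine distance."""
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return total

def assign_tweets(tweets, news_vectors, clusters, max_distance):
    """Count, per news cluster, the tweets whose nearest centroid lies within max_distance."""
    centroids = [centroid(news_vectors, c) for c in clusters]
    counts = [0] * len(clusters)
    for tweet in tweets:
        vec = bag_of_words(tweet)
        dists = [cosine_distance(vec, c) for c in centroids]
        k = min(range(len(dists)), key=dists.__getitem__)
        if dists[k] <= max_distance:
            counts[k] += 1
    return counts

def hot_events(counts, total_tweets, min_share):
    """Clusters attracting at least `min_share` of all tweets (assumed hot-event criterion)."""
    return [k for k, n in enumerate(counts) if total_tweets and n / total_tweets >= min_share]

# Toy data, invented for illustration only.
news = [
    "uk votes to leave eu in brexit referendum",
    "brexit referendum result shocks markets as uk votes leave",
    "fed keeps interest rates unchanged",
    "fed holds interest rates steady",
]
tweets = [
    "brexit vote crushes pound",
    "uk leave eu referendum",
    "brexit markets panic",
    "fed rates unchanged",
]

news_vectors = [bag_of_words(n) for n in news]
clusters = agglomerative_clusters(news_vectors, threshold=0.6)
counts = assign_tweets(tweets, news_vectors, clusters, max_distance=0.9)
hot = hot_events(counts, len(tweets), min_share=0.5)
print(clusters, counts, hot)
```

On this toy input the two Brexit stories and the two Fed stories form separate clusters, and only the Brexit cluster passes the assumed tweet-share threshold. A production system would likely replace the pure-Python loops with a library implementation of hierarchical clustering and a richer text representation.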
Pages: 39
Related papers
50 in total
  • [1] Event Detection in Finance Using Hierarchical Clustering Algorithms on News and Tweets
    Carta S.
    Consoli S.
    Piras L.
    Podda A.S.
    Recupero D.R.
    PeerJ Computer Science, 2021, 7 : 1 - 39
  • [2] SEDTWik: Segmentation-based Event Detection from Tweets using Wikipedia
    Morabia, Keval M.
    Murthy, Neti Lalita Bhanu
    Malapati, Aruna
    Samant, Surender S.
    NAACL HLT 2019: THE 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2019, : 77 - 85
  • [3] Anticipointment Detection in Event Tweets
    Kunneman, F.
    van Mulken, M.
    van den Bosch, A.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2020, 29 (02)
  • [4] The Detection of Fake News in Arabic Tweets Using Deep Learning
    Alyoubi, Shatha
    Kalkatawi, Manal
    Abukhodair, Felwa
    APPLIED SCIENCES-BASEL, 2023, 13 (14):
  • [5] Spatial-Temporal Event Detection from Geo-Tagged Tweets
    Huang, Yuqian
    Li, Yue
    Shan, Jie
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (04)
  • [6] Unsupervised Event Detection Using Self-learning-based Max-margin Clustering: Analysis on Streaming Tweets
    Gupta, Swati
    Banerjee, Biplab
    IETE JOURNAL OF RESEARCH, 2020, 66 (04) : 569 - 578
  • [7] Harnessing Tweets for Early Detection of an Acute Disease Event
    Joshi, Aditya
    Sparks, Ross
    McHugh, James
    Karimi, Sarvnaz
    Paris, Cecile
    MacIntyre, C. Raina
    EPIDEMIOLOGY, 2020, 31 (01) : 90 - 97
  • [8] Evaluation of Peak Detection Algorithms for Social Media Event Detection
    Healy, Philip
    Hunt, Graham
    Kilroy, Steven
    Lynn, Theo
    Morrison, John P.
    Venkatagiri, Shankar
    10TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION AND PERSONALIZATION SMAP 2015, 2015, : 46 - 51
  • [9] Classification of Traffic Event Tweets in Portuguese Language Using Deep Learning
    Teixeira, Estevan Barbara
    de Souza Moura, Pedro Nuno
    Vieira Campos, Carlos Alberto
    2022 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2022, : 566 - 571
  • [10] A Study of Hierarchical Clustering Algorithms
    Patel, Sakshi
    Sihmar, Shivani
    Jatain, Aman
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 537 - 541