Event detection in finance using hierarchical clustering algorithms on news and tweets

Cited by: 13
Authors
Carta, Salvatore [1]
Consoli, Sergio [2]
Piras, Luca [1]
Podda, Alessandro Sebastian [1]
Recupero, Diego Reforgiato [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, Cagliari, Italy
[2] European Commiss, Joint Res Ctr DG JRC, Ispra, Varese, Italy
Keywords
Natural language processing; Event detection; News analysis; Social media; Finance; Hierarchical clustering; Stocktwits; Text mining; Big data; TWITTER; FRAMEWORK; INFORMATION; EXTRACTION; BURSTY; MODEL
DOI
10.7717/peerj-cs.438
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time, domain-specific, clustering-based event-detection approach that integrates textual information coming, on the one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to assess the relevance of each event in public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to d-1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones' Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor's 500 index time-series.
The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, ranging from the Brexit Referendum to the outbreak of the Covid-19 pandemic in early 2020.
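The two-stage idea in the abstract (cluster news stories first, then attach microblog posts to the clusters to gauge public attention) can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words vectors, cosine similarity, average linkage, the thresholds, the sample texts, and every function name below are assumptions made for illustration, and the lexicon built from news and stock data up to day d-1 is not modelled here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (hypothetical stand-in for the paper's features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the most similar pair of clusters and stops once the
    best pair is less similar than `threshold`. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def assign_posts(clusters, news_vecs, posts, min_sim):
    """Count, per news cluster, the posts whose most similar centroid is that cluster."""
    centroids = []
    for members in clusters:
        total = Counter()
        for i in members:
            total.update(news_vecs[i])  # summed term counts act as the centroid
        centroids.append(total)
    counts = [0] * len(clusters)
    for post in posts:
        v = vectorize(post)
        sims = [cosine(v, c) for c in centroids]
        k = max(range(len(sims)), key=sims.__getitem__)
        if sims[k] >= min_sim:  # posts unrelated to every cluster are ignored
            counts[k] += 1
    return counts


# Made-up sample data, for illustration only.
news = [
    "uk votes to leave eu in brexit referendum",
    "brexit referendum result shocks markets as uk votes leave",
    "oil prices fall as opec output rises",
    "opec output rises and oil prices fall again",
]
posts = ["brexit vote crushes pound", "brexit referendum panic", "oil prices down"]

news_vecs = [vectorize(n) for n in news]
clusters = agglomerative(news_vecs, threshold=0.3)
counts = assign_posts(clusters, news_vecs, posts, min_sim=0.1)
hot = max(range(len(clusters)), key=counts.__getitem__)  # cluster with most posts
```

In this sketch the cluster that attracts the most posts plays the role of a candidate hot event; the paper's actual criterion for flagging hot events may differ.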
Pages: 39