Developing a Real-time Data Analytics Framework For Twitter Streaming Data

Cited by: 20
Authors
Yadranjiaghdam, Babak [1]
Yasrobi, Seyedfaraz [1]
Tabrizi, Nasseh [1]
Affiliation
[1] East Carolina Univ, Dept Comp Sci, Greenville, NC 27858 USA
Keywords
Streaming processing; Big Data; Kafka; Spark; Twitter; Real-time
DOI
10.1109/BigDataCongress.2017.49
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Twitter is an online social networking service with more than 300 million users that generates a huge amount of information every day. Its most important characteristic is that users can tweet in real time about events, situations, feelings, opinions, or something entirely new. Several existing workflows offer real-time analysis of Twitter data, performing general-purpose processing over streaming data. This study attempts to develop an analytical framework capable of in-memory processing to extract and analyze structured and unstructured Twitter data. The proposed framework consists of data ingestion, stream processing, and data visualization components; the Apache Kafka messaging system performs the data ingestion task. Furthermore, Spark makes it possible to run sophisticated data processing and machine learning algorithms in real time. We conducted a case study on tweets about the earthquake in Japan and the reactions of people around the world, analyzing the time and origin of the tweets.
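The abstract names three stages (ingestion through Kafka, stream processing through Spark, and visualization) and a case study that analyzes tweets by time and origin. The following is a minimal, dependency-free Python sketch of that flow under stated assumptions, not the authors' implementation: an in-memory `queue.Queue` stands in for a Kafka topic, a loop over micro-batches stands in for Spark's stream processing, and the resulting per-window, per-country counts are the kind of state a visualization component might plot. The `Tweet` record and its fields, the helper names `ingest`, `drain_batch`, and `process_stream`, the keyword filter, and the 60-second tumbling window are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from queue import Empty, Queue
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal tweet record; real Twitter payloads carry many more fields.
    created_at: int  # Unix timestamp in seconds
    country: str     # origin of the tweet, assumed to be already resolved
    text: str


def ingest(source: Iterable[Tweet], topic: "Queue[Tweet]") -> None:
    """Ingestion stage: push raw tweets onto a queue (stand-in for a Kafka topic)."""
    for tweet in source:
        topic.put(tweet)


def drain_batch(topic: "Queue[Tweet]", max_size: int) -> List[Tweet]:
    """Pull up to max_size tweets off the queue, imitating one micro-batch."""
    batch: List[Tweet] = []
    while len(batch) < max_size:
        try:
            batch.append(topic.get_nowait())
        except Empty:
            break
    return batch


def process_stream(
    topic: "Queue[Tweet]",
    keyword: str,
    window_seconds: int = 60,
    batch_size: int = 100,
) -> Dict[int, Counter]:
    """Processing stage: count keyword tweets per tumbling time window and origin.

    Returns {window_start: Counter({country: count})}.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    while True:
        batch = drain_batch(topic, batch_size)
        if not batch:
            break  # queue exhausted; a real stream would keep waiting for data
        for tweet in batch:
            if keyword.lower() not in tweet.text.lower():
                continue  # drop tweets unrelated to the event being studied
            # Align the tweet to the start of its window_seconds-wide window.
            window_start = tweet.created_at - tweet.created_at % window_seconds
            counts[window_start][tweet.country] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        Tweet(0, "JP", "Earthquake shaking Tokyo right now"),
        Tweet(30, "US", "Praying for everyone after the earthquake in Japan"),
        Tweet(45, "JP", "Lunch was great"),
        Tweet(70, "FR", "Earthquake news from Japan is terrible"),
        Tweet(90, "US", "earthquake relief donations open"),
    ]
    topic: "Queue[Tweet]" = Queue()
    ingest(sample, topic)
    result = process_stream(topic, "earthquake", window_seconds=60, batch_size=2)
    for window, by_country in sorted(result.items()):
        print(window, dict(by_country))
    # prints:
    # 0 {'JP': 1, 'US': 1}
    # 60 {'FR': 1, 'US': 1}
```

Keying counts by both time window and origin is one simple way to support the kind of time-and-origin analysis the case study describes. In an actual deployment, the queue would presumably be replaced by a Kafka consumer and the batch loop by a Spark streaming job.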
Pages: 329-336
Page count: 8