A scalable and real-time system for disease prediction using big data processing

Cited by: 0
Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Source
Multimedia Tools and Applications | 2023 / Vol. 82
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing;
DOI
Not available
Abstract
The growing number of patients with chronic diseases and the centralization of medical resources have a significant economic impact in the form of hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams. The system integrates Twitter, Apache Kafka, Apache Spark and Apache Cassandra. Twitter users tweet health-related attributes; Kafka streaming receives the desired tweet attributes and ingests them into Spark Streaming. There, a machine learning algorithm is applied to predict health status, and a response message is sent back through Kafka. The heart disease dataset from the UCI repository was used for the experiments. To enhance prediction accuracy, the Relief algorithm is used for feature selection. We compared six relevant machine learning algorithms implemented in Spark MLlib, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree and Logistic Regression, with the full feature set as well as with the selected features. The highest classification accuracy, 92.05%, was obtained using RF with the selected features. The scalability of RF under Spark MLlib and the WEKA framework was measured for both the training and application stages. The results show significantly better performance for Spark in terms of scalability and computing time.
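The abstract names the Relief algorithm for feature selection but gives no details of the variant or parameters used. As an illustration only, the following is a minimal sketch of the basic two-class Relief weight update in plain Python. It assumes numeric features and visits every instance once instead of sampling; the function names `relief_weights` and `select_top_k` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature with the basic two-class Relief update.

    Assumption: every instance is visited once (no random sampling),
    which makes the result deterministic.
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a: Sequence[float], b: Sequence[float], j: int) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind: skip this instance
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def select_top_k(weights: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-weighted features, in index order."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.6]]
    y = [0, 0, 0, 1, 1, 1]
    w = relief_weights(X, y)
    print("weights:", w)
    print("selected:", select_top_k(w, 1))
```

In a pipeline like the one described, the selected feature indices would be applied to the dataset before training the classifiers; how the authors integrated this step with Spark MLlib is not stated in the abstract.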
Pages: 30405 - 30434
Page count: 29