A scalable and real-time system for disease prediction using big data processing

Cited by: 0
Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing
Abstract
The growing number of patients with chronic diseases and the centralization of medical resources have a significant economic impact in the form of hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams, built by integrating Twitter, Apache Kafka, Apache Spark, and Apache Cassandra. Twitter users tweet health-related attributes; Kafka streaming receives the desired tweet attributes and ingests them into Spark Streaming, where a machine learning algorithm predicts health status and a response message is sent back through Kafka. The heart disease dataset from the UCI repository was used for the experiments. To enhance prediction accuracy, the Relief algorithm is used for feature selection. We compared six relevant machine learning algorithms implemented in Spark MLlib, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree, and Logistic Regression, with the full feature set as well as with the selected features. The highest classification accuracy, 92.05%, was obtained using RF with selected features. The scalability of RF in Spark MLlib and in the WEKA framework was measured for both the training and application stages. The results show significantly better performance for Spark in terms of scalability and computing time.
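The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch of Relief-style feature weighting. This is not the authors' implementation: it is a simplified variant that visits every instance once instead of sampling at random and uses absolute rather than squared differences. The function name `relief_weights` and the toy data are hypothetical and are not taken from the UCI heart disease dataset.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Simplified Relief: reward features that differ across classes
    and penalize features that differ within a class."""
    n, d = len(X), len(X[0])
    # Scale each feature by its range so differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a: Sequence[float], b: Sequence[float], j: int) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        # Candidate neighbours of the same class (hits) and other classes (misses).
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X_toy, y_toy)
print(weights)
```

Features with higher weights would be kept for training; any selection threshold is a tuning choice that the abstract does not specify.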
Pages: 30405-30434
Number of pages: 29
Related papers
50 in total
  • [21] Real-time E-Commerce Comment Classification Using Big Data Processing
    Binh-Hau Tran
    Trong-Hop Do
    38TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN 2024, 2024, : 546 - 549
  • [22] A Scalable Software Framework for Real-Time Data Processing in the Railway Environment
    Bhatti, Jabran
    Van Den Wouwer, Dirk
    Kerckhove, Wannes
    Dupont, Thomas
    Volckaert, Bruno
    2016 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT RAIL TRANSPORTATION (ICIRT), 2016, : 170 - 176
  • [23] Real-time traffic congestion prediction using big data and machine learning techniques
    Chawla, Priyanka
    Hasurkar, Rutuja
    Bogadi, Chaithanya Reddy
    Korlapati, Naga Sindhu
    Rajendran, Rajasree
    Ravichandran, Sindu
    Tolem, Sai Chaitanya
    Gao, Jerry Zeyu
    WORLD JOURNAL OF ENGINEERING, 2024, 21 (01) : 140 - 155
  • [24] A Scalable Machine Learning Online Service for Big Data Real-Time Analysis
    Baldominos, Alejandro
    Albacete, Esperanza
    Saez, Yago
    Isasi, Pedro
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIG DATA (CIBD), 2014, : 112 - 119
  • [25] Real-Time DDoS Attack Detection System Using Big Data Approach
    Awan, Mazhar Javed
    Farooq, Umar
    Babar, Hafiz Muhammad Aqeel
    Yasin, Awais
    Nobanee, Haitham
    Hussain, Muzammil
    Hakeem, Owais
    Zain, Azlan Mohd
    SUSTAINABILITY, 2021, 13 (19)
  • [26] On the use of IoT and Big Data Technologies for Real-time Monitoring and Data Processing
    Nait Maleka, Y.
    Kharbouch, A.
    El Khoukhi, H.
    Bakhouya, M.
    De Florio, V.
    El Ouadghiri, D.
    Latre, S.
    Blondia, C.
    8TH INTERNATIONAL CONFERENCE ON EMERGING UBIQUITOUS SYSTEMS AND PERVASIVE NETWORKS (EUSPN 2017) / 7TH INTERNATIONAL CONFERENCE ON CURRENT AND FUTURE TRENDS OF INFORMATION AND COMMUNICATION TECHNOLOGIES IN HEALTHCARE (ICTH-2017) / AFFILIATED WORKSHOPS, 2017, 113 : 429 - 434
  • [27] Near real-time big-data processing for data driven applications
    Kampars, Janis
    Grabis, Janis
    2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA INNOVATIONS AND APPLICATIONS (INNOVATE-DATA), 2017, : 35 - 42
  • [28] IoT and Big Data Technologies for Monitoring and Processing Real-Time Healthcare Data
    Kharbouch, Abdelhak
    Naitmalek, Youssef
    Elkhoukhi, Hamza
    Bakhouya, Mohamed
    De Florio, Vincenzo
    Driss El Ouadghiri, Moulay
    Latre, Steven
    Blondia, Chris
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2019, 10 (04) : 17 - 30
  • [29] A scalable multiprocessor for real-time signal processing
    Scherrer, D
    Eberle, P
    PARALLEL AND DISTRIBUTED PROCESSING, 1998, 1388 : 902 - 907
  • [30] Scalable, real-time, image processing pipeline
    Delft Univ of Technology, Delft, Netherlands
    Mach Vision Appl, 2 (110-121):