A scalable and real-time system for disease prediction using big data processing

Cited by: 0
Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Source
Multimedia Tools and Applications | 2023 / Vol. 82
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing
DOI
Not available
Abstract
The growing number of chronic disease patients and the centralization of medical resources cause a significant economic impact through hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams, built by integrating Twitter, Apache Kafka, Apache Spark and Apache Cassandra. Twitter users tweet health-related attributes; Kafka receives the desired tweet attributes and ingests them into Spark Streaming, where a machine learning algorithm is applied to predict health status and a response message is sent back through Kafka. The heart disease dataset, obtained from the UCI repository, was used for the experiments. To enhance prediction accuracy, the Relief algorithm is used for feature selection. We compared six types of relevant machine learning algorithms implemented in Spark MLlib, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree and Logistic Regression, using both the full feature set and the selected features. The highest classification accuracy, 92.05%, was obtained using RF with the selected features. The scalability of RF was measured with Spark MLlib and the WEKA framework for both the training and application stages. The results show significantly better performance for Spark in terms of scalability and computing time.
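The abstract names Relief for feature selection without describing it. As a rough illustration only, the sketch below computes Relief-style feature weights in plain Python for a two-class problem, using one common formulation: reward a feature that differs at the nearest miss, penalize it when it differs at the nearest hit. The function name `relief_weights`, the Manhattan distance, and the toy data are assumptions made for this example. None of it is taken from the paper or from Spark MLlib.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Relief-style weights for two classes; features assumed scaled to [0, 1].

    Hypothetical helper for illustration, not the authors' implementation.
    """
    n, d = len(X), len(X[0])
    if n_samples is None:
        # Visit every instance once so the result is deterministic.
        indices = list(range(n))
    else:
        # Otherwise sample instances with replacement, as in basic Relief.
        rng = random.Random(seed)
        indices = [rng.randrange(n) for _ in range(n_samples)]
    m = len(indices)
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance between two feature vectors (an assumption).
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in indices:
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            # Differing from the nearest miss raises the weight;
            # differing from the nearest hit lowers it.
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / m
    return weights


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [0.8, 0.3], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
# Rank features from most to least relevant by weight.
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
print(weights, ranking)
```

Features with the highest weights would be kept and passed to the classifier. How many features the paper retains is not stated in the abstract.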
Pages: 30405 - 30434
Number of pages: 29
Related papers
50 in total
  • [21] Real-Time Taxi Demand Prediction using data from the web
    Markou, Ioulia
    Rodrigues, Filipe
    Pereira, Francisco C.
    2018 21ST INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2018, : 1664 - 1671
  • [22] Real-time Predictive Maintenance for Wind Turbines Using Big Data Frameworks
    Canizo, Mikel
    Onieva, Enrique
    Conde, Angel
    Charramendieta, Santiago
    Trujillo, Salvador
    2017 IEEE INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT (ICPHM), 2017, : 70 - 77
  • [23] A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment
    Abderrahmane Ed-daoudy
    Khalil Maalmi
    Journal of Big Data, 6
  • [24] A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment
    Ed-daoudy, Abderrahmane
    Maalmi, Khalil
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [25] AI-Based Stroke Disease Prediction System Using Real-Time Electromyography Signals
    Yu, Jaehak
    Park, Sejin
    Kwon, Soon-Hyun
    Ho, Chee Meng Benjamin
    Pyo, Cheol-Sig
    Lee, Hansung
    APPLIED SCIENCES-BASEL, 2020, 10 (19):
  • [26] Real-time big data analytics for hard disk drive predictive maintenance
    Su, Chuan-Jun
    Huang, Shi-Feng
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 71 : 93 - 101
  • [27] A Comparative Performance of Real-time Big Data Analytic Architectures
    Sanla, Apisit
    Numnonda, Thanisa
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 674 - 678
  • [28] Towards of a Real-time Big Data Architecture to Intensive Care
    Goncalves, Andre
    Portela, Filipe
    Santos, Manuel Filipe
    Rua, Fernando
    8TH INTERNATIONAL CONFERENCE ON EMERGING UBIQUITOUS SYSTEMS AND PERVASIVE NETWORKS (EUSPN 2017) / 7TH INTERNATIONAL CONFERENCE ON CURRENT AND FUTURE TRENDS OF INFORMATION AND COMMUNICATION TECHNOLOGIES IN HEALTHCARE (ICTH-2017) / AFFILIATED WORKSHOPS, 2017, 113 : 585 - 590
  • [29] An incremental approach for real-time Big Data visual analytics
    Garcia, Ignacio
    Casado, Ruben
    Bouchachia, Abdelhamid
    2016 IEEE 4TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD WORKSHOPS (FICLOUDW), 2016, : 177 - 182
  • [30] The SOLID architecture for real-time management of big semantic data
    Martinez-Prieto, Miguel A.
    Cuesta, Carlos E.
    Arias, Mario
    Fernandez, Javier D.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2015, 47 : 62 - 79