A scalable and real-time system for disease prediction using big data processing

Cited by: 0
Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Source
Multimedia Tools and Applications | 2023 / Vol. 82
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing
Abstract
The growing number of patients with chronic diseases and the centralization of medical resources have a significant economic impact through hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams. The system integrates Twitter, Apache Kafka, Apache Spark and Apache Cassandra. Twitter users tweet health-related attributes; Kafka receives the desired tweet attributes and ingests them into Spark Streaming. There, a machine learning algorithm is applied to predict health status, and a response message is sent back through Kafka. The heart disease dataset, obtained from the UCI repository, was used for the experiments. To improve prediction accuracy, the Relief algorithm is used for feature selection. Six types of relevant machine learning algorithms implemented in Spark MLlib were compared, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree and Logistic Regression, with the full feature set as well as with the selected features. The highest classification accuracy, 92.05%, was obtained using RF with the selected features. The scalability of RF was measured with Spark MLlib and with the WEKA framework for both the training and application stages. The results show significantly better performance for Spark in terms of scalability and computing time.
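The abstract names Relief as the feature selection step but this record does not describe the authors' implementation. As a rough illustration only, the following is a minimal sketch of the classic binary-class Relief weighting scheme in plain Python. The function name `relief`, the toy data, and the choice to visit every instance once instead of sampling at random are assumptions made for this example and are not taken from the paper.

```python
def relief(X, y):
    """Return one relevance weight per feature (higher = more relevant).

    Sketch of binary-class Relief: for each instance, find its nearest
    neighbour of the same class (hit) and of the other class (miss), then
    reward features that differ on the miss and penalise those that differ
    on the hit. Assumes numeric features and at least two instances per class.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):  # assumption: every instance once, no random sampling
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5],
         [0.9, 0.8], [0.8, 0.2], [0.85, 0.55]]
y_toy = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(relief(X_toy, y_toy))
```

On this toy data the first feature should receive a clearly higher weight than the second; features would then be ranked by weight and the top ones kept before training a classifier.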
Pages: 30405-30434
Page count: 29