A machine learning based data modeling for medical diagnosis

Cited by: 11
Authors
Mahoto, Naeem Ahmed [1]
Shaikh, Asadullah [2]
Sulaiman, Adel [2]
Reshan, Mana Saleh Al [2]
Rajab, Adel [2]
Rajab, Khairan [2]
Affiliations
[1] Mehran Univ Engn & Technol, Dept Software Engn, Jamshoro 76062, Sindh, Pakistan
[2] Najran Univ, Coll Comp Sci & Informat Syst, Najran 61441, Saudi Arabia
Keywords
Machine learning; Medical data; Classification; Predictive models
DOI
10.1016/j.bspc.2022.104481
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
High-dimensional medical data makes prediction a complex and difficult task. This study builds predictive models for medical data. Two medical datasets are used: one dataset available online (Heart Disease data) and one real clinical dataset (Eye Infection data). Five machine learning algorithms are applied in the modeling stage: Decision Tree, Multilayer Perceptron (MLP), Naive Bayesian (NB), Random Forest, and Support Vector Machine. In addition, bagging and voting ensemble methods are applied with these base learners. The models are validated with both percentage-split and cross-validation methods and evaluated with well-established metrics: accuracy, precision, recall, and F-measure. The modeling method comprises two stages. The first stage uses the available features for prediction. The second stage uses only features selected on the basis of positive correlation. The same method is also applied to deep learning: a Convolutional Neural Network (CNN) is used so that its outcomes can be compared with those of the conventional machine learning algorithms. The experimental results show that better predictions are achieved in the second stage. The experiments also indicate that percentage-split validation produces better predictive models, and that ensemble methods yield marginally better outcomes than the base models. Using 80-20 splits in the second stage, NB outperformed the other algorithms on the Heart Disease data with the highest accuracy of 88.90%, and MLP obtained 97.50% accuracy on the Eye Infection data. However, the CNN model performed poorly due to the size of the datasets considered.
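The second-stage feature selection and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "positive correlation" means a Pearson correlation above zero between a feature and the class label, and that the task is binary. The function names (`pearson`, `select_positive_features`, `binary_metrics`) and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_positive_features(rows, labels, threshold=0.0):
    """Indices of feature columns whose correlation with the label
    exceeds `threshold` (assumed criterion, see lead-in)."""
    selected = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if pearson(column, labels) > threshold:
            selected.append(j)
    return selected


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-measure for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data (hypothetical): three features, four samples.
rows = [[1, 5, 7], [2, 4, 7], [3, 2, 7], [4, 1, 7]]
labels = [0, 0, 1, 1]
kept = select_positive_features(rows, labels)
reduced_rows = [[row[j] for j in kept] for row in rows]
print(kept, reduced_rows)
print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In this toy example only the first feature rises with the label, so it is the only one kept. The second feature is negatively correlated and the third is constant. A classifier would then be trained on `reduced_rows` instead of `rows`.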
Pages: 15