Development of machine learning model for diagnostic disease prediction based on laboratory tests

Times cited: 98

Authors:

Park, Dong Jin [1]

Park, Min Woo [2]

Lee, Homin [3]

Kim, Young-Jin [4]

Kim, Yeongsic [5]

Park, Young Hoon [6]

Affiliations:

[1] Ewha Womans Univ Korea, Coll Med, Dept Lab Med, Seoul, South Korea

[2] Catholic Univ Korea, Dept Lab Med, St Vincents Hosp, Seoul, South Korea

[3] Dept Res, Future Lab, Seoul, South Korea

[4] Pusan Natl Univ, Finance Fishery Manufacture Ind Math Ctr Big Data, Pusan, South Korea

[5] Catholic Univ Korea, Coll Med, Dept Lab Med, Seoul, South Korea

[6] Catholic Univ Korea, Coll Med, Dept Internal Med, Div Hematol, Seoul, South Korea

Source:

SCIENTIFIC REPORTS | 2021 / Vol. 11 / Issue 1

Keywords:

CONVOLUTIONAL NEURAL-NETWORKS; RANDOM FOREST; DEEP; SEQUENCE; GENE; CLASSIFICATION; REGULARIZATION; HEPATITIS

DOI:

10.1038/s41598-021-87171-5

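Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]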

Subject classification codes:

07; 0710; 09

Abstract:

The use of deep learning and machine learning (ML) in medical science is increasing, particularly for visual, audio, and language data. We aimed to build a new optimized ensemble model for disease prediction from laboratory test results by blending a deep neural network (DNN) model with two ML models. Eighty-six attributes (laboratory tests) were selected from the datasets on the basis of value counts, clinical importance, and missing values. We collected datasets comprising 5145 cases, which included 326,686 laboratory test results. We investigated 39 specific diseases defined by International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to build two ML models, a light gradient boosting machine (LightGBM) model and an extreme gradient boosting (XGBoost) model, as well as a DNN model implemented in TensorFlow. For the five most common diseases, the optimized ensemble model achieved an F1-score of 81% and a prediction accuracy of 92%. The deep learning and ML models differed in predictive power and in their disease classification patterns. We evaluated the models with a confusion matrix and analyzed feature importance using SHAP (SHapley Additive exPlanations) values. By classifying diseases, our new ML model predicted them efficiently. This study is expected to be useful in the prediction and diagnosis of diseases.
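The abstract reports an ensemble that blends a DNN with LightGBM and XGBoost and scores it with accuracy, F1-score, and a confusion matrix. It does not state the blending rule or how F1 was averaged over diseases. The Python sketch below is only an illustration. It assumes weighted soft voting, meaning a weighted mean of each model's per-disease probabilities, and macro-averaged F1. The toy probabilities, the blend weights, and all function names are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# Hypothetical sketch: soft-voting blend of three models plus evaluation metrics.
# All numbers below are made up; in the study the rows would be per-class
# probabilities produced by the DNN, LightGBM and XGBoost models.


def blend_probabilities(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Soft-voting blend: weighted mean of each model's per-class probabilities."""
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    n_cases, n_classes = len(model_probs[0]), len(model_probs[0][0])
    return [
        [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        for i in range(n_cases)
    ]


def predict(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class with the highest blended probability for each case."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[t][p] counts cases whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy data: 4 test cases, 3 disease classes (hypothetical values).
dnn_probs = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
lgbm_probs = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6], [0.3, 0.6, 0.1]]
xgb_probs = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1]]
weights = [0.4, 0.3, 0.3]  # hypothetical blend weights (DNN, LightGBM, XGBoost)
y_true = [0, 1, 2, 0]

blended = blend_probabilities([dnn_probs, lgbm_probs, xgb_probs], weights)
y_pred = predict(blended)
cm = confusion_matrix(y_true, y_pred, n_classes=3)

print("predicted:", y_pred)
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(cm), 3))
print("confusion matrix:", cm)
```

With these made-up inputs, the blend misclassifies the fourth case as class 1. The result is an accuracy of 0.75 and a macro F1 of about 0.778. Soft voting lets a confident model outweigh less certain ones, which a majority vote on hard labels would not. The abstract does not say whether the study used this rule or another blending scheme.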

Pages: 11

References: 46
