A machine learning-based data mining in medical examination data: a biological features-based biological age prediction model

Cited by: 8
Authors
Yang, Qing [1]
Gao, Sunan [2]
Lin, Junfen [1]
Lyu, Ke [3]
Wu, Zexu [3]
Chen, Yuhao [3]
Qiu, Yinwei [1]
Zhao, Yanrong [1]
Wang, Wei [1]
Lin, Tianxiang [1]
Pan, Huiyun [4]
Chen, Ming [3, 4]
Affiliations
[1] Zhejiang Prov Ctr Dis Control & Prevent, Hangzhou 310051, Peoples R China
[2] Zhejiang Univ, Coll Biosyst Engn & Food Sci, Hangzhou 310058, Peoples R China
[3] Zhejiang Univ, Coll Life Sci, Hangzhou 310058, Peoples R China
[4] Zhejiang Univ, Affiliated Hosp 1, Sch Med, Hangzhou 310058, Peoples R China
Keywords
Biological age; Biological features; Machine learning; Interpolation; Stacking; Health status; BLOOD-PRESSURE; BODY HEIGHT; MORTALITY; POPULATION; BIOMARKERS; ADULTS; AUTOENCODERS; CAPACITY; REVEAL; TRENDS;
DOI
10.1186/s12859-022-04966-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Biological age (BA) has been recognized as a more accurate indicator of aging than chronological age (CA). However, current work has three limitations: insufficient attention to the incompleteness of the medical data used to construct BA; a lack of machine learning-based BA (ML-BA) for the Chinese population; and neglect of how the degree of model overfitting affects the stability of association results. Methods and results: Based on medical examination data from the Chinese population (45-90 years), we first identified the most suitable interpolation method for missing values, then constructed 14 ML-BAs based on biomarkers, and finally explored the associations between ML-BAs and health statuses (health risk indicators and disease). We found that round-robin linear regression interpolation performed best, while AutoEncoder showed the highest interpolation stability. We further illustrated the potential overfitting problem in ML-BAs, which affected the stability of the ML-BAs' associations with health statuses. We then proposed a composite ML-BA based on the Stacking method with a simple meta-model (STK-BA), which overcame the overfitting problem and was more strongly associated with CA (r = 0.66, P < 0.001), health risk indicators, disease counts, and six types of disease. Conclusion: We provide an improved aging measurement method for middle-aged and elderly groups in China that more stably captures aging characteristics beyond CA, supporting the emerging application potential of machine learning in aging research.
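The two techniques named in the abstract, round-robin linear regression interpolation and Stacking with a simple meta-model, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the synthetic biomarkers, and the use of plain linear models on feature subsets as base learners are all assumptions made for illustration (the paper builds 14 ML-BAs with different algorithms).

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def round_robin_impute(X, n_rounds=10):
    """Fill NaNs by regressing each incomplete column on all other columns, in turn."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(filled, j, axis=1)
            coef = fit_linear(others[~rows], filled[~rows, j])
            filled[rows, j] = predict_linear(coef, others[rows])
    return filled

def stack_predict(feature_sets, X_train, y_train, X_test, n_folds=5, seed=0):
    """Stacking: out-of-fold base predictions are the inputs of a linear meta-model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_train)), n_folds)
    oof = np.zeros((len(X_train), len(feature_sets)))
    test_preds = np.zeros((len(X_test), len(feature_sets)))
    for m, cols in enumerate(feature_sets):
        for held_out in folds:
            train_idx = np.setdiff1d(np.arange(len(X_train)), held_out)
            coef = fit_linear(X_train[train_idx][:, cols], y_train[train_idx])
            oof[held_out, m] = predict_linear(coef, X_train[held_out][:, cols])
        # Refit each base model on all training rows to score the test set.
        coef = fit_linear(X_train[:, cols], y_train)
        test_preds[:, m] = predict_linear(coef, X_test[:, cols])
    meta = fit_linear(oof, y_train)  # the "simple meta-model"
    return predict_linear(meta, test_preds)

# Synthetic stand-in data (hypothetical): three noisy biomarkers that drift with age.
rng = np.random.default_rng(42)
n = 400
age = rng.uniform(45, 90, n)
biomarkers = np.column_stack([
    0.5 * age + rng.normal(0, 5, n),
    -0.3 * age + rng.normal(0, 4, n),
    0.2 * age + rng.normal(0, 6, n),
])
observed = biomarkers.copy()
observed[rng.random(observed.shape) < 0.1] = np.nan  # about 10% missing at random

completed = round_robin_impute(observed)
ba = stack_predict([[0, 1], [1, 2], [0, 2]], completed[:300], age[:300], completed[300:])
r = np.corrcoef(ba, age[300:])[0, 1]
print(f"correlation between stacked BA and CA on held-out rows: {r:.2f}")
```

Fitting the meta-model on out-of-fold predictions rather than in-sample ones is the usual way Stacking limits the overfitting the abstract is concerned with; the correlation this sketch prints comes from synthetic data and is not comparable to the reported r = 0.66.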
Pages: 23