Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study

Cited by: 7
Authors
Huang, Shih-Tsung [1,2]
Hsiao, Fei-Yuan [3,4,5]
Tsai, Tsung-Hsien [6]
Chen, Pei-Jung [6]
Peng, Li-Ning [2,7]
Chen, Liang-Kung [2,7,8]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Pharm, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Ctr Hlth Longev & Aging Sci, Taipei, Taiwan
[3] Natl Taiwan Univ, Grad Inst Clin Pharm, Coll Med, Taipei, Taiwan
[4] Natl Taiwan Univ, Coll Med, Sch Pharm, Taipei, Taiwan
[5] Natl Taiwan Univ Hosp, Dept Pharm, Taipei, Taiwan
[6] Acer, Adv Tech Business Unit, New Taipei, Taiwan
[7] Taipei Vet Gen Hosp, Ctr Geriatr & Gerontol, Taipei, Taiwan
[8] Taipei Vet Gen Hosp, Taipei Municipal Gan Dau Hosp, Taipei, Taiwan
Keywords
dementia; machine learning; cluster analysis; disease; condition; symptoms; data; data set; cardiovascular; neuropsychiatric; infection; mobility; mental conditions; development; COGNITIVE IMPAIRMENT; CAROTID STENOSIS; RISK; ASSOCIATIONS; POPULATION; ADULTS;
DOI
10.2196/41858
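Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract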
Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may form specific patterns leading to incident dementia.

Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods.

Methods: Using Taiwan's National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67% for the training data set and n=5234, 33% for the testing data set) were retrieved for analysis. Using machine learning methods to capture specific hierarchical disease triplet clusters prior to dementia, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization.

Results: Among 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, respectively, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models, and the model with the best performance was support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development, and the most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions.

Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world.
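
The performance measures reported in the abstract (accuracy, sensitivity, specificity, positive predictive value, negative predictive value) are all ratios of the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions only; it is not the authors' code, the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts are invented for illustration rather than taken from the study. The area under the curve cannot be derived from a single confusion matrix, so it is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # cases correctly predicted as dementia
    fn: int  # cases wrongly predicted as nondementia
    tn: int  # controls correctly predicted as nondementia
    fp: int  # controls wrongly predicted as dementia


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard ratio-based measures for one confusion matrix."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),   # true positive rate
        "specificity": c.tn / (c.tn + c.fp),   # true negative rate
        "ppv": c.tp / (c.tp + c.fp),           # positive predictive value
        "npv": c.tn / (c.tn + c.fn),           # negative predictive value
    }


# Illustrative counts only (assumption): 100 cases and 100 controls.
example = ConfusionCounts(tp=60, fn=40, tn=65, fp=35)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a matched 1:1 case-control design such as the one described, accuracy is the simple average of sensitivity and specificity, which is consistent with the reported values (0.607 and 0.622 average to about 0.615).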
Pages: 16