Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study

Cited by: 7
Authors
Huang, Shih-Tsung [1, 2]
Hsiao, Fei-Yuan [3, 4, 5]
Tsai, Tsung-Hsien [6]
Chen, Pei-Jung [6]
Peng, Li-Ning [2, 7]
Chen, Liang-Kung [2, 7, 8]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Pharm, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Ctr Hlth Longev & Aging Sci, Taipei, Taiwan
[3] Natl Taiwan Univ, Grad Inst Clin Pharm, Coll Med, Taipei, Taiwan
[4] Natl Taiwan Univ, Coll Med, Sch Pharm, Taipei, Taiwan
[5] Natl Taiwan Univ Hosp, Dept Pharm, Taipei, Taiwan
[6] Acer, Adv Tech Business Unit, New Taipei, Taiwan
[7] Taipei Vet Gen Hosp, Ctr Geriatr & Gerontol, Taipei, Taiwan
[8] Taipei Vet Gen Hosp, Taipei Municipal Gan Dau Hosp, Taipei, Taiwan
Keywords
dementia; machine learning; cluster analysis; disease; condition; symptoms; data; data set; cardiovascular; neuropsychiatric; infection; mobility; mental conditions; development; COGNITIVE IMPAIRMENT; CAROTID STENOSIS; RISK; ASSOCIATIONS; POPULATION; ADULTS;
DOI
10.2196/41858
Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Abstract
Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may form specific patterns leading to incident dementia.
Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods.
Methods: Using Taiwan's National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67% for the training data set and n=5234, 33% for the testing data set) were retrieved for analysis. Using machine learning methods to capture specific hierarchical disease triplet clusters prior to dementia, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization.
Results: Among 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, respectively, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models, and the model with the best performance was support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development, and the most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions.
Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world.
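
The performance measures quoted in the Results (accuracy, sensitivity, specificity, positive predictive value, and negative predictive value) are all simple ratios of confusion-matrix counts. The following is a minimal sketch of those standard definitions, not the authors' code. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage note are made-up toy values rather than figures from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical container for this sketch)."""
    tp: int  # dementia cases predicted as dementia
    fp: int  # controls predicted as dementia
    tn: int  # controls predicted as nondementia
    fn: int  # dementia cases predicted as nondementia


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, failing loudly when a metric is undefined."""
    if denominator == 0:
        raise ValueError("metric undefined: denominator is zero")
    return numerator / denominator


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the five threshold-based metrics named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }
```

For example, `classification_metrics(ConfusionCounts(tp=6, fp=3, tn=7, fn=4))` evaluates the five ratios on toy counts. The area under the curve is not covered here because it depends on the ranking of predicted scores rather than on a single confusion matrix.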
Pages: 16