Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study

Cited by: 7
Authors
Huang, Shih-Tsung [1, 2]
Hsiao, Fei-Yuan [3, 4, 5]
Tsai, Tsung-Hsien [6]
Chen, Pei-Jung [6]
Peng, Li-Ning [2, 7]
Chen, Liang-Kung [2, 7, 8]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Pharm, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Ctr Hlth Longev & Aging Sci, Taipei, Taiwan
[3] Natl Taiwan Univ, Grad Inst Clin Pharm, Coll Med, Taipei, Taiwan
[4] Natl Taiwan Univ, Coll Med, Sch Pharm, Taipei, Taiwan
[5] Natl Taiwan Univ Hosp, Dept Pharm, Taipei, Taiwan
[6] Acer, Adv Tech Business Unit, New Taipei, Taiwan
[7] Taipei Vet Gen Hosp, Ctr Geriatr & Gerontol, Taipei, Taiwan
[8] Taipei Vet Gen Hosp, Taipei Municipal Gan Dau Hosp, Taipei, Taiwan
Keywords
dementia; machine learning; cluster analysis; disease; condition; symptoms; data; data set; cardiovascular; neuropsychiatric; infection; mobility; mental conditions; development; COGNITIVE IMPAIRMENT; CAROTID STENOSIS; RISK; ASSOCIATIONS; POPULATION; ADULTS
DOI
10.2196/41858
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may form specific patterns leading to incident dementia.

Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods.

Methods: Using Taiwan's National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67% for the training data set and n=5234, 33% for the testing data set) were retrieved for analysis. To capture specific hierarchical disease triplet clusters prior to dementia with machine learning methods, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization.

Results: Among the 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, respectively, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models. The best-performing model was a support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development. The most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions.

Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and their sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardiovascular or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world.
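The performance measures listed in the Results (accuracy, sensitivity, specificity, positive predictive value, and negative predictive value) are all ratios of confusion-matrix counts. The following is a minimal Python sketch of those definitions, not the authors' code. The names `ConfusionCounts` and `classification_metrics` and the counts in `example` are hypothetical and chosen only for illustration; they are not the study's data. The area under the curve is not included because it requires ranked prediction scores rather than a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical helper type)."""
    tp: int  # dementia cases predicted as dementia
    fp: int  # controls predicted as dementia
    tn: int  # controls predicted as nondementia
    fn: int  # dementia cases predicted as nondementia


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the five standard measures derived from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Illustrative counts only (100 cases and 100 controls), not the study's results.
example = ConfusionCounts(tp=61, fp=38, tn=62, fn=39)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 1:1 matched case-control test set such as the one described in the Methods, a classifier that guesses at random would score about 0.5 on each of these measures, which is a useful baseline when reading values close to 0.6.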
Pages: 16