Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals

Times cited: 15
Authors
Dashtban, Ashkan [1]
Mizani, Mehrdad A. [1, 2]
Pasea, Laura [1]
Denaxas, Spiros [1]
Corbett, Richard [3]
Mamza, Jil B.
Gao, He
Morris, Tamsin [4]
Hemingway, Harry [1, 5]
Banerjee, Amitava [1, 6, 7]
Affiliations
[1] UCL, Institute of Health Informatics, 222 Euston Rd, London NW1 2DA, England
[2] British Heart Foundation Data Science Centre, Health Data Research UK, London, England
[3] Imperial College Healthcare NHS Trust, London, England
[4] AstraZeneca, Medical & Scientific Affairs, BioPharmaceuticals Medical, London, England
[5] UCL, Health Data Research UK, London, England
[6] Barts Health NHS Trust, London, England
[7] University College London Hospitals NHS Trust, London, England
Keywords
Prediction models; Risk prediction; Death
DOI
10.1016/j.ebiom.2023.104489
Chinese Library Classification
R5 [Internal medicine]
Discipline codes
1002; 100201
Abstract
Background: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Better subtype definition could improve outcome prediction and inform effective interventions.
Methods: We analysed individuals aged ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422, respectively) from a population-based electronic health record resource (Clinical Practice Research Datalink, CPRD; 2006-2020). We included 264 factors (2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes with seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) applied to 66 of the 2670 variables in each dataset. We evaluated the subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing, by British National Formulary chapter).
Findings: After identifying five clusters across the seven approaches, we labelled the CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic and 5. Cardiometabolic. Internal validity: a high-performing model trained with XGBoost predicted disease subtypes with 95% accuracy for both incident and prevalent CKD (sensitivity 0.81-0.98; F1 score 0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions and incidence of new chronic diseases differed across CKD subtypes.
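The two-stage logic described above (derive subtype labels by unsupervised clustering, then check that a supervised model can recover those labels on held-out data) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' pipeline: it uses only K-means (one of the seven methods named), invented 2-D data with three well-separated groups, and a nearest-centroid classifier as a hypothetical stand-in for XGBoost. Sensitivity and F1 are computed per subtype alongside overall accuracy, which is presumably why the abstract reports ranges for those two metrics.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=100):
    """Lloyd's k-means; the first k points seed the centroids (toy choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # labels stopped changing, so means are stable
            break
        centroids = new
    return labels


def fit_nearest_centroid(X, y):
    """Assumption: hypothetical stand-in for XGBoost, one mean per label."""
    groups = defaultdict(list)
    for x, lab in zip(X, y):
        groups[lab].append(x)
    return {lab: [sum(col) / len(xs) for col in zip(*xs)]
            for lab, xs in groups.items()}


def predict(model, X):
    return [min(model, key=lambda lab: math.dist(x, model[lab])) for x in X]


def evaluate(y_true, y_pred):
    """Overall accuracy plus per-subtype sensitivity (recall) and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        sens = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        per_class[c] = {"sensitivity": sens, "f1": f1}
    return accuracy, per_class


def run_demo(seed=0):
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    # Invented data; interleaved so the first three points seed one group each.
    points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
              for _ in range(40) for cx, cy in centres]
    labels = kmeans(points, k=3)  # unsupervised "subtypes"
    train, test = slice(0, 90), slice(90, None)
    model = fit_nearest_centroid(points[train], labels[train])
    return evaluate(labels[test], predict(model, points[test]))


accuracy, per_class = run_demo()
print(f"accuracy={accuracy:.2f}", per_class)
```

With groups this well separated every metric is 1.0; on real EHR variables the subtypes overlap, so per-subtype sensitivity and F1 spread out, as in the ranges reported above.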
The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype, at 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype, at 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). Medications: across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD.
Interpretation: In the largest ML study of CKD to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes are relevant to the study of aetiology, therapeutics and risk prediction.
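The abstract reports 5-year risks with 95% confidence intervals but does not say how they were estimated. One common choice for EHR cohorts, where follow-up is censored, is one minus the Kaplan-Meier survival at 5 years. The sketch assumes that choice, with Greenwood's variance and a log(-log) interval; the function `km_risk` and the toy follow-up data are hypothetical illustrations, not the authors' analysis.

```python
import math


def km_risk(times, events, horizon, z=1.96):
    """Kaplan-Meier cumulative risk 1 - S(horizon) with an approximate 95% CI.

    times  -- follow-up in years for each person
    events -- 1 if the outcome (e.g. death) occurred, 0 if censored
    Assumption: CI from Greenwood's variance on the log(-log S) scale.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, greenwood = 1.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = censored = 0
        # Group tied times: events and censorings at t share one risk set.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood += deaths / (n_at_risk * (n_at_risk - deaths))
        n_at_risk -= deaths + censored
    risk = 1 - surv
    if surv in (0.0, 1.0):
        return risk, risk, risk  # interval undefined at the boundaries
    se = math.sqrt(greenwood) / abs(math.log(surv))
    s_lower = surv ** math.exp(z * se)  # S < 1, so a larger exponent lowers it
    s_upper = surv ** math.exp(-z * se)
    return risk, 1 - s_upper, 1 - s_lower


# Invented toy follow-up for one hypothetical subtype (years, event flag).
times = [1, 2, 3, 4, 6, 7]
events = [1, 0, 1, 0, 0, 1]
risk, lo, hi = km_risk(times, events, horizon=5)
print(f"5-year risk {risk:.1%} ({lo:.1%}-{hi:.1%})")  # risk = 1 - (5/6)(3/4) = 0.375
```

Running the function once per subtype yields subtype-specific 5-year risks; unlike a simple proportion of deaths, it does not treat people censored before 5 years as survivors.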
Pages: 13
References: 32 in total