Explainable machine learning identifies a polygenic risk score as a key predictor of pancreatic cancer risk in the UK Biobank

Cited by: 3
Authors
Peduzzi, Giulia [1]
Felici, Alessio [1]
Pellungrini, Roberto
Campa, Daniele [1,2]
Affiliations
[1] Univ Pisa, Dept Biol, Via Luca Ghini 13, I-56126 Pisa, Italy
[2] Scuola Normale Super Pisa, Classe Sci, Piazza Cavalieri 7, I-56126 Pisa, Italy
Keywords
Pancreatic cancer; Risk prediction; Explainable artificial intelligence; Polygenic Risk Score; GENOME-WIDE ASSOCIATION; SUSCEPTIBILITY LOCI; BREAST-CANCER; VARIANTS; DISEASE; GENES; MODEL
DOI
10.1016/j.dld.2024.11.010
CLC classification number
R57 [Diseases of the digestive system and abdomen];
Discipline classification code
Abstract
Background: Predicting the risk of developing pancreatic ductal adenocarcinoma (PDAC) is of paramount importance, given its high mortality rate. Current PDAC risk prediction models rely on a limited number of variables, do not include genetics, and have modest accuracy.
Aim: This study aimed to develop an interpretable PDAC risk prediction model based on machine learning (ML).
Methods: Five ML models (Adaptive Boosting, eXtreme Gradient Boosting, CatBoost, Deep Forest and Random Forest) built on 56 exposome variables and a polygenic risk score (PRS) were tested in 654 PDAC cases and 1,308 controls from the UK Biobank. Additionally, SHapley Additive exPlanation (SHAP) and Global model Interpretation via Recursive Partitioning (GIRP) were employed to explain the models.
Results: All models performed similarly, but CatBoost achieved the highest recall (77.10%). SHAP highlighted age and the PRS as the primary contributors across all models. GIRP derived rules to distinguish cases from controls, with age, PRS, and pancreatitis appearing in most of the rules.
Conclusion: The predictive models tested exhibited good performance, indicating their potential for clinical application in the near future, with the PRS playing a key role in identifying high-risk individuals, as demonstrated by the explainers.
© 2024 Published by Elsevier Ltd on behalf of Editrice Gastroenterologica Italiana S.r.l.
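Since the abstract ranks the models by recall, a minimal sketch of that metric may help: recall is the share of true cases that a model flags as cases, TP / (TP + FN). The function name `recall` and the toy labels below are hypothetical illustrations, not the study's code or data.

```python
def recall(y_true, y_pred, positive=1):
    """Recall (sensitivity) = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


# Hypothetical toy labels (1 = PDAC case, 0 = control), NOT data from the study.
# The 1:2 case-to-control ratio only mirrors the design described in the abstract.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

toy_recall = recall(y_true, y_pred)
print(f"recall = {toy_recall:.2%}")  # 3 of 4 cases detected
```

Recall ignores false positives among controls, so a model chosen on recall alone may still flag many healthy individuals; it is usually read alongside precision or specificity.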
Pages: 915-922
Number of pages: 8