PIMKL: Pathway-Induced Multiple Kernel Learning

Cited by: 23
Authors
Manica, Matteo [1, 2]
Cadow, Joris [1, 2]
Mathis, Roland [1]
Martinez, Maria Rodriguez [1]
Affiliations
[1] IBM Res, Zurich, Switzerland
[2] ETH, Zurich, Switzerland
Keywords
MOLECULAR INTERACTION DATABASE; PROTEIN-PROTEIN; CANCER; COAGULATION; SIGNATURES; EXPRESSION; NETWORKS; KEGG
DOI
10.1038/s41540-019-0086-3
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behavior might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway-Induced Multiple Kernel Learning (PIMKL), a methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels to predict a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
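The kernel mixture described in the abstract can be illustrated with a minimal sketch. It assumes that each pathway-induced kernel is built from the normalized graph Laplacian of the interaction subnetwork restricted to one gene set, which the abstract does not spell out. The function names, the toy network, and the gene sets are hypothetical, and the weights are fixed by hand here rather than optimized by an MKL algorithm.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def pathway_kernel(X, adj, gene_idx):
    """Kernel matrix X_p L_p X_p^T for the genes of one pathway (assumed form)."""
    gene_idx = np.asarray(gene_idx)
    # Laplacian of the subgraph induced by the pathway's genes.
    L = normalized_laplacian(np.asarray(adj)[np.ix_(gene_idx, gene_idx)])
    Xp = X[:, gene_idx]
    return Xp @ L @ Xp.T

def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices; weights are rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("kernel weights must be non-negative")
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Toy data: 5 samples, 4 genes, a small undirected interaction network.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
gene_sets = {"set_a": [0, 1, 2], "set_b": [2, 3]}  # hypothetical gene sets

kernels = [pathway_kernel(X, adj, idx) for idx in gene_sets.values()]
# Placeholder weights; in an MKL setting these would be learned from labels.
K = combine_kernels(kernels, [0.5, 0.5])
```

The combined matrix `K` could then be passed to any classifier that accepts a precomputed kernel. In this sketch, the learned weights would play the role of the molecular signature mentioned in the abstract, since each weight is tied to one gene set.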
Pages: 8