A transfer learning approach via procrustes analysis and mean shift for cancer drug sensitivity prediction

Times cited: 30
Authors
Turki, Turki [1]
Wei, Zhi [2]
Wang, Jason T. L. [2]
Affiliations
[1] King Abdulaziz Univ, Dept Comp Sci, Jeddah 21589, Saudi Arabia
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Keywords
Transfer learning; cancer genomics; clinical informatics; precision medicine; SOCIAL MEDIA; RESISTANCE; FUTURE
DOI
10.1142/S0219720018400140
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Transfer learning (TL) algorithms aim to improve prediction performance in a target task (e.g. predicting cisplatin sensitivity in triple-negative breast cancer patients) by transferring knowledge from auxiliary data of a related task (e.g. predicting docetaxel sensitivity in breast cancer patients), where the data distributions, and even the feature spaces, of the two tasks can differ. In real-world applications, the training set for the target task is sometimes small, while auxiliary data from a related task are available. Supervised learning, however, needs a sufficiently large training set in the target task to predict future test examples of that task well. In this paper, we propose a TL approach for cancer drug sensitivity prediction that combines three techniques. First, we shift the representation of a subset of examples from the auxiliary data of a related task to a representation closer to the training set of the target task. Second, we align the shifted representation of the selected auxiliary examples to the target training set, obtaining examples whose representation is aligned to the target training set. Third, we train machine learning algorithms using both the target training set and the aligned examples. We evaluate our approach against baseline approaches using the area under the receiver operating characteristic (ROC) curve (AUC) on real clinical trial datasets pertaining to multiple myeloma, non-small cell lung cancer, triple-negative breast cancer, and breast cancer. Experimental results show that our approach outperforms the baseline approaches, with statistically significant differences.
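The three steps in the abstract can be sketched in Python with NumPy. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes both tasks share the same feature columns (e.g. the same genes). It replaces the paper's shifting step with a simple translation of the auxiliary mean onto the target mean, which is not the mean-shift mode-seeking algorithm. It also picks auxiliary examples by a made-up nearest-neighbor rule, aligns them with orthogonal Procrustes analysis (rotation/reflection plus translation, no scaling), and uses a nearest-centroid classifier as a stand-in learner. All function names are illustrative.

```python
import numpy as np


def mean_shift_toward_target(X_aux, X_tgt):
    """Translate auxiliary examples so that their mean equals the target mean.

    ASSUMPTION: a plain mean translation stands in for the paper's shifting
    step; it is not the authors' exact procedure.
    """
    return X_aux - X_aux.mean(axis=0) + X_tgt.mean(axis=0)


def select_nearest(X_aux_shifted, X_tgt):
    """Return, for each target example, the index of its nearest shifted auxiliary example.

    HYPOTHETICAL selection rule; an auxiliary example may be picked more than once.
    """
    # Pairwise squared Euclidean distances, shape (n_tgt, n_aux).
    d2 = ((X_tgt[:, None, :] - X_aux_shifted[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def orthogonal_procrustes(A, B):
    """Return the orthogonal matrix R minimizing ||A @ R - B||_F (SVD solution)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def transfer(X_tgt, y_tgt, X_aux, y_aux):
    """Shift, select, and align auxiliary examples; return the augmented training set."""
    # Step 1: shift auxiliary examples toward the target training set.
    shifted = mean_shift_toward_target(X_aux, X_tgt)
    # Select one auxiliary example per target example; this also fixes the
    # row-to-row pairing that Procrustes analysis needs.
    idx = select_nearest(shifted, X_tgt)
    selected = shifted[idx]
    # Step 2: center both paired sets, find the best orthogonal map,
    # then place the mapped examples at the target mean.
    sel_c = selected - selected.mean(axis=0)
    tgt_mean = X_tgt.mean(axis=0)
    R = orthogonal_procrustes(sel_c, X_tgt - tgt_mean)
    aligned = sel_c @ R + tgt_mean
    # Step 3: the learner is trained on target examples plus aligned auxiliary examples.
    X_train = np.vstack([X_tgt, aligned])
    y_train = np.concatenate([y_tgt, y_aux[idx]])
    return X_train, y_train


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in learner (HYPOTHETICAL choice): assign each test example to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


if __name__ == "__main__":
    # Synthetic data only: a small target set and a larger, offset auxiliary set.
    rng = np.random.default_rng(0)
    X_tgt = rng.normal(size=(10, 4))
    y_tgt = rng.integers(0, 2, size=10)
    X_aux = rng.normal(loc=3.0, size=(50, 4))
    y_aux = rng.integers(0, 2, size=50)
    X_train, y_train = transfer(X_tgt, y_tgt, X_aux, y_aux)
    print(X_train.shape, y_train.shape)  # (20, 4) (20,)
    print(nearest_centroid_predict(X_train, y_train, X_tgt))
```

Pairing each target example with its nearest shifted auxiliary example is what makes the Procrustes fit well defined, since orthogonal Procrustes needs two matrices with matching rows. In practice one would substitute a real classifier and score it with AUC, the metric the study reports.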
Pages: 31