PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection

Cited by: 29
Authors
Ullah, Matee [1, 2]
Han, Ke [1, 2]
Hadi, Fazal [1]
Xu, Jian [1, 3, 4, 5]
Song, Jiangning [6, 7, 8]
Yu, Dong-Jun [1, 4, 9]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
[2] Pattern Recognit & Bioinformat Grp, Delft, Netherlands
[3] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
[4] China Comp Federat CCF, Beijing, Peoples R China
[5] IEEE, Piscataway, NJ USA
[6] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[7] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[8] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Clayton, Vic, Australia
[9] China Assoc Artificial Intelligence CAAI, Beijing, Peoples R China
Funding
UK Medical Research Council; National Natural Science Foundation of China; Australian Research Council; US National Institutes of Health
Keywords
protein subcellular location; bioimage analysis; feature selection; handcrafted features; deep learned features; NEURAL-NETWORKS; GENE SELECTION; CLASSIFICATION; LOCALIZATION; PATTERNS; SCALE;
DOI
10.1093/bib/bbab278
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein subcellular localization plays a crucial role in characterizing protein function and understanding various cellular processes. Accurate identification of protein subcellular location is therefore an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins, but most existing methods are limited in overall accuracy, computational time and generalization ability. To address these problems, we developed PScL-HDeep, a novel computational approach based on Human Protein Atlas (HPA) data, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted features and deep learned features (the latter using a pretrained deep learning model) from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to each original raw feature set to generate an optimal feature subset. To obtain a still more informative feature subset, the support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE+CBR) feature selection algorithm was then applied to the integrated feature set. Finally, two classification models, a support vector machine with radial basis function kernel (SVM-RBF) and a support vector machine with linear kernel (SVM-LNR), were trained on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the best performance in a 10-fold cross-validation test on this dataset and outperformed existing predictors. We also demonstrated the generalization ability of the proposed method in a stringent independent validation test.
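The last stages described in the abstract (feature selection on an integrated feature set, then SVM classifiers scored by 10-fold cross-validation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. It rests on several assumptions: the data are synthetic rather than HPA images, the SDA step is skipped, plain SVM-RFE stands in for SVM-RFE+CBR (no correlation bias reduction), and the feature counts and the `build_model` helper are hypothetical choices made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an integrated feature set (assumption: NOT HPA data).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=12,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def build_model(kernel):
    """Scale features, select a subset with linear SVM-RFE, then fit an SVM.

    Plain RFE is used here; the correlation bias reduction (CBR) step of
    SVM-RFE+CBR is omitted, and 20 selected features is an arbitrary choice.
    """
    selector = RFE(SVC(kernel="linear"), n_features_to_select=20, step=5)
    return make_pipeline(StandardScaler(), selector, SVC(kernel=kernel))


# 10-fold stratified cross-validation; selection is refit inside each fold
# because it is part of the pipeline, which avoids selection leakage.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(build_model(kernel), X, y, cv=cv)
    for name, kernel in [("SVM-RBF", "rbf"), ("SVM-LNR", "linear")]
}

for name, fold_scores in scores.items():
    print(f"{name}: mean 10-fold accuracy = {fold_scores.mean():.3f}")
```

Placing the selector inside the pipeline means the feature subset is chosen from the training folds only, so the reported accuracy is not inflated by information from the held-out fold.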
Pages: 17