Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier

Cited by: 6
Authors
Wang, Xiao [1]
Li, Hui [1]
Zhang, Qiuwen [1]
Wang, Rong [1]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp & Commun Engn, Zhengzhou 450002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
AMINO-ACID-COMPOSITION; LABEL LEARNING CLASSIFIER; TOP-DOWN APPROACH; GENE ONTOLOGY; VIRUS PROTEINS; LOCATION; MPLOC; SINGLE; SITES; ENSEMBLE
DOI
10.1155/2016/1793272
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Apoptosis proteins play a key role in maintaining the stability of an organism. Their functions are related to their subcellular locations, knowledge of which helps in understanding the mechanism of programmed cell death. In this paper, we use GO annotation information for apoptosis proteins and their homologous proteins, retrieved from the GOA database, to formulate feature vectors. We then combine these vectors with the distance weighted KNN classification algorithm, which addresses the data imbalance problem in the CL317 data set, to predict the subcellular locations of apoptosis proteins. We find that the number of homologous proteins affects the overall prediction accuracy. With the optimal number of homologous proteins, the overall prediction accuracy of our method on the CL317 data set reaches 96.8% under the jackknife test. Compared with existing methods, the proposed method predicts the subcellular localization of apoptosis proteins more accurately.
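The pipeline the abstract describes (GO-based feature vectors, a distance weighted KNN vote, and a jackknife test) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the binary encoding of GO terms, the Euclidean distance, the inverse-distance weight `1 / (d + eps)`, the function names, and the toy proteins and labels are all hypothetical, and the abstract does not state which weighting scheme or value of `k` the paper uses.

```python
import math
from collections import defaultdict


def go_vector(go_terms, vocabulary):
    """Binary feature vector: 1.0 if a GO term annotates the protein
    (or, by assumption, any of its homologs), else 0.0."""
    return [1.0 if term in go_terms else 0.0 for term in vocabulary]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_knn_predict(query, train_X, train_y, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1 / (distance + eps),
    so close neighbors count more than distant ones (assumed weighting)."""
    neighbors = sorted(
        (euclidean(query, x), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


def jackknife_accuracy(X, y, k=3):
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if weighted_knn_predict(X[i], rest_X, rest_y, k) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (hypothetical): each set stands for the union of GO terms of a
# protein and its homologs; the labels are illustrative location classes.
VOCABULARY = ["GO:0005634", "GO:0005737", "GO:0005739", "GO:0016020"]
TOY_PROTEINS = [
    ({"GO:0005634"}, "nuclear"),
    ({"GO:0005634", "GO:0005737"}, "nuclear"),
    ({"GO:0005737"}, "cytoplasmic"),
    ({"GO:0005737", "GO:0016020"}, "cytoplasmic"),
    ({"GO:0005739"}, "mitochondrial"),
    ({"GO:0005739", "GO:0016020"}, "mitochondrial"),
]
X = [go_vector(terms, VOCABULARY) for terms, _ in TOY_PROTEINS]
y = [label for _, label in TOY_PROTEINS]

if __name__ == "__main__":
    query = go_vector({"GO:0005739"}, VOCABULARY)
    print(weighted_knn_predict(query, X, y, k=3))
    print(jackknife_accuracy(X, y, k=3))
```

Weighting votes by inverse distance means that a single very close neighbor from a small class can outvote several farther neighbors from a large class, which is one plausible reading of how the method copes with class imbalance.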
Pages: 8