Prediction of protein subcellular localization by weighted gene ontology terms

Cited by: 16
Authors
Chi, Sang-Mun [1]
Affiliations
[1] Kyungsung Univ, Sch Engn & Comp Sci, Pusan 608736, South Korea
Keywords
Protein subcellular localization; Text categorization; Term-weighting; Gene ontology; Chi-square; SEQUENCE; LOCATIONS; DATABASE; TEXT; GO;
DOI
10.1016/j.bbrc.2010.07.086
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We develop a new approach to weighting gene ontology (GO) terms for predicting protein subcellular localization. The weight of each GO term, which reflects its contribution to the prediction algorithm, is determined by term-weighting methods used in text categorization. We evaluate several term-weighting methods based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than the other term-weighting methods and also outperforms state-of-the-art subcellular prediction methods. It achieves overall accuracies of 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% on the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Hoglund IDS, fungal Hoglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that the proposed method appropriately weights GO terms according to their relevance to the localizations. © 2010 Elsevier Inc. All rights reserved.
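The chi-square term weighting mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formula, so several choices here are assumptions made only for illustration: the log transformation is taken as log(1 + chi-square), a term's weight is the maximum of its chi-square scores over all localization classes, and the names `chi_square` and `go_term_weights` are hypothetical.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: proteins in the class that have the term
    b: proteins outside the class that have the term
    c: proteins in the class that lack the term
    d: proteins outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def go_term_weights(proteins, log_transform=True):
    """Weight each GO term by its strongest association with any class.

    proteins: list of (set_of_go_terms, localization_label) pairs.
    Assumption: weight = max over classes of chi-square, optionally
    passed through log(1 + x). This is not confirmed by the abstract.
    """
    n = len(proteins)
    class_counts = Counter(label for _, label in proteins)
    term_counts = Counter()
    joint = Counter()
    for terms, label in proteins:
        for term in terms:
            term_counts[term] += 1
            joint[(term, label)] += 1

    weights = {}
    for term, n_term in term_counts.items():
        best = 0.0
        for label, n_class in class_counts.items():
            a = joint[(term, label)]
            b = n_term - a
            c = n_class - a
            d = n - n_term - c
            best = max(best, chi_square(a, b, c, d))
        weights[term] = math.log1p(best) if log_transform else best
    return weights


# Toy data (invented for illustration, not from the paper's datasets).
toy = [
    ({"GO:0005634", "GO:0003677"}, "nucleus"),
    ({"GO:0005634"}, "nucleus"),
    ({"GO:0005739", "GO:0003677"}, "mitochondrion"),
    ({"GO:0005739"}, "mitochondrion"),
]
weights = go_term_weights(toy)
raw_weights = go_term_weights(toy, log_transform=False)
```

In this toy set, a term that occurs only in one class receives a high weight, while a term spread evenly across both classes receives a weight of zero. The resulting weights could then scale a protein's GO-term feature vector before it is passed to a classifier.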
Pages: 402-405
Page count: 4