Predicting Gene Ontology functions based on support vector machines and statistical significance estimation
Cited by: 12
Authors:
Bi, Ran
Zhou, Yanhong
Lu, Feng
Wang, Weiqiang
Affiliation:
[1] Huazhong Univ Sci & Technol, Hubei Bioinformat & Mol Imaging Key Lab, Wuhan 430074, Hubei, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
protein function;
Gene Ontology;
support vector machines;
statistical significance;
DOI:
10.1016/j.neucom.2006.10.006
Chinese Library Classification:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
Gene Ontology (GO) is a common language for the functional annotation of gene products. We have developed a computational tool, GOKey, to predict the GO function of proteins based on their sequence features and the support vector machine (SVM) method. Several measures have been adopted to improve the prediction performance of GOKey, including improved handling of the problem caused by unbalanced positive and negative training data, and postprocessing strategies to evaluate the posterior probability and statistical significance of SVM outputs. GOKey has been trained to predict the 36 GO categories of the 'molecular function' of GO slims, and could be easily extended to other GO categories. The results of 5-fold cross validation with 10,603 GO-mapped proteins demonstrate that the performance of GOKey is better than that of standard SVMs. Comparisons with other computational tools for GO function prediction also show that the performance of GOKey is satisfactory. Further, GOKey has been applied to predict the GO functions of 5381 novel human proteins in the Ensembl database. The results show that 93% of the novel proteins can be assigned one or more GO terms, and some evidence supporting the predictions has been found. GOKey can be accessed at http://infosci.hust.edu.cn. (c) 2006 Published by Elsevier B.V.
Pages: 718-725
Page count: 8
Related papers
33 in total
[1] Ashburner, M; Ball, CA; Blake, JA; Botstein, D; Butler, H; Cherry, JM; Davis, AP; Dolinski, K; Dwight, SS; Eppig, JT; Harris, MA; Hill, DP; Issel-Tarver, L; Kasarskis, A; Lewis, S; Matese, JC; Richardson, JE; Ringwald, M; Rubin, GM; Sherlock, G. Gene Ontology: tool for the unification of biology [J]. NATURE GENETICS, 2000, 25 (01): 25-29.
Affiliation: Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
[2] Attwood, TK. Genomics - The babel of bioinformatics [J]. SCIENCE, 2000, 290 (5491): 471-473.
Affiliation: Univ Manchester, Sch Biol Sci, Manchester, Lancs, England
[3] Bock, JR; Gough, DA. Predicting protein-protein interactions from primary structure [J]. BIOINFORMATICS, 2001, 17 (05): 455-460.
Affiliation: Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA
[4] Boeckmann, B; Bairoch, A; Apweiler, R; Blatter, MC; Estreicher, A; Gasteiger, E; Martin, MJ; Michoud, K; O'Donovan, C; Phan, I; Pilbout, S; Schneider, M. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01): 365-370.
Affiliation: Ctr Med Univ Geneva, Swiss Inst Bioinformat, CH-1211 Geneva 4, Switzerland
[5] Bork, P; Dandekar, T; Diaz-Lazcoz, Y; Eisenhaber, F; Huynen, M; Yuan, YP. Predicting function: From genes to genomes and back [J]. JOURNAL OF MOLECULAR BIOLOGY, 1998, 283 (04): 707-725.
Affiliation: European Mol Biol Lab, D-69117 Heidelberg, Germany
[6] Brown, MPS; Grundy, WN; Lin, D; Cristianini, N; Sugnet, CW; Furey, TS; Ares, M; Haussler, D. Knowledge-based analysis of microarray gene expression data by using support vector machines [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01): 262-267.
Affiliation: Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[7] Burges, CJC. A tutorial on Support Vector Machines for pattern recognition [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02): 121-167.
Affiliation: Lucent Technol, Bell Labs, Murray Hill, NJ 07974 USA
[8] Cai, CZ; Han, LY; Ji, ZL; Chen, YZ. Enzyme family classification by support vector machines [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 55 (01): 66-76.
Affiliation: Natl Univ Singapore, Dept Computat Sci, Singapore 117543, Singapore
[9] Cai, CZ; Han, LY; Ji, ZL; Chen, X; Chen, YZ. SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (13): 3692-3697.
Affiliation: Natl Univ Singapore, Dept Computat Sci, Singapore 117543, Singapore
[10] Camon, E; Magrane, M; Barrell, D; Binns, D; Fleischmann, W; Kersey, P; Mulder, N; Oinn, T; Maslen, J; Cox, A; Apweiler, R. The gene ontology annotation (GOA) project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro [J]. GENOME RESEARCH, 2003, 13 (04): 662-672.
Affiliation: European Bioinformat Inst, EMBL Outstn, Cambridge CB10 1SD, England