Hierarchical Cost-Sensitive Algorithms for Genome-Wide Gene Function Prediction

Times cited: 0
Authors
Cesa-Bianchi, Nicolo [1]
Valentini, Giorgio [1]
Affiliations
[1] Univ Milan, DSI, Via Comelico 39, I-20135 Milan, Italy
Source
Proceedings of the Third International Workshop on Machine Learning in Systems Biology | 2010 / Vol. 8
Keywords
Hierarchical classification; Gene function prediction; Bayesian ensembles; Cost-sensitive classification; FunCat taxonomy; Probabilistic outputs; Protein; Classification
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work we propose new ensemble methods for the hierarchical classification of gene functions. Our methods exploit the hierarchical relationships between the classes in different ways: each ensemble node is trained "locally", according to its position in the hierarchy; moreover, in the evaluation phase the set of predicted annotations is built so as to minimize a global loss function defined over the hierarchy. We also address the sparsity of annotations by introducing a cost-sensitive parameter that controls the precision-recall trade-off. Experiments with the model organism S. cerevisiae, using the FunCat taxonomy and seven biomolecular data sets, reveal a significant advantage of our techniques over "flat" and cost-insensitive hierarchical ensembles.
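As a rough illustration of the general idea summarized in the abstract, the following minimal Python sketch shows generic top-down hierarchical prediction with a single decision threshold. It is not the authors' algorithm: the function name `predict_top_down`, the threshold `tau`, the class codes, and the probabilities are all hypothetical and chosen only for illustration. The sketch assumes each class already has a locally estimated probability and that a class can be predicted positive only if its parent is positive.

```python
from typing import Dict, Optional, Set


def predict_top_down(
    parent: Dict[str, Optional[str]],
    prob: Dict[str, float],
    tau: float = 0.5,
) -> Set[str]:
    """Return the classes predicted positive under a top-down rule.

    A class is positive only if its parent (if any) is positive and its
    own local probability is at least tau. Lowering tau is a crude,
    assumed stand-in for a cost-sensitive parameter: it trades
    precision for recall.
    """
    positive: Set[str] = set()

    def is_positive(node: str) -> bool:
        if node in positive:
            return True
        p = parent[node]
        # A node whose parent is negative can never be positive.
        if p is not None and not is_positive(p):
            return False
        if prob[node] >= tau:
            positive.add(node)
            return True
        return False

    for node in parent:
        is_positive(node)
    return positive


# Hypothetical FunCat-style class codes; None marks a top-level class.
parent = {
    "01": None,
    "01.01": "01",
    "01.01.03": "01.01",
    "02": None,
    "02.01": "02",
}
# Hypothetical local probabilities, one per class.
prob = {"01": 0.9, "01.01": 0.4, "01.01.03": 0.8, "02": 0.2, "02.01": 0.95}

strict = predict_top_down(parent, prob, tau=0.5)
lenient = predict_top_down(parent, prob, tau=0.3)
print(sorted(strict))
print(sorted(lenient))
```

In this toy hierarchy, class "02.01" is never predicted despite its high local probability, because its parent "02" falls below both thresholds; this is the kind of consistency constraint that distinguishes hierarchical from "flat" prediction.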
Pages: 14-29
Number of pages: 16