Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins

Cited by: 96
Authors
Qi, Yanjun [1]
Tastan, Oznur [2]
Carbonell, Jaime G. [2]
Klein-Seetharaman, Judith [2]
Weston, Jason [3]
Affiliations
[1] NEC Labs Amer, Princeton, NJ 08540 USA
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[3] Google Res NY, New York, NY 10011 USA
Keywords
IDENTIFICATION; NETWORKS; MAP;
DOI
10.1093/bioinformatics/btq394
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Protein-protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers have suggested using supervised learning to classify pairs of proteins as interacting or not. However, the performance of this approach is largely restricted by the limited availability of truly interacting protein pairs (labeled). Meanwhile, there is a considerable number of protein pairs for which an association between the two partners has been reported, but without enough experimental evidence to support it as a direct interaction (partially labeled). Results: We propose a semi-supervised multi-task framework for predicting PPIs from not only labeled but also partially labeled reference sets. The basic idea is to perform multi-task learning on a supervised classification task and a semi-supervised auxiliary task. The supervised classifier trains a multi-layer perceptron network for PPI prediction from labeled examples. The semi-supervised auxiliary task shares network layers with the supervised classifier and trains on partially labeled examples. Semi-supervision can be used in multiple ways. We tried three approaches in this article: (i) classification (to distinguish partial positives from negatives); (ii) ranking (to rate partial positives as more likely than negatives); (iii) embedding (to make data clusters get similar labels). We applied this framework to improve the identification of interacting pairs between HIV-1 and human proteins. Our method improved upon the state-of-the-art method for this task, indicating the benefits of semi-supervised multi-task learning using auxiliary information.
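The shared-layer idea described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the layer sizes, the logistic and hinge losses, the `aux_weight` parameter, and the toy feature vectors are all assumptions chosen for illustration. It only shows how one hidden layer can feed both a supervised head (trained on labeled pairs) and an auxiliary ranking head (asked to score partial positives above negatives), with the two losses combined into one objective.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply a dense layer (no activation) to input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def shared_hidden(shared, x):
    """Hidden representation used by both tasks (tanh units)."""
    return [math.tanh(v) for v in dense(shared, x)]


def score(shared, head, x):
    """Scalar score from a task-specific linear head on top of the shared layer."""
    return dense(head, shared_hidden(shared, x))[0]


def logistic_loss(s, y):
    """Supervised loss for a label y in {+1, -1} (assumed loss, for illustration)."""
    return math.log1p(math.exp(-y * s))


def ranking_loss(s_partial, s_negative, margin=1.0):
    """Hinge loss asking a partial positive to outscore a negative by a margin."""
    return max(0.0, margin - s_partial + s_negative)


def multitask_loss(shared, main_head, aux_head, labeled, partial_pos, negatives,
                   aux_weight=1.0):
    """Return (total, main, aux): supervised loss plus weighted auxiliary ranking loss."""
    main = sum(logistic_loss(score(shared, main_head, x), y)
               for x, y in labeled) / len(labeled)
    pairs = [(p, n) for p in partial_pos for n in negatives]
    aux = sum(ranking_loss(score(shared, aux_head, p), score(shared, aux_head, n))
              for p, n in pairs) / len(pairs)
    return main + aux_weight * aux, main, aux


# Hypothetical toy data: 4 made-up features per protein pair.
rng = random.Random(0)
shared = make_layer(4, 3, rng)      # layer shared by both tasks
main_head = make_layer(3, 1, rng)   # supervised classifier head
aux_head = make_layer(3, 1, rng)    # auxiliary ranking head
labeled = [([1.0, 0.0, 0.5, 0.2], 1), ([0.0, 1.0, 0.1, 0.9], -1)]
partial_pos = [[0.9, 0.1, 0.4, 0.3]]
negatives = [[0.1, 0.8, 0.2, 0.7]]

total, main, aux = multitask_loss(shared, main_head, aux_head,
                                  labeled, partial_pos, negatives)
print(f"main={main:.4f} aux={aux:.4f} total={total:.4f}")
```

Because both heads read the same hidden representation, a gradient step on either loss would update the shared weights, which is how partially labeled pairs could influence the supervised classifier. Swapping `ranking_loss` for a classification or embedding loss would give sketches of the other two auxiliary variants named in the abstract.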
Pages: i645-i652
Number of pages: 8