The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors

Cited by: 10
Authors
Cruz-Barbosa, Raul [1]
Vellido, Alfredo [2, 3]
Giraldo, Jesus [4, 5]
Affiliations
[1] Univ Tecnol Mixteca, Inst Comp Sci, Huajuapan, Oaxaca, Mexico
[2] Univ Politecn Cataluna, BarcelonaTech, Dept Ciencies Comp, Barcelona, Spain
[3] Ctr Invest Biomed Red Bioingn Biomat Nanomed CIBE, Barcelona, Spain
[4] Univ Autonoma Barcelona, Inst Neurociencies, Bellaterra, Spain
[5] Univ Autonoma Barcelona, Unitat Bioestadast, Bellaterra, Spain
Keywords
Class C G protein-coupled receptors; Semi-supervised learning; Alignment-free sequence representations; PARTIAL LEAST-SQUARES; SEROTONIN RECEPTORS; DRUG TARGETS; PHARMACOLOGY; INFORMATION
DOI
10.1007/s11517-014-1218-y
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gateway to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCR types from the transformed data. This approach is relevant because of the existence of orphan proteins, to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity, and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
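As an illustration of the "basic amino acid composition representation" mentioned in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes the 20 standard one-letter residue codes and maps a sequence to a 20-dimensional vector of relative frequencies. The function name `aa_composition` and the toy sequence are hypothetical, chosen only for this example.

```python
from collections import Counter

# Assumption: only the 20 standard amino acids (one-letter codes) are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the relative frequency of each standard amino acid.

    The result is a fixed-length vector (20 values) regardless of the
    sequence length, and it does not depend on residue order.
    """
    seq = sequence.upper()
    # Count standard residues; any other symbol (e.g. 'X', gaps) is skipped.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues found: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from any GPCR database.
toy_sequence = "MKTAYIAKQR"
vector = aa_composition(toy_sequence)
```

Because the vector depends only on residue counts, any permutation of the same residues yields the same representation. This is the sense in which such a transformation ignores the sequence ordering.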
Pages: 137-149
Number of pages: 13