dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains

Cited by: 5
Authors
Etzion-Fuchs, Anat [1]
Todd, David A. [2]
Singh, Mona [1,2]
Affiliations
[1] Princeton Univ, Lewis Sigler Inst Integrat Genom, Carl Icahn Lab, Princeton, NJ 08544 USA
[2] Princeton Univ, Dept Comp Sci, 35 Olden St, Princeton, NJ 08544 USA
Funding
U.S. National Science Foundation;
Keywords
BINDING-SITES; RESIDUES; EXPRESSION; EVOLUTION; CENSUS; FAMILY; SERVER;
DOI
10.1093/nar/gkab356
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Pages: 13