Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape

Cited by: 33
Authors
Dai, Hanjun [1]
Umarov, Ramzan [2]
Kuwahara, Hiroyuki [2]
Li, Yu [2]
Song, Le [1]
Gao, Xin [2]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[2] KAUST, CBRC, Comp Elect & Math Sci & Engn CEMSE Div, Thuwal 23955-6900, Saudi Arabia
Keywords
PARAMETER-ESTIMATION; MASTER REGULATOR; GENE-EXPRESSION; DNA; PROTEIN; SITES; SPECIFICITIES; GCN4; MICROARRAYS; STARVATION
DOI
10.1093/bioinformatics/btx480
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods.
Pages: 3575-3583
Number of pages: 9