Collaborative Completion of Transcription Factor Binding Profiles via Local Sensitive Unified Embedding

Times cited: 2
Authors
Zhu, Lin [1 ]
Guo, Wei-Li [1 ]
Lu, Canyi [2 ]
Huang, De-Shuang [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Inst Machine Learning & Syst Biol, Shanghai 201804, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
Funding
National Science Foundation (USA); China Postdoctoral Science Foundation;
Keywords
ChIP-seq; embedding model; gene regulation; high throughput sequencing data; regulatory information; transcription factor; PROBABILISTIC NEURAL-NETWORKS; CHIP-SEQ; MATRIX COMPLETION; PROTEIN INTERACTIONS; DISCOVERY; PREDICTION; IMPUTATION; LANDSCAPE; NEIGHBORS; ENHANCERS;
DOI
10.1109/TNB.2016.2625823
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Although newly available ChIP-seq data provide immense opportunities for the comparative study of regulatory activities across different biological conditions, cost, time, and sample material availability mean that researchers cannot always obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, Ernst et al. proposed ChromImpute, which leverages related information from measured data to predict additional ChIP-seq and other types of datasets; the imputed signal tracks were shown to approximate the experimentally measured signals accurately and could thereby enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper we reexamine its learning process and show that its performance may degrade substantially, and that it may sometimes even fail to output a prediction, when the available data are scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium is fused by mapping proteins, samples, and genomic positions simultaneously into a Euclidean space, thereby making their underlying associations directly evaluable using simple calculations. In contrast to ChromImpute, which mainly makes use of the local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. In addition, a novel form of local sensitive low-rank regularization is proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model.
The code of LSUE is available at https://github.com/ekffar/LSUE.
Pages: 946-958
Number of pages: 13