MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data

Cited by: 5
Authors
Ozaki, Haruka [1, 4]
Iwasaki, Wataru [1, 2, 3]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Kashiwanoha 5-1-5, Kashiwa, Chiba 2778568, Japan
[2] Univ Tokyo, Grad Sch Sci, Dept Biol Sci, Bunkyo Ku, Hongo 7-3-1, Tokyo 1130032, Japan
[3] Univ Tokyo, Atmosphere & Ocean Res Inst, Kashiwanoha 5-1-5, Kashiwa, Chiba 2778564, Japan
[4] RIKEN, Adv Ctr Comp & Commun, Bioinformat Res Unit, 2-1 Hirosawa, Wako, Saitama 3510198, Japan
Keywords
DNA binding motifs; ChIP-Seq; Transcription factors; SERUM RESPONSE FACTOR; TRANSCRIPTION-FACTOR; SEQUENCE; SITES; GENE; CREB; EXPRESSION; DISCOVERY; TRANSACTIVATION; ELEMENTS
DOI
10.1016/j.compbiolchem.2016.01.014
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif.

Results: Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs.

Conclusions: By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. © 2016 Elsevier Ltd. All rights reserved.
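To make the idea of analyzing "every k-mer" in ChIP-Seq data more concrete, the following is a minimal Python sketch of one way to tally k-mers by their distance from the centre of each peak sequence. It is an illustration under stated assumptions, not the MOCCS implementation (which, per the abstract, is written in Perl and R): the function name `kmer_offset_counts`, the assumption that each input sequence is already centred on a peak, and the choice to skip k-mers containing non-ACGT characters are all hypothetical.

```python
from collections import Counter, defaultdict

VALID_BASES = set("ACGT")


def kmer_offset_counts(sequences, k):
    """Tally every k-mer by its absolute distance from the sequence centre.

    Assumption (hypothetical, not from the paper): each sequence is a
    window already centred on a ChIP-Seq peak position.
    Returns {kmer: Counter({offset: count})}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        centre = len(seq) // 2
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip k-mers with ambiguous bases such as N.
            if not set(kmer) <= VALID_BASES:
                continue
            # Distance between the k-mer midpoint and the window centre.
            offset = abs(start + k // 2 - centre)
            counts[kmer][offset] += 1
    return counts
```

Calling `kmer_offset_counts(peak_windows, 6)` on a list of peak-centred windows would give, for each 6-mer, how often it occurs at each distance from the centre. A k-mer that a TF actually binds would be expected to pile up at small offsets, whereas an unrelated k-mer would be spread roughly evenly across the window.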
Pages: 62-72
Number of pages: 11