Discovering Gene Regulatory Elements Using Coverage-Based Heuristics

Cited by: 2
Authors
Al-Ouran, Rami [1]
Schmidt, Robert [1]
Naik, Ashwini [2]
Jones, Jeffrey [3]
Drews, Frank [1]
Juedes, David [1]
Elnitski, Laura [4]
Welch, Lonnie [1]
Affiliations
[1] Ohio Univ, Dept Elect Engn & Comp Sci, Athens, OH 45701 USA
[2] Nationwide Childrens Hosp, Res Inst, Columbus, OH 43110 USA
[3] Ohio State Univ, Columbus, OH 43210 USA
[4] NHGRI, Bethesda, MD 20892 USA
Keywords
Motif discovery; ChIP-seq; RNA-seq; biology of disease; ENCODE; NOVO MOTIF DISCOVERY; FACTOR-BINDING SITES; HIGH-THROUGHPUT; HUMAN GENOME; CHROMATIN-IMMUNOPRECIPITATION; ALGORITHM; CHIPMOTIFS; SEQUENCES; PIPELINE
DOI
10.1109/TCBB.2015.2496261
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Data mining algorithms and sequencing methods (such as RNA-seq and ChIP-seq) are being combined to discover genomic regulatory motifs that relate to a variety of phenotypes. However, motif discovery algorithms often produce very long lists of putative transcription factor binding sites, which makes it difficult to select a manageable set of candidate motifs for experimental validation and so hinders the discovery of phenotype-related regulatory elements. To address this issue, we introduce the motif selection problem and provide coverage-based search heuristics for its solution. Analysis of 203 ChIP-seq experiments from the ENCyclopedia Of DNA Elements (ENCODE) project shows that our algorithms produce motifs with high sensitivity and specificity, and it reveals new insights about the regulatory code of the human genome. The greedy algorithm performs best, selecting a median of two motifs per ChIP-seq transcription factor group while achieving a median sensitivity of 77 percent.
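The abstract does not spell out the heuristics, so the following is only a minimal sketch of what a greedy, coverage-based selection could look like, assuming the problem is treated like classic greedy set cover: repeatedly pick the motif that matches the most sequences not yet covered. The names `greedy_motif_selection`, `sensitivity`, `motif_hits`, and `target_coverage` are hypothetical and introduced here for illustration; they are not taken from the paper, and the paper's actual objective and stopping rule may differ.

```python
from typing import Dict, List, Set


def greedy_motif_selection(
    motif_hits: Dict[str, Set[str]],
    sequences: Set[str],
    target_coverage: float = 1.0,
) -> List[str]:
    """Greedy set-cover style selection (illustrative assumption, not the paper's code).

    motif_hits maps a motif name to the set of sequence IDs it matches.
    Selection stops once the target fraction of sequences is covered or
    no remaining motif covers any new sequence.
    """
    uncovered = set(sequences)
    selected: List[str] = []
    # Number of sequences that may stay uncovered under the target.
    allowed_uncovered = len(sequences) * (1.0 - target_coverage)
    while len(uncovered) > allowed_uncovered:
        # Sort names so that ties are broken deterministically.
        best = max(
            sorted(motif_hits),
            key=lambda m: len(motif_hits[m] & uncovered),
            default=None,
        )
        if best is None or not (motif_hits[best] & uncovered):
            break  # no motif adds coverage
        selected.append(best)
        uncovered -= motif_hits[best]
    return selected


def sensitivity(
    selected: List[str],
    motif_hits: Dict[str, Set[str]],
    sequences: Set[str],
) -> float:
    """Fraction of the given sequences matched by at least one selected motif."""
    covered: Set[str] = set()
    for motif in selected:
        covered |= motif_hits[motif] & sequences
    return len(covered) / len(sequences) if sequences else 0.0


if __name__ == "__main__":
    # Toy data: three hypothetical motifs over five hypothetical peak sequences.
    hits = {"m1": {"s1", "s2", "s3"}, "m2": {"s3", "s4"}, "m3": {"s4"}}
    peaks = {"s1", "s2", "s3", "s4", "s5"}
    chosen = greedy_motif_selection(hits, peaks)
    print(chosen, sensitivity(chosen, hits, peaks))
```

In this toy input no motif matches `s5`, so the loop stops when no candidate adds coverage rather than running until every sequence is covered. Specificity would additionally require a background (negative) sequence set, which this sketch leaves out.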
Pages: 1290-1300
Page count: 11