AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification

Cited by: 7
Authors
Hiranuma, Naozumi [1]
Lundberg, Scott M. [1]
Lee, Su-In [1]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
INTEGRATIVE ANALYSIS; CHROMATIN-STATE; BINDING; NETWORK; DISCOVERY; ENCODE; SITES; CELLS;
DOI
10.1093/nar/gkz156
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
ChIP-seq is a technique for determining the binding locations of transcription factors, a task that remains a central challenge in molecular biology. Current practice is to use a 'control' dataset to remove background signals from an immunoprecipitation (IP) 'target' dataset. We introduce the AIControl framework, which eliminates the need to obtain a control dataset and instead identifies binding peaks by estimating the distributions of background signals from many publicly available control ChIP-seq datasets. We thereby avoid the cost of running control experiments while simultaneously increasing the accuracy of binding location identification. Specifically, AIControl can (i) estimate background signals at fine resolution, (ii) systematically weigh the most appropriate control datasets in a data-driven way, (iii) capture sources of potential biases that may be missed by one control dataset and (iv) remove the need for costly and time-consuming control experiments. We applied AIControl to 410 IP datasets in the ENCODE ChIP-seq database, using 440 control datasets from 107 cell types to impute background signal. Without using matched control datasets, AIControl identified peaks that were more enriched for putative binding sites than those identified by other popular peak callers that used a matched control dataset. We also demonstrated that our framework identifies binding sites that recover documented protein interactions more accurately.
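The idea of weighing many public control datasets to impute a background signal can be illustrated with a small sketch. The abstract does not say how the weights are learned, so the example below assumes a ridge-regularized least-squares fit purely for illustration. The function name `estimate_background`, the `ridge` parameter, and the synthetic read counts are all hypothetical and are not taken from the AIControl software.

```python
import numpy as np


def estimate_background(ip_counts, control_counts, ridge=1.0):
    """Fit one weight per control dataset and return the imputed background.

    ip_counts:      shape (n_bins,), binned read counts of the IP dataset.
    control_counts: shape (n_bins, n_controls), binned counts of the controls.
    The ridge fit is an assumption for this sketch, not the published method.
    """
    X = np.asarray(control_counts, dtype=float)
    y = np.asarray(ip_counts, dtype=float)
    n_controls = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + ridge * I)^{-1} X^T y
    gram = X.T @ X + ridge * np.eye(n_controls)
    weights = np.linalg.solve(gram, X.T @ y)
    background = X @ weights
    return weights, background


# Synthetic data: 5 hypothetical control tracks over 2000 genomic bins.
rng = np.random.default_rng(0)
n_bins, n_controls = 2000, 5
controls = rng.poisson(lam=5.0, size=(n_bins, n_controls)).astype(float)

# The IP track is a weighted mix of the controls plus one artificial peak.
true_weights = np.array([0.6, 0.3, 0.0, 0.1, 0.0])
ip = controls @ true_weights
ip[100] += 50.0

weights, background = estimate_background(ip, controls, ridge=1e-3)

# Signal left after subtracting the imputed background; the largest
# residual marks the candidate peak bin.
enrichment = ip - background
peak_bin = int(np.argmax(enrichment))
```

In this toy setting the learned weights stay close to the mixing weights used to build the IP track, and the bin with the largest residual is the one where the artificial peak was added. A real peak caller would also need a statistical model for significance, which this sketch omits.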
Pages: 16