Genome-wide prediction of DNase I hypersensitivity using gene expression

Cited by: 26
Authors
Zhou, Weiqiang [1]
Sherwood, Ben [1,4]
Ji, Zhicheng [1]
Xue, Yingchao [2,3]
Du, Fang [1]
Bai, Jiawei [1]
Ying, Mingyao [2,3]
Ji, Hongkai [1]
Affiliations
[1] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Dept Biostat, 615 North Wolfe St, Baltimore, MD 21205 USA
[2] Kennedy Krieger, Hugo W Moser Res Inst, Dept Neurol, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, Sch Med, Baltimore, MD 21205 USA
[4] Univ Kansas, Sch Business, 1654 Naismith Dr, Lawrence, KS 66045 USA
Funding
National Institutes of Health (NIH)
Keywords
TRANSCRIPTION FACTOR-BINDING; SELECTION; REGRESSION; SOX2; DIFFERENTIATION; VARIABLES; CHROMATIN; ELEMENTS; NEURONS; CELLS;
DOI
10.1038/s41467-017-01188-x
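Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract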
We evaluate the feasibility of using a biological sample's transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH). We develop BIRD, Big Data Regression for predicting DH, to handle this high-dimensional problem. Applying BIRD to the Encyclopedia of DNA Elements (ENCODE) data, we found that to a large extent gene expression predicts DH, and information useful for prediction is contained in the whole transcriptome rather than limited to a regulatory element's neighboring genes. We show applications of BIRD-predicted DH in predicting transcription factor-binding sites (TFBSs), turning publicly available gene expression samples in Gene Expression Omnibus (GEO) into a regulome database, predicting differential regulatory element activities, and facilitating regulome data analyses by serving as pseudo-replicates. Besides improving our understanding of the regulome-transcriptome relationship, this study suggests that transcriptome-based prediction can provide a useful new approach for regulome mapping.
Pages: 17