A sequence-based global map of regulatory activity for deciphering human genetics

Times cited: 108
Authors
Chen, Kathleen M. [1 ,2 ]
Wong, Aaron K. [2 ]
Troyanskaya, Olga G. [1 ,2 ,3 ]
Zhou, Jian [4 ]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Simons Fdn, Flatiron Inst, New York, NY 10010 USA
[3] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[4] Univ Texas Southwestern Med Ctr Dallas, Lyda Hill Dept Bioinformat, Dallas, TX 75390 USA
Funding
National Institutes of Health; National Science Foundation
Keywords
POINT MUTATIONS; HERITABILITY; ASSOCIATION; FAMILIES; DISEASE; BINDING; DNA
DOI
10.1038/s41588-022-01102-2
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease. Sei is a new framework for integrating human genetics data with a sequence-based mapping of predicted regulatory activities to elucidate mechanisms contributing to complex traits and diseases.
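The abstract describes sequence classes as a way to classify and quantify sequences and variants starting from a vector of predicted chromatin profiles. The Python sketch below illustrates that idea under stated assumptions, and is not the Sei codebase or its API. Each sequence class is represented by a hypothetical centroid vector. A sequence's class score is assumed to be the scalar projection of its predicted profile onto the unit-length centroid. A sequence is assigned to its highest-scoring class, and a variant's effect is taken as the alternate-minus-reference difference in class scores. The function names, class names and three-dimensional toy vectors are illustrative stand-ins; the model described above predicts 21,907 profiles per sequence.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: a predicted chromatin profile is a vector of
# per-profile scores (21,907 entries in the model described above; 3 here).
Vector = Sequence[float]


def unit(v: Vector) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def class_scores(profile: Vector, centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Assumed scoring rule: scalar projection of the profile onto each
    class's unit-length centroid."""
    scores: Dict[str, float] = {}
    for name, centroid in centroids.items():
        if len(centroid) != len(profile):
            raise ValueError(f"dimension mismatch for class {name!r}")
        direction = unit(centroid)
        scores[name] = sum(p * d for p, d in zip(profile, direction))
    return scores


def classify(profile: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign the sequence to its highest-scoring class."""
    scores = class_scores(profile, centroids)
    return max(scores, key=scores.__getitem__)


def variant_effect(ref: Vector, alt: Vector,
                   centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Per-class effect of a variant: alternate score minus reference score."""
    ref_scores = class_scores(ref, centroids)
    alt_scores = class_scores(alt, centroids)
    return {name: alt_scores[name] - ref_scores[name] for name in centroids}


if __name__ == "__main__":
    # Toy centroids; the class names are hypothetical labels.
    centroids = {"enhancer-like": [1.0, 1.0, 0.0], "promoter-like": [0.0, 0.0, 2.0]}
    ref = [0.05, 0.05, 0.3]
    alt = [0.8, 0.8, 0.3]
    print(classify(ref, centroids))
    print(classify(alt, centroids))
    print(variant_effect(ref, alt, centroids))
```

With these toy inputs, the reference profile scores highest on the hypothetical promoter-like class, while the alternate allele shifts it to enhancer-like, with an effect of about 1.06 on that class and 0 on the other. Normalizing each centroid keeps class scores comparable regardless of how large a centroid's raw entries happen to be.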
Pages: 940 / +
Number of pages: 14