Capturing large genomic contexts for accurately predicting enhancer-promoter interactions

Cited by: 18
Authors
Chen, Ken [1]
Zhao, Huiying [2]
Yang, Yuedong [1, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Sun Yat Sen Mem Hosp, Guangzhou, Peoples R China
[3] Sun Yat Sen Univ, Natl Super Comp Ctr, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
enhancer-promoter interaction; chromatin structure; Transformer; non-coding mutation; TRANSCRIPTION FACTOR; ASSOCIATION; PRINCIPLES; LANDSCAPE; VARIANTS; GENES; CELLS;
DOI
10.1093/bib/bbab577
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Enhancer-promoter interaction (EPI) is a key mechanism underlying gene regulation. EPI prediction has always been a challenging task because enhancers could regulate promoters of distant target genes. Although many machine learning models have been developed, they leverage only the features in enhancers and promoters, or simply add the average genomic signals in the regions between enhancers and promoters, without utilizing detailed features between or outside enhancers and promoters. Due to a lack of large-scale features, existing methods could achieve only moderate performance, especially for predicting EPIs in different cell types. Here, we present a Transformer-based model, TransEPI, for EPI prediction by capturing large genomic contexts. TransEPI was developed based on EPI datasets derived from Hi-C or ChIA-PET data in six cell lines. To avoid over-fitting, we evaluated the TransEPI model by testing it on independent test datasets where the cell line and chromosome are different from the training data. TransEPI not only achieved consistent performance across the cross-validation and test datasets from different cell types but also outperformed the state-of-the-art machine learning and deep learning models. In addition, we found that the improved performance of TransEPI was attributed to the integration of large genomic contexts. Lastly, TransEPI was extended to study the non-coding mutations associated with brain disorders or neural diseases, and we found that TransEPI was also useful for predicting the target genes of non-coding mutations.
Pages: 11