DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants

Cited by: 4
Authors
Ma, Wenlong [1, 2]
Fu, Yang [1, 2]
Bao, Yongzhou [1, 2, 3]
Wang, Zhen [1, 2, 3]
Lei, Bowen [1, 2, 4, 5]
Zheng, Weigang [1, 2, 4, 5]
Wang, Chao [1, 2, 4, 5]
Liu, Yuwen [1, 2, 6]
Affiliations
[1] Chinese Acad Agr Sci, Agr Genom Inst Shenzhen, Shenzhen Branch, Guangdong Lab Lingnan Modern Agr,Key Lab Livestock, Shenzhen 518124, Peoples R China
[2] Chinese Acad Agr Sci, Agr Genom Inst Shenzhen, Res Ctr Anim Genome, Innovat Grp Pig Genome Design & Breeding, Shenzhen 518124, Peoples R China
[3] Henan Univ, Sch Life Sci, Kaifeng 475004, Peoples R China
[4] Huazhong Agr Univ, Minist Educ, Key Lab Agr Anim Genet Breeding & Reprod, Wuhan 430070, Peoples R China
[5] Huazhong Agr Univ, Minist Agr & Rural Affairs, Key Lab Swine Genet & Breeding, Wuhan 430070, Peoples R China
[6] Chinese Acad Agr Sci, Kunpeng Inst Modern Agr Foshan, Foshan 528226, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
non-coding variants; deep learning; transcription factor binding affinity; cross-species prediction; chromatin accessibility; genomic prediction;
DOI
10.3390/ijms241512023
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performance when applied to other species. Furthermore, the limited exploration of many species, particularly in the case of livestock, has led to a scarcity of comprehensive and high-quality epigenetic data, posing challenges in developing reliable deep learning models for decoding their non-coding genomes. The cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and making use of the conserved DNA binding preferences of transcription factors within the same tissue. In this study, we introduced DeepSATA, a novel deep learning-based sequence analyzer that incorporates the transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to analyze the genomes of pigs, chickens, cattle, humans, and mice, we demonstrated its ability to improve the prediction accuracy of chromatin accessibility and achieve reliable cross-species predictions in animals. Additionally, we showcased its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool to explore the epigenomic landscape of various species and pinpoint regulatory deoxyribonucleic acid (DNA) variants associated with complex traits.
Pages: 18