Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models

Cited by: 11
Authors
Yue, Tianwei [1]
Wang, Yuanxin [1]
Zhang, Longxiang [1]
Gu, Chunming [2]
Xue, Haoru [3]
Wang, Wenping [1]
Lyu, Qi [4]
Dun, Yujie [5]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Johns Hopkins Univ, Sch Med, Dept Biomed Engn, Baltimore, MD 21218 USA
[3] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[4] Michigan State Univ, Dept Computat Math Sci & Engn, E Lansing, MI 48824 USA
[5] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
Keywords
deep learning; genomics; large language model; computer vision; multi-modal machine learning; PROTEIN SECONDARY STRUCTURE; PREDICTING GENE-EXPRESSION; SUPPORT VECTOR MACHINES; SUBCELLULAR-LOCALIZATION; STRUCTURAL CLASSIFICATION; HOMOLOGY DETECTION; SEQUENCE; DNA; QUALITY; NETWORKS;
DOI
10.3390/ijms242115858
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect deep learning to provide a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Pages: 35