Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models

Times cited: 11
Authors
Yue, Tianwei [1]
Wang, Yuanxin [1]
Zhang, Longxiang [1]
Gu, Chunming [2]
Xue, Haoru [3]
Wang, Wenping [1]
Lyu, Qi [4]
Dun, Yujie [5]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Johns Hopkins Univ, Sch Med, Dept Biomed Engn, Baltimore, MD 21218 USA
[3] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[4] Michigan State Univ, Dept Computat Math Sci & Engn, E Lansing, MI 48824 USA
[5] Xi An Jiao Tong Univ, Sch Informat & Commun Engn, Xian 710049, Peoples R China
Keywords
deep learning; genomics; large language model; computer vision; multi-modal machine learning; PROTEIN SECONDARY STRUCTURE; PREDICTING GENE-EXPRESSION; SUPPORT VECTOR MACHINES; SUBCELLULAR-LOCALIZATION; STRUCTURAL CLASSIFICATION; HOMOLOGY DETECTION; SEQUENCE; DNA; QUALITY; NETWORKS
DOI
10.3390/ijms242115858
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The data explosion driven by advances in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect deep learning to interpret the genome with a superhuman intelligence that explores beyond our current knowledge. A powerful deep learning model should make insightful use of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to match each particular task with an appropriate deep learning architecture, and we remark on practical considerations for developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe that the collaborative use of ever-growing, diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Pages: 35
References: 305