Predicting mutational function using machine learning

Cited by: 1
Authors
Shea, Anthony [1, 2]
Bartz, Josh [1, 2, 3]
Zhang, Lei [1, 4]
Dong, Xiao [1, 2]
Affiliations
[1] Univ Minnesota, Inst Biol Aging & Metab, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Dept Genet Cell Biol & Dev, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Bioinformat & Computat Biol Program, Minneapolis, MN 55455 USA
[4] Univ Minnesota, Dept Biochem Mol Biol & Biophys, Minneapolis, MN 55455 USA
Keywords
Mutation; Machine Learning; Protein Structure; Gene Expression; Disease Risk; PROTEIN SECONDARY STRUCTURE; NONCODING VARIANTS; RANGE INTERACTIONS; SOMATIC MUTATIONS; SEQUENCE; PATHOGENICITY; DATABASES; GENOME
DOI
10.1016/j.mrrev.2023.108457
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Genetic variations are one of the major causes of phenotypic variation between human individuals. Although beneficial as the substrate of evolution, germline mutations may cause diseases, including Mendelian diseases and complex diseases such as diabetes and heart disease. Mutations occurring in somatic cells are a main cause of cancer and likely cause age-related phenotypes and other age-related diseases. Because of the high abundance of genetic variations in the human genome, i.e., millions of germline variations per human subject and thousands of additional somatic mutations per cell, it is technically challenging to experimentally verify the function of every possible mutation and their interactions. Significant progress has been made toward solving this problem using computational approaches, especially machine learning (ML). Here, we review the progress and achievements made in recent years in this field of research. We classify the computational models in two ways: one according to their prediction goals, including protein structural alterations, gene expression changes, and disease risks, and the other according to their methodologies, including non-machine learning methods, classical machine learning methods, and deep neural network methods. For models in each category, we discuss their architecture, prediction accuracy, and potential limitations. This review provides new insights into the applications and future directions of computational approaches in understanding the role of mutations in aging and disease.
Pages: 7
Related papers
64 entries in total
  • [1] A method and server for predicting damaging missense mutations
    Adzhubei, Ivan A.
    Schmidt, Steffen
    Peshkin, Leonid
    Ramensky, Vasily E.
    Gerasimova, Anna
    Bork, Peer
    Kondrashov, Alexey S.
    Sunyaev, Shamil R.
    [J]. NATURE METHODS, 2010, 7 (04) : 248 - 249
  • [2] Machine learning in protein structure prediction
    AlQuraishi, Mohammed
    [J]. CURRENT OPINION IN CHEMICAL BIOLOGY, 2021, 65 : 1 - 8
  • [3] PRINCIPLES THAT GOVERN FOLDING OF PROTEIN CHAINS
    ANFINSEN, CB
    [J]. SCIENCE, 1973, 181 (4096) : 223 - 230
  • [4] Effective gene expression prediction from sequence by integrating long-range interactions
    Avsec, Ziga
    Agarwal, Vikram
    Visentin, Daniel
    Ledsam, Joseph R.
    Grabska-Barwinska, Agnieszka
    Taylor, Kyle R.
    Assael, Yannis
    Jumper, John
    Kohli, Pushmeet
    Kelley, David R.
    [J]. NATURE METHODS, 2021, 18 (10) : 1196 - +
  • [5] The NIH Roadmap Epigenomics Mapping Consortium
    Bernstein, Bradley E.
    Stamatoyannopoulos, John A.
    Costello, Joseph F.
    Ren, Bing
    Milosavljevic, Aleksandar
    Meissner, Alexander
    Kellis, Manolis
    Marra, Marco A.
    Beaudet, Arthur L.
    Ecker, Joseph R.
    Farnham, Peggy J.
    Hirst, Martin
    Lander, Eric S.
    Mikkelsen, Tarjei S.
    Thomson, James A.
    [J]. NATURE BIOTECHNOLOGY, 2010, 28 (10) : 1045 - 1048
  • [6] Single-cell analysis reveals different age-related somatic mutation profiles between stem and differentiated cells in human liver
    Brazhnik, K.
    Sun, S.
    Alani, O.
    Kinkhabwala, M.
    Wolkoff, A. W.
    Maslov, A. Y.
    Dong, X.
    Vijg, J.
    [J]. SCIENCE ADVANCES, 2020, 6 (05)
  • [7] Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method
    Burger, Lukas
    van Nimwegen, Erik
    [J]. MOLECULAR SYSTEMS BIOLOGY, 2008, 4 (1)
  • [8] Protein Data Bank: the single global archive for 3D macromolecular structure data
    Burley, Stephen K.
    Berman, Helen M.
    Bhikadiya, Charmi
    Bi, Chunxiao
    Chen, Li
    Di Costanzo, Luigi
    Christie, Cole
    Duarte, Jose M.
    Dutta, Shuchismita
    Feng, Zukang
    Ghosh, Sutapa
    Goodsell, David S.
    Green, Rachel Kramer
    Guranovic, Vladimir
    Guzenko, Dmytro
    Hudson, Brian P.
    Liang, Yuhe
    Lowe, Robert
    Peisach, Ezra
    Periskova, Irina
    Randle, Chris
    Rose, Alexander
    Sekharan, Monica
    Shao, Chenghua
    Tao, Yi-Ping
    Valasatava, Yana
    Voigt, Maria
    Westbrook, John
    Young, Jasmine
    Zardecki, Christine
    Zhuravleva, Marina
    Kurisu, Genji
    Nakamura, Haruki
    Kengaku, Yumiko
    Cho, Hasumi
    Sato, Junko
    Kim, Ju Yaen
    Ikegawa, Yasuyo
    Nakagawa, Atsushi
    Yamashita, Reiko
    Kudou, Takahiro
    Bekker, Gert-Jan
    Suzuki, Hirofumi
    Iwata, Takeshi
    Yokochi, Masashi
    Kobayashi, Naohiro
    Fujiwara, Toshimichi
    Velankar, Sameer
    Kleywegt, Gerard J.
    Anyango, Stephen
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D520 - D528
  • [9] Cancer-Specific High-Throughput Annotation of Somatic Mutations: Computational Prediction of Driver Missense Mutations
    Carter, Hannah
    Chen, Sining
    Isik, Leyla
    Tyekucheva, Svitlana
    Velculescu, Victor E.
    Kinzler, Kenneth W.
    Vogelstein, Bert
    Karchin, Rachel
    [J]. CANCER RESEARCH, 2009, 69 (16) : 6660 - 6667
  • [10] Emerging methods in protein co-evolution
    de Juan, David
    Pazos, Florencio
    Valencia, Alfonso
    [J]. NATURE REVIEWS GENETICS, 2013, 14 (04) : 249 - 261