Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions

Cited by: 28
Authors
Gleason, Alec C. [1]
Ghadge, Ghanashyam [1, 2]
Chen, Jin [3]
Sonobe, Yoshifumi [1, 2]
Roos, Raymond P. [1, 2]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Neurol, 5841 S Maryland Ave, Chicago, IL 60637 USA
[3] Univ Texas Southwestern Med Ctr Dallas, Dept Pharmacol, Dallas, TX USA
Keywords
NON-AUG TRANSLATION; PROTEIN-SYNTHESIS; CODON; SELECTION; START
DOI
10.1371/journal.pone.0256411
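CLC classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract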
A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, involve an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold into secondary structures in a length-dependent manner, promoting RAN translation. Repeat protein products are translated, accumulate, and may contribute to disease pathogenesis. Nucleotides that flank the repeat region, especially those closest to the initiation site, are believed to enhance translation initiation. A machine learning model has been published to help identify ATG and near-cognate translation initiation sites; however, this model has diminished predictive power due to its extensive feature selection and limited training data. Here, we overcome this limitation and increase prediction accuracy by: a) capturing the effect of the nucleotides most critical for translation initiation via feature reduction, b) implementing an alternative machine learning algorithm better suited to limited data, c) building comprehensive and balanced training data (via sampling without replacement) that includes previously unavailable sequences, and d) splitting ATG and near-cognate translation initiation codon data to train two separate models. We also design a supplementary scoring system to provide an additional prognostic assessment of model predictions. The resultant models have high performance, with ~85-88% accuracy, exceeding that of the previously published model by >18%. The models presented here are used to identify translation initiation sites in genes associated with a number of neurologic repeat expansion disorders. The results confirm a number of experimentally identified sites of translation initiation upstream of the expanded repeats, and predict sites that are not yet established.
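Steps (c) and (d) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout `(sequence, codon_position, label)`, the function names `split_by_codon` and `balance`, and the toy sequences are all assumptions made for illustration, and near-cognate codons are taken here to be the nine codons that differ from ATG at exactly one nucleotide.

```python
import random

# Assumption: "near-cognate" means differing from ATG at exactly one position.
NEAR_COGNATE = {"CTG", "GTG", "TTG", "AAG", "ACG", "AGG", "ATA", "ATC", "ATT"}


def split_by_codon(records):
    """Split (sequence, codon_position, label) records into ATG and near-cognate sets.

    Records whose candidate codon is neither ATG nor near-cognate are dropped.
    """
    atg, near = [], []
    for seq, pos, label in records:
        codon = seq[pos:pos + 3].upper()
        if codon == "ATG":
            atg.append((seq, pos, label))
        elif codon in NEAR_COGNATE:
            near.append((seq, pos, label))
    return atg, near


def balance(records, rng):
    """Downsample the larger class so positives and negatives are equal in number.

    random.Random.sample draws without replacement, so no record is repeated.
    """
    positives = [r for r in records if r[2] == 1]
    negatives = [r for r in records if r[2] == 0]
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n) + rng.sample(negatives, n)


# Toy data (hypothetical): label 1 = initiation site, 0 = non-initiating codon.
records = [
    ("GCCATGGCG", 3, 1),
    ("AAAATGAAA", 3, 0),
    ("TTTATGCCC", 3, 0),
    ("GCCCTGGCG", 3, 1),
    ("AAAGTGAAA", 3, 0),
    ("CCCAAACCC", 3, 0),  # AAA is not near-cognate, so this record is dropped
]

atg_records, near_records = split_by_codon(records)
rng = random.Random(0)  # fixed seed for reproducibility
atg_train = balance(atg_records, rng)
near_train = balance(near_records, rng)
```

Each balanced set would then be used to train its own model, mirroring the two-model design described in the abstract; the choice of learning algorithm and features is left open here.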
Pages: 30