Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes

Cited by: 24
Authors
Durrant, Matthew G. [1 ,2 ]
Bhatt, Ami S. [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med Hematol Blood & Marrow Transplantat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
RNA; ALIGNMENT; BACTERIAL; PROTEINS; HIDDEN; SUITE;
DOI
10.1016/j.chom.2020.11.002
Chinese Library Classification
Q93 [Microbiology];
Discipline classification codes
071005 ; 100705 ;
Abstract
Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
Pages: 121 / +
Page count: 15
Related papers
46 in total
  • [31] The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies
    Chen, Lei
    Yang, Ying
    Zhang, Yuanliang
    Li, Kecheng
    Cai, Hongmin
    Wang, Hongwei
    Zhao, Qian
    CHEMBIOCHEM, 2022, 23 (08)
  • [32] Chemical labeling and proteomics for characterization of unannotated small and alternative open reading frame-encoded polypeptides
    Chen, Yanran
    Cao, Xiongwen
    Loh, Ken H.
    Slavoff, Sarah A.
    BIOCHEMICAL SOCIETY TRANSACTIONS, 2023, 51 (03) : 1071 - 1082
  • [33] Small open reading frame-encoded microproteins in cancer: identification, biological functions and clinical significance
    Zhang, Tingting
    Li, Zhang
    Li, Jiao
    Peng, Yong
    MOLECULAR CANCER, 2025, 24 (01)
  • [34] Identification of novel Arabidopsis thaliana upstream open reading frames that control expression of the main coding sequences in a peptide sequence-dependent manner
    Ebina, Isao
    Takemoto-Tsutsumi, Mariko
    Watanabe, Shun
    Koyama, Hiroaki
    Endo, Yayoi
    Kimata, Kaori
    Igarashi, Takuya
    Murakami, Karin
    Kudo, Rin
    Ohsumi, Arisa
    Noh, Abdul Latif
    Takahashi, Hiro
    Naito, Satoshi
    Onouchi, Hitoshi
    NUCLEIC ACIDS RESEARCH, 2015, 43 (03) : 1562 - 1576
  • [35] Expression and strain variation of the novel "small open reading frame" (smorf) multigene family in Babesia bovis
    Ferreri, Lucas M.
    Brayton, Kelly A.
    Sondgeroth, Kerry S.
    Lau, Audrey O. T.
    Suarez, Carlos E.
    McElwain, Terry F.
    INTERNATIONAL JOURNAL FOR PARASITOLOGY, 2012, 42 (02) : 131 - 138
  • [36] Translation initiation landscape profiling reveals hidden open-reading frames required for the pathogenesis of tomato yellow leaf curl Thailand virus
    Chiu, Ching-Wen
    Li, Ya-Ru
    Lin, Cheng-Yuan
    Yeh, Hsin-Hung
    Liu, Ming-Jung
    PLANT CELL, 2022, 34 (05) : 1804 - 1821
  • [37] Zm401p10, encoded by an anther-specific gene with short open reading frames, is essential for tapetum degeneration and anther development in maize
    Wang, Dongxue
    Lia, Chengxia
    Zhao, Qian
    Zhao, Linna
    Wang, Meizhen
    Zhu, Dengyun
    Ao, Guangming
    Yu, Jingjuan
    FUNCTIONAL PLANT BIOLOGY, 2009, 36 (01) : 73 - 85
  • [38] The mitochondrial genome of Morchella importuna (272.2 kb) is the largest among fungi and contains numerous introns, mitochondrial non-conserved open reading frames and repetitive sequences
    Liu, Wei
    Cai, Yingli
    Zhang, Qianqian
    Chen, Lianfu
    Shu, Fang
    Ma, Xiaolong
    Bian, Yinbing
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2020, 143 : 373 - 381
  • [39] A Translation-Aborting Small Open Reading Frame in the Intergenic Region Promotes Translation of a Mg2+ Transporter in Salmonella Typhimurium
    Choi, Eunna
    Han, Yoontak
    Park, Shinae
    Koo, Hyojeong
    Lee, Jung-Shin
    Lee, Eun-Jin
    MBIO, 2021, 12 (02)
  • [40] Roles of genomic island 3 (GI-3) BAB1_0267 and BAB1_0270 open reading frames (ORFs) in the virulence of Brucella abortus 2308
    Ortiz-Roman, Luisa
    Riquelme-Neira, Roberto
    Vidal, Roberto
    Onate, Angel
    VETERINARY MICROBIOLOGY, 2014, 172 (1-2) : 279 - 284