Improving protein domain classification for third-generation sequencing reads using deep learning

Cited by: 5
Authors
Du, Nan [1]
Shang, Jiayu [2]
Sun, Yanni [2]
Affiliations
[1] Michigan State Univ, Comp Sci & Engn, E Lansing, MI 48824 USA
[2] City Univ Hong Kong, Elect Engn, Hong Kong, Peoples R China
Keywords
HUMAN GENOME; ALIGNMENT;
DOI
10.1186/s12864-021-07468-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: With the development of third-generation sequencing (TGS) technologies, researchers can obtain DNA sequences ranging from tens to hundreds of kb in length. These long reads allow protein domain annotation without assembly and can thus yield important insights into the biological functions of the underlying data. However, the high error rate of TGS data poses a new challenge for established domain analysis pipelines. State-of-the-art methods are not optimized for noisy reads and have shown unsatisfactory domain classification accuracy on TGS data. New computational methods are still needed to improve domain prediction in long, noisy reads.
Results: In this work, we introduce ProDOMA, a deep learning model that conducts domain classification for TGS reads. It uses deep neural networks with 3-frame translation encoding to learn conserved features from partially correct translations. In addition, we formulate the task as an open-set problem, so the model can reject reads that do not contain the targeted domains. In experiments on simulated long reads of protein-coding sequences and on real TGS reads from the human genome, our model outperforms HMMER and DeepFam on protein domain classification.
Conclusions: In summary, ProDOMA is a useful end-to-end protein domain analysis tool for long, noisy reads that does not rely on error correction.
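The 3-frame translation mentioned in the abstract can be illustrated with a minimal sketch. This is not ProDOMA's code: the abstract does not describe the actual encoding, so the function name `three_frame_translate`, the use of the standard genetic code, the forward-strand-only translation, and the `X` placeholder for unknown codons are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (illustrative assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def three_frame_translate(read):
    """Translate a DNA read in the three forward reading frames.

    Hypothetical helper: returns a list of three protein strings, one per
    frame offset (0, 1, 2). Codons with non-ACGT characters become 'X'.
    """
    read = read.upper()
    frames = []
    for offset in range(3):
        # Step through complete codons only; trailing 1-2 bases are dropped.
        codons = (read[i:i + 3] for i in range(offset, len(read) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


clean = three_frame_translate("ATGGCCAAATAG")
# Same read with a single inserted base after the start codon.
noisy = three_frame_translate("ATGAGCCAAATAG")
print(clean)
print(noisy)
```

In this toy case the inserted base shifts the reading frame, so the residues that follow the error no longer appear in frame 0 of the noisy read but do appear in frame 1. Keeping all three frames is one way a model could still see the partially correct translation.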
Pages: 13