Improving protein domain classification for third-generation sequencing reads using deep learning

Cited by: 5
Authors
Du, Nan [1]
Shang, Jiayu [2]
Sun, Yanni [2]
Affiliations
[1] Michigan State Univ, Comp Sci & Engn, E Lansing, MI 48824 USA
[2] City Univ Hong Kong, Elect Engn, Hong Kong, Peoples R China
Keywords
HUMAN GENOME; ALIGNMENT;
DOI
10.1186/s12864-021-07468-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: With the development of third-generation sequencing (TGS) technologies, researchers can obtain DNA sequences with lengths ranging from tens to hundreds of kb. These long reads allow protein domain annotation without assembly and can thus yield important insights into the biological functions of the underlying data. However, the high error rate of TGS data poses a new challenge to established domain analysis pipelines. State-of-the-art methods are not optimized for noisy reads and have shown unsatisfactory domain classification accuracy on TGS data. New computational methods are therefore needed to improve domain prediction in long, noisy reads.
Results: In this work, we introduce ProDOMA, a deep learning model that performs domain classification for TGS reads. It uses deep neural networks with 3-frame translation encoding to learn conserved features from partially correct translations. In addition, we formulate the task as an open-set problem, so the model can reject reads that do not contain the targeted domains. In experiments on simulated long reads of protein-coding sequences and real TGS reads from the human genome, our model outperforms HMMER and DeepFam on protein domain classification.
Conclusions: In summary, ProDOMA is a useful end-to-end protein domain analysis tool for long, noisy reads that does not rely on error correction.
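The two ideas named in the abstract, 3-frame translation and open-set rejection, can be illustrated with a minimal Python sketch. This is not ProDOMA's code: the function names, the use of `X` for unknown codons, and the simple score threshold are all assumptions made for illustration, and the neural network that would produce the per-domain scores is omitted.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate one forward reading frame (0, 1, or 2) of a DNA string.

    Codons with ambiguous bases (e.g. N) map to 'X'; this is an illustrative choice.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def three_frame_translation(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def classify_open_set(scores, threshold=0.5):
    """Hypothetical open-set rule: return the best-scoring domain label,
    or None (reject) when its score is below the threshold.

    `scores` maps domain labels to scores from some upstream classifier.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

As a usage example, a single inserted base shifts the reading frame, so the residues that follow the error reappear in a different frame. This is why all three frames are presented to the model:

```python
clean = three_frame_translation("ATGGCCGAT")    # frame 0 is "MAD"
noisy = three_frame_translation("ATGTGCCGAT")   # one inserted T; frame 1 is "CAD"
label = classify_open_set({"domain_a": 0.2, "domain_b": 0.3})  # None: read rejected
```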
Pages: 13
Related papers
50 items in total
  • [1] Improving protein domain classification for third-generation sequencing reads using deep learning
    Du, Nan
    Shang, Jiayu
    Sun, Yanni
    BMC Genomics, 22
  • [2] Scaffolding algorithm using second and third-generation reads
    Franus, Wiktor
    Kusmirek, Wiktor
    Nowak, Robert M.
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2018, 2018, 10808
  • [3] A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads
    Zhang, Wenjing
    Huang, Neng
    Zheng, Jiantao
    Liao, Xingyu
    Wang, Jianxin
    Li, Hong-Dong
    GENES, 2019, 10 (01)
  • [4] A window into third-generation sequencing
    Schadt, Eric E.
    Turner, Steve
    Kasarskis, Andrew
    HUMAN MOLECULAR GENETICS, 2010, 19 : R227 - R240
  • [5] Cheap third-generation sequencing
    Rusk, Nicole
    NATURE METHODS, 2009, 6 (04) : 244 - 245
  • [6] Third-Generation Sequencing Debuts
    Glaser, Vicki
    GENETIC ENGINEERING & BIOTECHNOLOGY NEWS, 2010, 30 (08) : 30 - 33
  • [7] Cheap third-generation sequencing
    Nicole Rusk
    Nature Methods, 2009, 6 : 244 - 244
  • [8] Third-Generation Sequencing of Epigenetic DNA
    Searle, Bethany
    Mueller, Markus
    Carell, Thomas
    Kellett, Andrew
    ANGEWANDTE CHEMIE-INTERNATIONAL EDITION, 2023, 62 (14)
  • [9] Third-generation sequencing for genetic disease
    Ling, Xiaoting
    Wang, Chenghan
    Li, Linlin
    Pan, Liqiu
    Huang, Chaoyu
    Zhang, Caixia
    Huang, Yunhua
    Qiu, Yuling
    Lin, Faquan
    Huang, Yifang
    CLINICA CHIMICA ACTA, 2023, 551
  • [10] Identification of rare thalassemia variants using third-generation sequencing
    Liu, Qin
    Chen, Qianting
    Zhang, Zonglei
    Peng, Shiyi
    Liu, Jing
    Pang, Jialun
    Jia, Zhengjun
    Xi, Hui
    Li, Jiaqi
    Chen, Libao
    Liu, Yinyin
    Peng, Ying
    FRONTIERS IN GENETICS, 2023, 13