Assembler for de novo assembly of large genomes

Cited by: 21
Authors
Chu, Te-Chin [1, 4]
Lu, Chen-Hua [1]
Liu, Tsunglin [5]
Lee, Greg C. [4]
Li, Wen-Hsiung [2, 6]
Shih, Arthur Chun-Chieh [1, 3]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
[2] Acad Sinica, Biodivers Res Ctr, Taipei 115, Taiwan
[3] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan
[4] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
[5] Natl Cheng Kung Univ, Inst Bioinformat & Biosignal Transduct, Tainan 701, Taiwan
[6] Univ Chicago, Dept Ecol & Evolut, Chicago, IL 60637 USA
Keywords
SHORT DNA-SEQUENCES; MILLIONS; READS;
DOI
10.1073/pnas.1314090110
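Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract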
Assembling a large genome from next-generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good-quality reads as seeds. Second, it extends each seed by a whole-read extension process, which speeds up extension and can jump over short repeats. Third, it uses a dynamic back-trimming process to avoid extension termination due to sequencing errors. Fourth, it remaps reads to each assembled sequence, and if a repeat has caused an assembly error, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it merges contigs using unused reads. An extensive comparison of JR-Assembler with current assemblers on datasets from small, medium, and large genomes shows that JR-Assembler achieves better or comparable overall assembly quality and requires less memory and less central processing unit (CPU) time, especially for large genomes. In addition, a simulation study shows that JR-Assembler outperforms most current assemblers in memory use and CPU time when the read length is 150 bp or longer, indicating that its advantages over current assemblers will grow as read lengths increase with advances in next-generation sequencing technology.
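The first two steps of the abstract (choosing a seed by read count, then extending it one whole read at a time) can be illustrated with a toy sketch. This is not the JR-Assembler implementation: the function names, the exact-match overlap rule, and the `min_overlap` parameter are assumptions made for illustration, and back trimming, remapping, repeat handling, and contig merging are all omitted.

```python
from collections import Counter


def best_overlap(contig, read, min_overlap):
    """Length of the longest contig suffix that equals a prefix of read.

    The overlap is capped at len(read) - 1 so that a chosen read always
    adds at least one new base. Returns 0 if no overlap reaches min_overlap.
    """
    max_len = min(len(contig), len(read) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def extend_right(seed, counts, min_overlap=3):
    """Greedily extend seed to the right by appending whole reads."""
    contig = seed
    unused = set(counts) - {seed}
    while True:
        # Score each unused read by overlap length, then by read count.
        candidates = [
            (best_overlap(contig, r, min_overlap), counts[r], r) for r in unused
        ]
        candidates = [c for c in candidates if c[0] > 0]
        if not candidates:
            return contig
        k, _, read = max(candidates)
        contig += read[k:]  # append only the part of the read past the overlap
        unused.discard(read)


def assemble_one_contig(reads, min_overlap=3):
    """Pick the most frequent read as the seed (assumed proxy for quality)."""
    counts = Counter(reads)
    seed = max(counts, key=lambda r: (counts[r], r))  # deterministic tie-break
    return extend_right(seed, counts, min_overlap)


if __name__ == "__main__":
    reads = ["ACGTTG", "ACGTTG", "GTTGCA", "TGCAAG", "CAAGGC", "AAGGCT"]
    print(assemble_one_contig(reads))
```

Appending a whole read per step, rather than one base at a time, is what keeps the number of extension steps small in this sketch; a real assembler would also tolerate mismatches and extend in both directions.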
Pages: E3417-E3424
Page count: 8