A pipeline for local assembly of minisatellite alleles from single-molecule sequencing data

Times cited: 2
Authors
Ogeh, Denye [1]
Badge, Richard [1]
Affiliations
[1] Univ Leicester, Dept Genet, Leicester, Leics, England
Keywords
GENOME; DNA;
DOI
10.1093/bioinformatics/btw687
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: The advent of Next Generation Sequencing (NGS) has made it possible to generate enormous volumes of short-read sequence data cheaply and on reasonable time scales. However, genome assemblies generated using NGS technologies are of markedly lower quality than those generated using Sanger DNA sequencing. This is largely because short-read sequence data cannot scaffold repetitive structures, which creates gaps, inversions and rearrangements and results in assemblies that are, at best, draft forms. Third-generation single-molecule sequencing (SMS) technologies (e.g. the Pacific Biosciences Single Molecule Real Time (SMRT) system) address this challenge by generating longer reads, offering the prospect of better recovering these complex repetitive structures and thereby improving assembly quality.
Results: Here, we evaluate the ability of SMS data (specifically, human genome Pacific Biosciences SMRT data) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites). To do this, we designed a pipeline for the collection, processing and local assembly of single-molecule sequence data to form accurate contiguous local reconstructions. Our results show the recovery of an allele of the non-coding minisatellite MS1 (located on chromosome 1 at 1p33-35) at greater than 97% identity to the reference (GRCh38) from the unprocessed sequence data of a haploid complete hydatidiform mole (CHM1) cell line. Furthermore, our assembly revealed an allele of over 500 repeat units, much larger than the reference (GRCh38) allele but consistent in structure with naturally occurring alleles that are segregating in human populations. This local reconstruction was validated by the release of the whole-genome assemblies GCA_001297185.1 and GCA_000772585.3, in which this allele occurs. Additionally, applying this pipeline to coding minisatellites in the PRDM9 and ZNF93 genes enabled recovery of high-identity allele structures for these regions, whose length was confirmed by PCR from cell line genomic DNA. The internal repeat structure of the recovered PRDM9 allele was consistent with common human-specific alleles.
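The abstract reports two quantities for the recovered allele: percent identity to a reference and a repeat-unit count. The sketch below illustrates one simple way such figures could be computed. It is not the authors' pipeline. The function names `percent_identity` and `longest_tandem_run` are hypothetical, and the sequences in the usage lines are toy data. It assumes the two sequences have already been aligned by an external tool, counts gap columns as mismatches, and matches repeat units exactly, whereas real minisatellite repeats vary between units.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all alignment columns.

    Assumes both strings come from the same pairwise alignment and use '-'
    for gaps. Gap columns are counted as mismatches (one of several
    conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Number of copies in the longest run of exact, back-to-back `unit` copies.

    Exact matching is a simplification: it undercounts arrays whose repeat
    units differ from one another.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for start in range(len(sequence) - k + 1):
        run = 0
        pos = start
        # Extend the run one whole unit at a time from this start position.
        while sequence[pos:pos + k] == unit:
            run += 1
            pos += k
        best = max(best, run)
    return best


# Toy data for illustration only (not real minisatellite sequence).
identity = percent_identity("ACGT-A", "ACGTTA")
copies = longest_tandem_run("TT" + "ACG" * 4 + "T" + "ACG" * 2, "ACG")
print(f"identity: {identity:.2f}%, longest run: {copies} units")
```

Different tools report identity under different conventions (for example, excluding gap columns from the denominator), so a figure such as "greater than 97%" is only comparable across analyses that use the same definition.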
Pages: 650-653
Number of pages: 4