VNTRseek--a computational tool to detect tandem repeat variants in high-throughput sequencing data

Times cited: 27
Authors
Gelfand, Yevgeniy [1]
Hernandez, Yozen [2]
Loving, Joshua [2]
Benson, Gary [1,2,3]
Affiliations
[1] Boston Univ, Lab Biocomputing & Informat, Boston, MA 02215 USA
[2] Boston Univ, Grad Program Bioinformat, Boston, MA 02215 USA
[3] Boston Univ, Dept Comp Sci, Boston, MA 02215 USA
Funding
US National Science Foundation
Keywords
SEROTONIN TRANSPORTER GENE; READ ALIGNMENT; TRIPLET REPEAT; POLYMORPHISM; ASSOCIATION; LOCUS; INSERTIONS; ALGORITHMS; ACCURATE; MAIZE;
DOI
10.1093/nar/gku642
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
DNA tandem repeats (TRs) are ubiquitous genomic features that consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number tandem repeats (VNTRs) are polymorphic TR loci in which the number of pattern copies varies. In this paper we describe VNTRseek, our software for discovering minisatellite VNTRs (pattern size ≥ 7 nucleotides) in whole-genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference TR and that of its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
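The copy-number discrepancy test and the trio consistency check described in the abstract can be illustrated with a toy sketch. This is not VNTRseek's implementation. It assumes that each read fully spans the repeat and estimates a read's copy number as repeat length divided by pattern size. It rounds that estimate to a whole-copy offset from the reference and counts an allele only if at least `min_support` reads carry it. The `min_support` threshold is hypothetical. The trio check treats each genotype as a set of one or two allele offsets. All function names and parameters are hypothetical.

```python
from collections import Counter

# Minisatellite cutoff stated in the abstract (pattern size >= 7 nt).
MIN_PATTERN_SIZE = 7


def copy_number_offsets(ref_copies, pattern_size, read_repeat_lengths):
    """Whole-copy offset of each read's repeat relative to the reference TR.

    ASSUMPTION (illustrative only): every value in read_repeat_lengths is the
    length in nt of a repeat region that the read spans completely.
    """
    if pattern_size < MIN_PATTERN_SIZE:
        raise ValueError("not a minisatellite: pattern size < 7 nt")
    offsets = []
    for length in read_repeat_lengths:
        read_copies = length / pattern_size              # copies seen in this read
        offsets.append(round(read_copies - ref_copies))  # nearest whole-copy gain/loss
    return offsets


def supported_alleles(offsets, min_support=2):
    """Keep offsets carried by at least min_support reads (hypothetical threshold)."""
    counts = Counter(offsets)
    return {off for off, n in counts.items() if n >= min_support}


def is_putative_vntr(alleles):
    """Flag the locus if any supported allele differs from the reference copy number."""
    return any(off != 0 for off in alleles)


def mendelian_consistent(child, mother, father):
    """Genotypes are sets of one (homozygous) or two (heterozygous) allele offsets."""
    alleles = sorted(child)
    if len(alleles) == 1:
        # A homozygous child must receive the same allele from each parent.
        return alleles[0] in mother and alleles[0] in father
    a, b = alleles
    # One allele from each parent, in either assignment.
    return (a in mother and b in father) or (b in mother and a in father)


# Reference TR: 10-nt pattern, 5 copies; five reads span the locus.
offsets = copy_number_offsets(5, 10, [50, 50, 60, 60, 61])
alleles = supported_alleles(offsets)
print(offsets, alleles, is_putative_vntr(alleles))  # -> [0, 0, 1, 1, 1] {0, 1} True
print(mendelian_consistent({0, 1}, {0}, {1, 2}))    # -> True
```

In this example, the reads support both the reference allele (offset 0) and a one-copy expansion (offset +1). The locus is therefore flagged as a putative VNTR. The child's genotype {0, 1} is consistent with inheriting 0 from the mother and 1 from the father.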
Pages: 8884-8894
Number of pages: 11