GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

Cited by: 0
Authors
Ganesamoorthy, Devika [1]
Cao, Minh Duc [1]
Duarte, Tania [1]
Chen, Wenhan [1]
Coin, Lachlan [1]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
Funding
UK Medical Research Council
Keywords
Tandem repeats; GtTR; VNTR; Target capture sequencing; TRINUCLEOTIDE CAG REPEAT; SCALE ANALYSIS; GENOME; ASSOCIATION; VARIABILITY; EVOLUTION; GENE; DNA;
DOI
10.1186/s12859-018-2282-3
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. Their repetitive and unstable nature makes them highly prone to repeat number variation and nucleotide mutation, and therefore a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach that allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with, on average, 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
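The abstract describes the idea only at a high level: read counts in a sample are compared with a reference sample of known copy number, and the result is reported as a 95% HPD interval that narrows with depth. The sketch below illustrates that general idea and is not the GtTR implementation. It assumes a Poisson likelihood for the read count, a flat prior over a grid of copy numbers, and a single depth-normalisation factor; the function names, parameters, and read counts are all hypothetical.

```python
import math

def copy_number_posterior(reads_sample, reads_ref, copies_ref, depth_ratio, grid):
    """Grid posterior over copy number (assumed model: flat prior, Poisson counts)."""
    # Expected reads contributed by one repeat copy in the sample, scaled from
    # the reference sample by the ratio of sequencing depths (hypothetical input).
    per_copy = reads_ref / copies_ref * depth_ratio
    # Poisson log-likelihood up to a constant that does not depend on c.
    log_post = [reads_sample * math.log(c * per_copy) - c * per_copy for c in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # subtract peak for stability
    total = sum(weights)
    return [w / total for w in weights]

def hpd_interval(grid, post, mass=0.95):
    """Smallest set of grid points holding `mass`, reported as (low, high)."""
    order = sorted(range(len(grid)), key=lambda i: post[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:
        kept.append(grid[i])
        acc += post[i]
        if acc >= mass:
            break
    return min(kept), max(kept)

# Candidate copy numbers from 0.5 to 100 in steps of 0.5.
grid = [0.5 * k for k in range(1, 201)]

# Made-up counts: the reference sample has 10 copies and 200 reads over the repeat.
post = copy_number_posterior(400, 200, 10, 1.0, grid)
low, high = hpd_interval(grid, post)
best = grid[max(range(len(grid)), key=post.__getitem__)]

# Same locus at four times the depth: both read counts scale by 4.
post_deep = copy_number_posterior(1600, 800, 10, 1.0, grid)
low_deep, high_deep = hpd_interval(grid, post_deep)

print(best, (low, high), (low_deep, high_deep))
```

Under these assumptions the posterior peaks where the sample's read count matches the per-copy expectation, and the interval at four times the depth is narrower than the one at the lower depth, which mirrors the depth dependence reported in the abstract.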
Pages: 14