Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads

Cited by: 10
Authors
Duitama, Jorge [1]
Kennedy, Justin [1]
Dinakar, Sanjiv [2]
Hernandez, Yoezen [3]
Wu, Yufeng [1]
Mandoiu, Ion I. [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Unit 2155, Storrs, CT 06269 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] CUNY Hunter Coll, Dept Comp Sci, New York, NY 10021 USA
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
National Science Foundation (USA)
Keywords
HIDDEN MARKOV MODEL; STRUCTURAL VARIATION; GENOME; ASSOCIATION; IMPUTATION;
DOI
10.1186/1471-2105-12-S1-S53
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Recent technological advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for whole-genome SNP genotyping. Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integrating LD information yields, from low-coverage sequencing data, genotype calling accuracy comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/. Conclusions: Integration of LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, making low-coverage sequencing a viable alternative to microarrays for conducting large-scale genome-wide association studies.
Pages: 11