Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers

Times cited: 12
Authors
Cao, Chang-Chang [1]
Li, Cheng [1]
Sun, Xiao [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing, Jiangsu, Peoples R China
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
National Natural Science Foundation of China;
Keywords
Quantitative group testing; Random k-set pool design; Overlapping pool sequencing; Rare variants; IDENTIFICATION; DESIGNS;
DOI
10.1186/1471-2105-15-195
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases, which makes detecting and screening for rare variants increasingly important. Although massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, identifying rare variant carriers by large-scale re-sequencing remains prohibitively expensive because sequencing libraries must be constructed for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing can help identify rare variant carriers in large sample sets with few pooled sequencing experiments and at dramatically reduced cost.
Results: Based on quantitative group testing, we propose an overlapping pool sequencing strategy that efficiently recovers variant carriers among numerous individuals at much lower cost than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of the sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5-97.9% of variants, with variant frequencies ranging from 0.5 to 1.5%.
Conclusions: Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
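The random k-set pooling idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the pool count (40), the pool membership size k = 5, and the carrier indices are all assumptions chosen for illustration; only the sample count of 200 comes from the abstract. The decoder shown is a plain noise-free elimination step from classical group testing, standing in for the heuristic Bayesian decoder described in the paper.

```python
import random


def random_k_set_design(n_samples, n_pools, k, seed=0):
    """Assign each sample to k distinct pools chosen uniformly at random.

    Returns a list where entry s is the sorted list of pool indices
    that sample s is mixed into (a hypothetical representation).
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_pools), k)) for _ in range(n_samples)]


def pool_carrier_counts(design, carriers, n_pools):
    """Noise-free quantitative outcome: number of carriers in each pool."""
    counts = [0] * n_pools
    for s in carriers:
        for p in design[s]:
            counts[p] += 1
    return counts


def candidate_carriers(design, counts):
    """Keep only samples whose pools all contain at least one carrier.

    This is a simple elimination rule, NOT the paper's Bayesian decoder;
    it may return extra (false positive) candidates.
    """
    return [
        s for s, pools in enumerate(design)
        if all(counts[p] > 0 for p in pools)
    ]


# Assumed toy setting: 200 samples (as in the abstract), 40 pools, k = 5,
# and two carriers (a variant frequency of 1%).
N_SAMPLES, N_POOLS, K = 200, 40, 5
design = random_k_set_design(N_SAMPLES, N_POOLS, K, seed=1)
carriers = {3, 57}
counts = pool_carrier_counts(design, carriers, N_POOLS)
candidates = candidate_carriers(design, counts)
print(len(candidates), "candidate carriers remain after elimination")
```

In this noise-free setting every true carrier always survives the elimination step, so the candidate list is a superset of the carriers. Real sequencing data would require a depth-based threshold and a probabilistic decoder, as the abstract describes.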
Pages: 14