An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes

Cited by: 4
Authors
Chen, Chih-Hao [1, 2]
Lee, Hsing-Chung [1, 3]
Ling, Qingdong [1, 2]
Chen, Hsiao-Rong [1]
Ko, Yi-An [1]
Tsou, Tsong-Shan [1, 2, 4]
Wang, Sun-Chong [1]
Wu, Li-Ching [1]
Lee, H. C. [1, 2, 5, 6]
Affiliations
[1] Natl Cent Univ, Grad Inst Syst Biol & Bioinformat, Chungli 32001, Taiwan
[2] Cathay Gen Hosp, Cathay Med Res Inst, Taipei 10630, Taiwan
[3] Cathay Gen Hosp, Dept Surg, Taipei 10630, Taiwan
[4] Natl Cent Univ, Grad Inst Stat, Chungli 32001, Taiwan
[5] Natl Cent Univ, Dept Phys, Chungli 32001, Taiwan
[6] Natl Ctr Theoret Sci, Hsinchu 30043, Taiwan
Keywords
ARRAY-CGH DATA; CIRCULAR BINARY SEGMENTATION; OLIGONUCLEOTIDE MICROARRAY; HYBRIDIZATION DATA; ACCURATE
DOI
10.1093/nar/gkr137
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ~1 s and a 1.8 million-probe array in ~8 s.
Pages: 7