CNAseg - a novel framework for identification of copy number changes in cancer from second-generation sequencing data

Cited by: 77
Authors
Ivakhno, Sergii [1, 2]
Royce, Tom [3 ]
Cox, Anthony J. [2 ]
Evers, Dirk J. [2 ]
Cheetham, R. Keira [2 ]
Tavaré, Simon [1]
Affiliations
[1] Li Ka Shing Ctr, Canc Res UK Cambridge Res Inst, Cambridge CB2 0RE, England
[2] Illumina Cambridge, Saffron Walden CB10 1XL, England
[3] Illumina Inc, Corp Headquarters, San Diego, CA 92121 USA
Keywords
REARRANGEMENTS;
DOI
10.1093/bioinformatics/btq587
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.
Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.
Pages: 3051-3058
Number of pages: 8