CNAseg – a novel framework for identification of copy number changes in cancer from second-generation sequencing data

Times cited: 77

Authors:

Ivakhno, Sergii [1,2]

Royce, Tom [3]

Cox, Anthony J. [2]

Evers, Dirk J. [2]

Cheetham, R. Keira [2]

Tavare, Simon [1]

Affiliations:

[1] Li Ka Shing Ctr, Canc Res UK Cambridge Res Inst, Cambridge CB2 0RE, England

[2] Illumina Cambridge, Saffron Walden CB10 1XL, England

[3] Illumina Inc, Corp Headquarters, San Diego, CA 92121 USA

Source:

BIOINFORMATICS | 2010 / Vol. 26 / No. 24

Keywords:

REARRANGEMENTS;

DOI:

10.1093/bioinformatics/btq587

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. Discriminating between true and false positive CNAs therefore becomes an important issue.

Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison with alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.
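The abstract names two ingredients: depth of coverage as the copy number signal, and flowcell-to-flowcell variability as a yardstick for false positives. The following is a minimal illustrative sketch of how those two ideas can interact. It is not CNAseg's actual procedure, because the abstract does not describe the paper's segmentation or statistical model. Several things are assumptions made only for this example: reads have already been counted into fixed genomic windows, each sample has at least two flowcells, ratios are median-centered on the assumption that most windows are copy neutral, and the noise threshold is k robust standard deviations (MAD-based) of same-sample flowcell comparisons. The function names and the default `k=3.0` are hypothetical. The sketch calls individual windows only. It performs no segmentation and no GC correction.

```python
import math
from statistics import median


def log2_ratios(a, b, pseudo=0.5):
    """Per-window log2(a/b), centered on the median so copy-neutral windows sit near 0."""
    raw = [math.log2((x + pseudo) / (y + pseudo)) for x, y in zip(a, b)]
    centre = median(raw)  # assumption: most windows are copy neutral
    return [r - centre for r in raw]


def mad_sigma(values):
    """Robust standard deviation estimate: 1.4826 * median absolute deviation."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def call_windows(tumor_fc, normal_fc, k=3.0):
    """Flag windows whose tumor/normal log2 ratio exceeds k robust SDs of
    the flowcell-to-flowcell (same-sample) log2 ratios.

    tumor_fc, normal_fc: one list of window read counts per flowcell
    (hypothetical input layout; at least two flowcells per sample).
    Returns a list of (window_index, log2_ratio, "gain" | "loss").
    """
    # Null spread: two flowcells of the same sample share one genome, so
    # differences between them reflect technical noise rather than CNAs.
    null = log2_ratios(normal_fc[0], normal_fc[1]) + log2_ratios(tumor_fc[0], tumor_fc[1])
    threshold = k * mad_sigma(null)

    # Pool flowcells within each sample, then compare tumor against normal.
    tumor = [sum(c) for c in zip(*tumor_fc)]
    normal = [sum(c) for c in zip(*normal_fc)]
    calls = []
    for i, r in enumerate(log2_ratios(tumor, normal)):
        if r > threshold:
            calls.append((i, r, "gain"))
        elif r < -threshold:
            calls.append((i, r, "loss"))
    return calls


if __name__ == "__main__":
    # Toy counts: tumor window 2 is halved (loss), window 5 is doubled (gain).
    normal_fc = [[100, 98, 102, 101, 99, 100, 97, 103, 100, 100],
                 [101, 99, 100, 98, 102, 100, 101, 99, 100, 100]]
    tumor_fc = [[99, 100, 50, 102, 100, 200, 98, 101, 100, 100],
                [100, 101, 52, 99, 101, 198, 100, 100, 99, 100]]
    for i, r, state in call_windows(tumor_fc, normal_fc):
        print(i, round(r, 2), state)  # prints "2 -0.98 loss" then "5 0.99 gain"
```

Median centering is used here instead of scaling by total read count. In the toy data, the gained window inflates the tumor's total, and total-count scaling would shift every copy-neutral window downward enough to produce spurious losses.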


Pages: 3051–3058

Number of pages: 8

Number of references: 34
