A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

Cited by: 6

Authors:

Park, Chihyun [1]

Ahn, Jaegyoon [1]

Yoon, Youngmi [2]

Park, Sanghyun [1]

Affiliations:

[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea

[2] Gachon Univ Med & Sci, Div Informat Engn, Inchon, South Korea

Source:

PLOS ONE | 2011 / Volume 6 / Issue 10

Funding:

National Research Foundation of Singapore;

Keywords:

ARRAY CGH DATA; COPY-NUMBER VARIATION; HIDDEN MARKOV MODEL; GENE-EXPRESSION; SEGMENTATION; ABERRATIONS; SCALE; IDENTIFICATION; ALGORITHMS; DISCOVERY;

DOI:

10.1371/journal.pone.0026975

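Chinese Library Classification (CLC) codes:

O [Mathematical and Physical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract: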

Background: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and nonlinear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample.

Methodology and Principal Findings: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR).

Conclusions and Significance: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
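Illustrative note (not part of the published abstract): the abstract mentions a k-means-based clustering strategy applied after segmentation but gives no details. The sketch below shows only the generic idea, under stated assumptions: plain one-dimensional k-means (Lloyd's algorithm) groups hypothetical per-segment mean log2 ratios into three clusters that are then read as loss, neutral, and gain. The function name `kmeans_1d`, the input values, and the initial centers are all invented for illustration; this is not the MGVD implementation, which the authors wrote in C++.

```python
from statistics import mean


def kmeans_1d(values, centers, max_iter=100):
    """Plain Lloyd's k-means on scalars; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels


# Hypothetical per-segment mean log2 ratios (assumed data, not from the paper).
segment_means = [-0.62, -0.55, 0.01, -0.03, 0.04, 0.48, 0.53, 0.00]

# Assumed initial centers, sorted so that cluster 0/1/2 reads as loss/neutral/gain.
centers, labels = kmeans_1d(segment_means, centers=[-0.5, 0.0, 0.5])
calls = [("loss", "neutral", "gain")[lab] for lab in labels]
print(list(zip(segment_means, calls)))
```

Starting from sorted initial centers keeps the cluster indices in low-to-high order here, which is what allows the simple index-to-label mapping; a real pipeline would have to choose the number of clusters and the initialization with far more care.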


Pages: 15

