A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

Cited by: 5
Authors
Park, Chihyun [1]
Ahn, Jaegyoon [1]
Yoon, Youngmi [2]
Park, Sanghyun [1]
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea
[2] Gachon Univ Med & Sci, Div Informat Engn, Inchon, South Korea
Source
PLOS ONE | 2011 / Volume 6 / Issue 10
Funding
National Research Foundation of Singapore;
Keywords
ARRAY CGH DATA; COPY-NUMBER VARIATION; HIDDEN MARKOV MODEL; GENE-EXPRESSION; SEGMENTATION; ABERRATIONS; SCALE; IDENTIFICATION; ALGORITHMS; DISCOVERY;
DOI
10.1371/journal.pone.0026975
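Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code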
07 ; 0710 ; 09 ;
Abstract
Background: It is difficult to identify copy number variations (CNVs) in normal human genomic data due to noise and nonlinear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well on it because of noise associated with the large amount of input data and because most current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample.
Methodology and Principal Findings: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than a CNV region (CNVR).
Conclusions and Significance: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime of the algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ with the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
Pages: 15