Integrating copy number polymorphisms into array CGH analysis using a robust HMM

Cited by: 119
Authors
Shah, Sohrab P.
Xuan, Xiang
DeLeeuw, Ron J.
Khojasteh, Mehrnoush
Lam, Wan L.
Ng, Raymond
Murphy, Kevin P.
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
[2] British Columbia Canc Res Ctr, Vancouver, BC V5Z 1L3, Canada
DOI
10.1093/bioinformatics/btl238
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: Array comparative genomic hybridization (aCGH) is a pervasive technique used to identify chromosomal aberrations in human diseases, including cancer. Aberrations are defined as regions of increased or decreased DNA copy number, relative to a normal sample. Accurately identifying the locations of these aberrations has many important medical applications. Unfortunately, the observed copy number changes are often corrupted by various sources of noise, making the boundaries hard to detect. One popular current technique uses hidden Markov models (HMMs) to divide the signal into regions of constant copy number called segments; a subsequent classification phase labels each segment as a gain, a loss or neutral. Unfortunately, standard HMMs are sensitive to outliers, causing over-segmentation, where segments erroneously span very short regions.

Results: We propose a simple modification that makes the HMM robust to such outliers. More importantly, this modification allows us to exploit prior knowledge about the likely location of "outliers", which are often due to copy number polymorphisms (CNPs). By "explaining away" these outliers with prior knowledge about the locations of CNPs, we can focus attention on the more clinically relevant aberrated regions. We show significant improvements over the current state-of-the-art technique (DNAcopy with MergeLevels) on previously published data from mantle cell lymphoma cell lines, and on published benchmark synthetic data augmented with outliers.

Availability: Source code written in Matlab is available from http://www.cs.ubc.ca/~sshah/acgh.
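The idea in the abstract can be illustrated with a small sketch. This is not the authors' Matlab implementation; it is a minimal Python example, under stated assumptions, of one way to make HMM segmentation tolerant of outliers: each probe's emission is a mixture of the state's Gaussian and a broad "outlier" Gaussian, weighted by a per-probe prior that can be raised at known CNP locations. The function name `robust_viterbi`, the three state means, both standard deviations and the self-transition probability are all hypothetical values chosen for illustration.

```python
import math


def gauss_pdf(y, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def robust_viterbi(y, outlier_prior, means=(-0.5, 0.0, 0.5), sigma=0.1,
                   outlier_sigma=1.0, stay=0.99):
    """Most likely state path (0=loss, 1=neutral, 2=gain) for log-ratios y.

    outlier_prior[t] is the assumed prior probability that probe t is an
    outlier (for example, higher where a CNP is known). All numeric
    defaults are illustrative assumptions, not values from the paper.
    """
    K = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (K - 1))

    def log_emit(t, k):
        p = outlier_prior[t]
        inlier = (1.0 - p) * gauss_pdf(y[t], means[k], sigma)
        # Broad component shared by all states: it lets a probe deviate
        # from the state mean without forcing a state change.
        outlier = p * gauss_pdf(y[t], 0.0, outlier_sigma)
        return math.log(inlier + outlier + 1e-300)

    # Standard Viterbi recursion in log space with a uniform initial state.
    score = [math.log(1.0 / K) + log_emit(0, k) for k in range(K)]
    back = []
    for t in range(1, len(y)):
        new_score, ptr = [], []
        for k in range(K):
            cands = [score[j] + (log_stay if j == k else log_move)
                     for j in range(K)]
            best = max(range(K), key=lambda j: cands[j])
            ptr.append(best)
            new_score.append(cands[best] + log_emit(t, k))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy signal: neutral region with one outlying probe, then a true gain.
y = [0.0] * 10 + [0.6] + [0.0] * 10 + [0.5] * 10
plain = robust_viterbi(y, [0.0] * len(y))   # no outlier component
prior = [0.01] * len(y)
prior[10] = 0.5                             # hypothetical known CNP probe
robust = robust_viterbi(y, prior)
```

In this toy setting, setting every prior to zero reduces the model to a standard Gaussian HMM, so comparing `plain` and `robust` at index 10 shows how the outlier component affects the segmentation of a single aberrant probe while the longer gain region is still detected.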
Pages: E431-E439
Number of pages: 9
Related papers
24 in total
[1]  
[Anonymous], 2021, Bayesian Data Analysis
[2]  
BROET P, 2006, BIOINFORMATICS
[3]   Copy-number polymorphisms: mining the tip of an iceberg [J].
Buckley, PG ;
Mantripragada, KK ;
Piotrowski, A ;
de Ståhl, TD ;
Dumanski, JP .
TRENDS IN GENETICS, 2005, 21 (06) :315-317
[4]   A high-resolution survey of deletion polymorphism in the human genome [J].
Conrad, DF ;
Andrews, TD ;
Carter, NP ;
Hurles, ME ;
Pritchard, JK .
NATURE GENETICS, 2006, 38 (01) :75-81
[5]   Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes [J].
de Leeuw, RJ ;
Davies, JJ ;
Rosenwald, A ;
Bebb, G ;
Gascoyne, RD ;
Dyer, MJS ;
Staudt, LM ;
Martinez-Climent, JA ;
Lam, WL .
HUMAN MOLECULAR GENETICS, 2004, 13 (17) :1827-1837
[6]   Quantile smoothing of array CGH data [J].
Eilers, PHC ;
de Menezes, RX .
BIOINFORMATICS, 2005, 21 (07) :1146-1153
[7]  
ENGLER DA, 2006, BIOSTATISTICS
[8]   Hidden Markov models approach to the analysis of array CGH data [J].
Fridlyand, J ;
Snijders, AM ;
Pinkel, D ;
Albertson, DG ;
Jain, AN .
JOURNAL OF MULTIVARIATE ANALYSIS, 2004, 90 (01) :132-153
[9]  
GUHA S, 2006, BAYESIAN HIDDEN MARK
[10]   Denoising array-based comparative genomic hybridization data using wavelets [J].
Hsu, L ;
Self, SG ;
Grove, D ;
Randolph, T ;
Wang, K ;
Delrow, JJ ;
Loo, L ;
Porter, P .
BIOSTATISTICS, 2005, 6 (02) :211-226