Population-based change-point detection for the identification of homozygosity islands

Cited by: 3
Authors
Prates, Lucas [1]
Lemes, Renan B. [2]
Hunemeier, Tabita [2, 3]
Leonardi, Florencia [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Sao Paulo, Brazil
[2] Univ Sao Paulo, Inst Biol Sci, Sao Paulo, Brazil
[3] Univ Pompeu Fabra, Inst Biol Evolut, Barcelona, Spain
Keywords
BINARY SEGMENTATION; SEQUENCE; PATTERNS; MODEL; RUNS;
DOI
10.1093/bioinformatics/btad170
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: This work is motivated by the problem of identifying homozygosity islands in the genomes of individuals in a population. Our method identifies homozygosity islands directly at the population level, without the need to analyse single individuals and then combine the results, as is currently done in state-of-the-art approaches.
Results: We propose regularized offline change-point methods to detect changes in the parameters of a multidimensional distribution when several aligned, independent samples of fixed resolution are available. We present a penalized maximum likelihood approach that can be computed efficiently by a dynamic programming algorithm or approximated by a fast binary segmentation algorithm. Both estimators are shown to converge almost surely to the set of change-points without the need to specify the number of change-points a priori. In simulations, we observed similar performance from the exact and greedy estimators. Moreover, we provide a new methodology for selecting the regularization constant, which has the advantage of being automatic, consistent, and less prone to subjective analysis.
Availability and implementation: The data used in the application are from the Human Genome Diversity Project (HGDP) and are publicly available. Algorithms were implemented in R (R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing, 2020) in the R package blockcpd, found at .
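To make the penalized maximum likelihood idea from the Results concrete, here is a minimal Python sketch of an exact dynamic programming search over block partitions. It is an illustration under stated assumptions, not the blockcpd implementation: it assumes binary data with a Bernoulli parameter shared within each block, and a constant penalty per block. The names block_cost, detect_change_points and penalty are hypothetical and do not come from the paper or the package.

```python
import math


def block_cost(prefix, n_rows, start, end):
    """Negative Bernoulli log-likelihood of columns [start, end),
    using one pooled maximum likelihood estimate for the whole block."""
    ones = prefix[end] - prefix[start]
    total = n_rows * (end - start)
    zeros = total - ones
    cost = 0.0
    if ones > 0:
        cost -= ones * math.log(ones / total)
    if zeros > 0:
        cost -= zeros * math.log(zeros / total)
    return cost


def detect_change_points(samples, penalty):
    """Return the 0-based column indices at which a new block starts.

    samples: list of equal-length rows of 0/1 values (aligned samples).
    penalty: assumed constant cost added for every block in the partition.
    """
    n_rows = len(samples)
    n_cols = len(samples[0])

    # prefix[j] = number of ones in columns 0..j-1, summed over all rows
    prefix = [0] * (n_cols + 1)
    for j in range(n_cols):
        prefix[j + 1] = prefix[j] + sum(row[j] for row in samples)

    # best[e] = minimal penalized cost of partitioning columns [0, e)
    best = [0.0] + [math.inf] * n_cols
    last = [0] * (n_cols + 1)  # start of the final block in that optimum
    for end in range(1, n_cols + 1):
        for start in range(end):
            value = best[start] + block_cost(prefix, n_rows, start, end) + penalty
            if value < best[end]:
                best[end] = value
                last[end] = start

    # Walk back through the stored block starts to recover the change-points
    change_points = []
    end = n_cols
    while end > 0:
        start = last[end]
        if start > 0:
            change_points.append(start)
        end = start
    return sorted(change_points)
```

For 20 identical rows made of four 0s, four 1s and four 0s, calling detect_change_points(samples, penalty=1.0) returns [4, 8]. The search is quadratic in the number of columns. The greedy binary segmentation mentioned in the abstract trades this exactness for speed, and the paper's own penalty and its selection rule may differ from the constant used here.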
Pages: 8
Related papers
36 in total
[1]  
Agudelo-España D, 2020, PR MACH LEARN RES, V124, P320
[2]   A model selection approach for multiple sequence segmentation and dimensionality reduction [J].
Castro, Bruno M. ;
Lemes, Renan B. ;
Cesar, Jonatas ;
Hunemeier, Tabita ;
Leonardi, Florencia .
JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 167 :319-330
[3]   Runs of homozygosity: windows into population history and trait architecture [J].
Ceballos, Francisco C. ;
Joshi, Peter K. ;
Clark, David W. ;
Ramsay, Michele ;
Wilson, James F. .
NATURE REVIEWS GENETICS, 2018, 19 (04) :220-+
[4]   Testing and locating variance changepoints with application to stock prices [J].
Chen, J ;
Gupta, AK .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1997, 92 (438) :739-747
[5]  
Chen J., 2012, Parametric statistical change point analysis: with applications to genetics, medicine, and finance, 2nd ed., DOI 10.1007/978-0-8176-4801-5
[6]   TESTS OF EQUALITY BETWEEN SETS OF COEFFICIENTS IN 2 LINEAR REGRESSIONS [J].
CHOW, GC .
ECONOMETRICA, 1960, 28 (03) :591-605
[7]   WILD BINARY SEGMENTATION FOR MULTIPLE CHANGE-POINT DETECTION [J].
Fryzlewicz, Piotr .
ANNALS OF STATISTICS, 2014, 42 (06) :2243-2281
[8]   A change-point model for a shift in variance [J].
Hawkins, DM ;
Zamba, KD .
JOURNAL OF QUALITY TECHNOLOGY, 2005, 37 (01) :21-31
[9]   Computationally Efficient Changepoint Detection for a Range of Penalties [J].
Haynes, Kaylea ;
Eckley, Idris A. ;
Fearnhead, Paul .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2017, 26 (01) :134-143
[10]  
HINKLEY DV, 1970, BIOMETRIKA, V57, P1