On optimal multiple changepoint algorithms for large data

被引:118
作者
Maidstone, Robert [1 ]
Hocking, Toby [2 ]
Rigaill, Guillem [3 ]
Fearnhead, Paul [4 ]
机构
[1] Univ Lancaster, STOR i Ctr Doctoral Training, Lancaster LA1 4YW, England
[2] McGill Univ & Genome Quebec Innovat Ctr, Montreal, PQ, Canada
[3] Univ Paris Sud, Univ Devry, Univ Paris Diderot,Univ Paris Diderot,Sorbonne Pa, CNRS,INRA,UMR 9213 UMR 1403,Inst Plant Sci, Paris, France
[4] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YW, England
基金
英国工程与自然科学研究理事会;
关键词
Breakpoints; Dynamic Programming; FPOP; SNIP; Optimal Partitioning; pDPA; PELT; Segment Neighbourhood; DNA-SEQUENCE SEGMENTATION; CHANGE-POINTS; NUMBER; IDENTIFICATION; CRITERION; MODELS;
D O I
10.1007/s11222-016-9636-3
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Many common approaches to detecting change-points, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementation of these dynamic programming methods have a computational cost that scales at least quadratically in the length of the time-series. Recently pruning ideas have been suggested that can speed up the dynamic programming algorithms, whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods, and introduce two newalgorithms for segmenting data: FPOP and SNIP. Empirical results showthat FPOP is substantially faster than existing dynamic programming methods, and unlike the existing methods its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
引用
收藏
页码:519 / 533
页数:15
相关论文
共 28 条
[1]   NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION [J].
AKAIKE, H .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) :716-723
[2]   Structural breaks in time series [J].
Aue, Alexander ;
Horvath, Lajos .
JOURNAL OF TIME SERIES ANALYSIS, 2013, 34 (01) :1-16
[3]   ALGORITHMS FOR THE OPTIMAL IDENTIFICATION OF SEGMENT NEIGHBORHOODS [J].
AUGER, IE ;
LAWRENCE, CE .
BULLETIN OF MATHEMATICAL BIOLOGY, 1989, 51 (01) :39-54
[4]  
Braun JV, 1998, STAT SCI, V13, P142
[5]   Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation [J].
Braun, JV ;
Braun, RK ;
Müller, HG .
BIOMETRIKA, 2000, 87 (02) :301-314
[6]  
Cleynen A., 2012, ARXIV E PRINTS
[7]   Structural break estimation for nonstationary time series models [J].
Davis, RA ;
Lee, TCM ;
Rodriguez-Yam, GA .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) :223-239
[8]   Multiscale change point inference [J].
Frick, Klaus ;
Munk, Axel ;
Sieling, Hannes .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2014, 76 (03) :495-580
[9]  
Fryzlewicz P., 2012, ANN STAT IN PRESS
[10]   Multiscale DNA partitioning: statistical evidence for segments [J].
Futschik, Andreas ;
Hotz, Thomas ;
Munk, Axel ;
Sieling, Hannes .
BIOINFORMATICS, 2014, 30 (16) :2255-2262