A user-friendly guide to using distance measures to compare time series in ecology

Cited by: 8
Authors
Dove, Shawn [1, 2]
Bohm, Monika [2, 3]
Freeman, Robin [2]
Jellesmark, Sean [1, 2]
Murrell, David J. [1]
Affiliations
[1] UCL, Ctr Biodivers & Environm Res, Gower St, London WC1E 6BT, England
[2] Zool Soc London, Inst Zool, London, England
[3] Indianapolis Zoo, Global Ctr Species Survival, Indianapolis, IN USA
Funding
European Union Horizon 2020;
Keywords
classification; clustering; dissimilarity measures; distance measure selection; time series analysis; time series comparison; SIMILARITY MEASURES; CLASSIFICATION; COLOR;
DOI
10.1002/ece3.10520
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
Time series are a critical component of ecological analysis, used to track changes in biotic and abiotic variables. Information can be extracted from the properties of time series for tasks such as classification (e.g., assigning species to individual bird calls); clustering (e.g., clustering similar responses in population dynamics to abrupt changes in the environment or management interventions); prediction (e.g., accuracy of model predictions to original time series data); and anomaly detection (e.g., detecting possible catastrophic events from population time series). These common tasks in ecological research all rely on the notion of (dis-) similarity, which can be determined using distance measures. A plethora of distance measures have been described, predominantly in the computer and information sciences, but many have not been introduced to ecologists. Furthermore, little is known about how to select appropriate distance measures for time-series-related tasks. Therefore, many potential applications remain unexplored. Here, we describe 16 properties of distance measures that are likely to be of importance to a variety of ecological questions involving time series. We then test 42 distance measures for each property and use the results to develop an objective method to select appropriate distance measures for any task and ecological dataset. We demonstrate our selection method by applying it to a set of real-world data on breeding bird populations in the UK and discuss other potential applications for distance measures, along with associated technical issues common in ecology. Our real-world population trends exhibit a common challenge for time series comparisons: a high level of stochasticity. We demonstrate two different ways of overcoming this challenge, first by selecting distance measures with properties that make them well suited to comparing noisy time series and second by applying a smoothing algorithm before selecting appropriate distance measures. 
In both cases, the distance measures chosen through our selection method are not only fit-for-purpose but are consistent in their rankings of the population trends. The results of our study should lead to an improved understanding of, and greater scope for, the use of distance measures for comparing ecological time series and help us answer new ecological questions.
Distance measures are often used in ecology to perform common time-series-related tasks such as classification, clustering, prediction, and anomaly detection, but little is known about how to select appropriate distance measures for specific tasks. We present a selection method for choosing appropriate distance measures, then demonstrate the method on a real-world dataset and discuss common challenges and ways of overcoming them.
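The abstract's second strategy for noisy trends, smoothing before measuring distance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Euclidean distance and the centered moving average are stand-ins rather than the measures or smoothing algorithm selected in the paper, and the two series are made-up values, not the UK breeding bird data.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def moving_average(series, window=3):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical population indices: identical trend, opposite-sign noise.
trend = [10, 11, 12, 13, 14, 15, 16, 17]
noise = [1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5]
series_a = [t + n for t, n in zip(trend, noise)]
series_b = [t - n for t, n in zip(trend, noise)]

# Distance on the raw series is dominated by the noise.
raw = euclidean_distance(series_a, series_b)

# Distance after smoothing reflects the shared trend more closely.
smooth = euclidean_distance(moving_average(series_a), moving_average(series_b))

print(f"raw distance: {raw:.2f}")
print(f"smoothed distance: {smooth:.2f}")
```

Because the two hypothetical series share the same underlying trend, the smoothed distance comes out smaller than the raw one. Whether smoothing is appropriate for a given ecological question depends on whether the short-term fluctuations are themselves of interest.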
Pages: 32