Embedding and learning with signatures

Cited by: 22
Author
Fermanian, Adeline [1]
Affiliation
[1] Sorbonne Univ, CNRS, Lab Probabilites Stat & Modelisat, 4 Pl Jussieu, F-75005 Paris, France
Keywords
Sequential data; Time series classification; Functional data; Signature
DOI
10.1016/j.csda.2020.107148
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. A novel approach for sequential learning, called the signature method and rooted in rough path theory, is considered. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists in representing discretely sampled data as paths, i.e., functions from [0, 1] to R^d. After a survey of machine learning methodologies for signatures, the influence of embeddings on prediction accuracy is investigated with an in-depth study of three recent and challenging datasets. It is shown that a specific embedding, called lead-lag, is systematically the strongest performer across all datasets and algorithms considered. Moreover, an empirical study reveals that computing signatures over the whole path domain does not lead to a loss of local information. It is concluded that, with a good embedding, combining signatures with other simple algorithms achieves results competitive with state-of-the-art, domain-specific approaches. (C) 2020 Elsevier B.V. All rights reserved.
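
As a rough illustration of the two ideas named in the abstract, the sketch below embeds a sampled one-dimensional series as a lead-lag path and computes the first two signature levels of the resulting piecewise-linear path. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `lead_lag` and `signature_depth2` are hypothetical, the lead-lag convention used (the lead coordinate moves first, then the lag coordinate catches up) is one common variant and may differ from the one studied in the paper, and the signature is truncated at depth 2.

```python
import numpy as np


def lead_lag(x):
    """Lead-lag embedding of a 1-D sequence (assumed convention, see lead-in).

    Returns an array of shape (2n - 1, 2) with columns (lead, lag):
    (x1, x1), (x2, x1), (x2, x2), (x3, x2), ..., (xn, xn).
    """
    x = np.asarray(x, dtype=float)
    doubled = np.repeat(x, 2)      # x1, x1, x2, x2, ..., xn, xn
    lead = doubled[1:]             # lead coordinate jumps first
    lag = doubled[:-1]             # lag coordinate follows one step later
    return np.column_stack([lead, lag])


def signature_depth2(path):
    """First two signature levels of a piecewise-linear path.

    path: array of shape (n_points, d).
    Returns (level1, level2) with shapes (d,) and (d, d), where
    level2[i, j] is the iterated integral of dX^i then dX^j.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)                 # increment of each linear piece
    level1 = inc.sum(axis=0)                    # total increment of the path
    before = np.cumsum(inc, axis=0) - inc       # displacement accumulated before each piece
    # Cross terms between earlier and later pieces, plus the contribution
    # of each linear piece with itself (half the outer product).
    level2 = before.T @ inc + 0.5 * (inc.T @ inc)
    return level1, level2


# Hypothetical toy series, used only to exercise the functions.
series = [1.0, 3.0, 2.0]
level1, level2 = signature_depth2(lead_lag(series))
```

With this convention, the antisymmetric part of the second level, `level2[0, 1] - level2[1, 0]`, equals the sum of squared increments of the series, which is one reason lead-lag paths are often described as exposing quadratic-variation information to a linear model.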
Pages: 23