Approximate NORTA simulations for virtual sample generation

Cited by: 8
Author
Coqueret, Guillaume [1,2]
Affiliations
[1] Montpellier Business School, 2300 Ave Moulins, F-34080 Montpellier, France
[2] Fundvisory, 48 Rue Château Landon, F-75010 Paris, France
Keywords
NORTA simulation; Multivariate sampling; Regression trees; Support Vector Machine; Data mining techniques; Big data; Random variables; Data science; Algorithm; Analytics; Decision; Distributions; Management; Copulas
DOI
10.1016/j.eswa.2016.12.027
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce an approximate variant of the NORTA method that aims at generating structured data from a given prior sample. The technique accommodates any combination of marginals (especially continuous/discrete mixtures) and a wide range of correlation structures. We focus on the interesting case where the sample includes categorical data, both ordered and unordered. We provide an application in the financial industry by testing our iterative Newton-like algorithm on a dataset comprising the results of a questionnaire. We show that the sampled data, as with the NORTA technique, closely match both the marginal and correlation structures of the original dataset. Consequently, analyses such as decision tree modeling or Support Vector Machine classification and regression can be carried out on the new, much larger sample without altering the core properties of the original sample. © 2016 Elsevier Ltd. All rights reserved.
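The abstract does not spell out the NORTA (NORmal To Anything) construction it builds on. The sketch below shows the textbook version under stated assumptions: Python standard library only (3.8+ for `statistics.NormalDist`), a bivariate example, and hypothetical helper names (`norta_sample`, `match_correlation`, `categorical_inv_cdf`). Correlated normals are drawn, mapped to uniforms through the normal CDF, then pushed through each marginal's inverse CDF, so a continuous and an ordered categorical column can share one dependence structure. The correlation-matching loop is a naive fixed-point correction on the normal-scale correlation, swapped in for illustration; it is not the author's iterative Newton-like algorithm, which the abstract does not detail.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal; PHI.cdf maps N(0, 1) draws to uniforms
EPS = 1e-12         # keeps uniforms strictly inside (0, 1) for the inverse CDFs


def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def categorical_inv_cdf(levels, probs):
    """Inverse CDF of an ordered categorical variable (hypothetical helper)."""
    cum, acc = [], 0.0
    for p in probs:
        acc += p
        cum.append(acc)

    def inv(u):
        for level, c in zip(levels, cum):
            if u <= c:
                return level
        return levels[-1]  # guards against round-off when cum[-1] < 1
    return inv


def norta_sample(inv_cdfs, corr_z, shocks):
    """NORTA: Z = L e ~ N(0, corr_z), U_i = Phi(Z_i), X_i = F_i^{-1}(U_i)."""
    low = cholesky(corr_z)
    d = len(inv_cdfs)
    rows = []
    for e in shocks:
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        u = [min(max(PHI.cdf(zi), EPS), 1.0 - EPS) for zi in z]
        rows.append([inv_cdfs[i](u[i]) for i in range(d)])
    return rows


def match_correlation(inv_cdfs, target, n=2000, iters=30, step=1.0, seed=0):
    """Naive fixed-point search for normal-scale correlations (NOT the paper's Newton-like scheme)."""
    rng = random.Random(seed)
    d = len(inv_cdfs)
    # Common random numbers: the same N(0, 1) shocks are reused at every iteration,
    # so the achieved correlation is a deterministic function of corr_z.
    shocks = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    corr_z = [row[:] for row in target]  # start from the target itself
    for _ in range(iters):
        cols = list(zip(*norta_sample(inv_cdfs, corr_z, shocks)))
        for i in range(d):
            for j in range(i):
                gap = target[i][j] - pearson(cols[i], cols[j])
                # Clipping keeps a 2x2 matrix positive definite; larger d may still fail.
                r = max(-0.99, min(0.99, corr_z[i][j] + step * gap))
                corr_z[i][j] = corr_z[j][i] = r
    return corr_z, norta_sample(inv_cdfs, corr_z, shocks)


if __name__ == "__main__":
    rate = 2.0
    inv_exp = lambda u: -math.log(1.0 - u) / rate  # continuous marginal: Exponential(rate)
    inv_likert = categorical_inv_cdf([1, 2, 3, 4, 5], [0.1, 0.2, 0.4, 0.2, 0.1])
    target = [[1.0, 0.5], [0.5, 1.0]]
    corr_z, sample = match_correlation([inv_exp, inv_likert], target)
    cols = list(zip(*sample))
    print("normal-scale correlation:", round(corr_z[0][1], 3))
    print("achieved correlation:", round(pearson(cols[0], cols[1]), 3))
```

In the usage example the normal-scale correlation typically ends above the 0.5 target, because transforming Gaussian margins into exponential and categorical ones attenuates correlation; searching for the normal-scale value that undoes this attenuation is the correlation-matching step that NORTA-type methods have to solve. Reusing one set of shocks across iterations is a design choice that keeps the loop from chasing Monte Carlo noise.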
Pages: 69-81
Number of pages: 13
Related papers (56 in total)
[1] Abramowitz, M. HDB MATH FUNCTIONS F. 1964.
[2] Adomavicius, Gediminas; Zhang, Jingjing. Stability of Recommendation Algorithms. ACM Transactions on Information Systems, 2012, 30(4).
[3] Agresti, A. Categorical Data Analysis. 2011. DOI 10.1002/0471249688.
[4] An, G. Z. The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, 1996, 8(3): 643-674.
[5] [Anonymous]. Pronamel Gentle Whitening.
[6] Atkinson, E. J. An introduction to recursive partitioning using the RPART routines. 2015.
[7] Avramidis, Athanassios N.; Channouf, Nabil; L'Ecuyer, Pierre. Efficient Correlation Matching for Fitting Discrete Multivariate Distributions with Arbitrary Marginals and Normal-Copula Dependence. INFORMS Journal on Computing, 2009, 21(1): 88-106.
[8] Balakrishnan, N. CONTINUOUS BIVARIATE, 2nd ed. 2009. DOI 10.1007/b101765.
[9] Bishop, C. M. Training with Noise Is Equivalent to Tikhonov Regularization. Neural Computation, 1995, 7(1): 108-116.
[10] Bouezmarni, T.; Rombouts, J. V. K. Semiparametric multivariate density estimation for positive data using copulas. Computational Statistics & Data Analysis, 2009, 53(6): 2040-2054.