ON ROBUST INFORMATION EXTRACTION FROM HIGH-DIMENSIONAL DATA

Cited by: 14
Authors
Kalina, Jan [1]
Affiliations
[1] Acad Sci Czech Republ, Inst Comp Sci, Vodarenskou Vezi 2, Prague 18207 8, Czech Republic
Keywords
Data mining; high-dimensional data; robust econometrics; outliers; machine learning
DOI
10.5937/sjm9-5520
Chinese Library Classification
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
Information extraction from high-dimensional data is an important problem in current applications in management or econometrics. From a practical point of view, a key concern is the sensitivity of machine learning methods to outlying data values; numerical stability is another important aspect of data mining from high-dimensional data. This paper gives an overview of various types of data mining, discusses their suitability for high-dimensional data, and critically examines their robustness properties, explaining that robustness itself is perceived differently in different contexts. Moreover, we investigate properties of a robust nonlinear regression estimator of Kalina (2013).
Pages: 131-144
Number of pages: 14
Related papers
48 in total
[1] Belloni A., Chernozhukov V., Hansen C., Inference for high-dimensional sparse econometric models, (2011)
[2] Blankertz B., Tangermann M., Popescu F., Krauledat M., Fazli S., Donaczy M., Curio G., Muller K.R., The Berlin brain-computer interface, Lecture Notes in Computer Science, 5050, pp. 79-101, (2008)
[3] Bobrowski L., Lukaszuk T., Relaxed linear separability (RLS) approach to feature (gene) subset selection, Selected Works in Bioinformatics, pp. 103-118, (2011)
[4] Bobrowski L., Lukaszuk T., Prognostic modeling with high dimensional and censored data, Lecture Notes in Computer Science, 7377, pp. 178-193, (2012)
[5] Brandl B., Keber C., Schuster M., An automated econometric decision support system: Forecasts for foreign exchange trades, Central European Journal of Operations Research, 14, pp. 401-415, (2006)
[6] Christmann A., Van Messem A., Bouligand derivatives and robustness of support vector machines for regression, Journal of Machine Learning Research, 9, pp. 915-936, (2008)
[7] Dai J.J., Lieu L., Rocke D., Dimension reduction for classification with gene expression microarray data, Statistical Applications in Genetics and Molecular Biology, 5, 1, (2006)
[8] Duan N., Li K.C., Slicing regression: A link-free regression method, Annals of Statistics, 19, pp. 505-530, (1991)
[9] Fernandez G., Data mining using SAS applications, (2003)
[10] Funk M.J., Westreich D., Wiesen C., Sturmer T., Brookhart M.A., Davidian M., Doubly robust estimation of causal effects, American Journal of Epidemiology, 173, 7, pp. 761-767, (2011)