A Nonparametric Kernel Approach to Interval-Valued Data Analysis

Cited by: 9
Authors
Jeon, Yongho [1]
Ahn, Jeongyoun [2]
Park, Cheolwoo [2]
Affiliations
[1] Yonsei Univ, Dept Appl Stat, Seoul 120749, South Korea
[2] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Funding
National Research Foundation of Singapore;
Keywords
Conditional distribution; Interval prediction; Nonparametric density estimation; Symbolic data;
DOI
10.1080/00401706.2014.965346
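Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code
020208; 070103; 0714;
Abstract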
This article concerns datasets in which variables are in the form of intervals, which are obtained by aggregating information about variables from a larger dataset. We propose to view the observed set of hyper-rectangles as an empirical histogram, and to use a Gaussian kernel type estimator to approximate its underlying distribution in a nonparametric way. We apply this idea to both univariate density estimation and regression problems. Unlike many existing methods used in regression analysis, the proposed method can estimate the conditional distribution of the response variable for any given set of predictors even when some of them are not interval-valued. Empirical studies show that the proposed approach has a great flexibility in various scenarios with complex relationships between the location and width of intervals of the response and predictor variables.
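The core idea in the abstract, smoothing an empirical histogram of intervals with a Gaussian kernel, can be illustrated in one dimension. The sketch below is not the estimator defined in the article. It is a simplified stand-in that assumes each observed interval [a, b] contributes a uniform density on [a, b], which is then convolved with a Gaussian kernel. The function name `interval_kde` and the bandwidth `h` are hypothetical, chosen only for this illustration.

```python
import math


def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_kde(x, intervals, h):
    """Gaussian-smoothed histogram density at x for interval-valued data.

    Assumption (not taken from the article): each interval (a, b) is
    treated as Uniform[a, b] and convolved with a N(0, h^2) kernel,
    which gives the closed form
        [Phi((x - a) / h) - Phi((x - b) / h)] / (b - a).
    The estimate is the average of these terms over all intervals.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not intervals:
        raise ValueError("at least one interval is required")
    total = 0.0
    for a, b in intervals:
        if b <= a:
            raise ValueError("each interval must satisfy a < b")
        total += (_norm_cdf((x - a) / h) - _norm_cdf((x - b) / h)) / (b - a)
    return total / len(intervals)


if __name__ == "__main__":
    # Three made-up intervals, used only to exercise the function.
    data = [(0.0, 1.0), (0.5, 2.0), (1.5, 2.5)]
    for x in (0.0, 1.0, 2.0):
        print(x, interval_kde(x, data, h=0.3))
```

Under this simplification, wide intervals spread their mass thinly and narrow intervals concentrate it, so the estimate reflects both the location and the width of the observed intervals.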
Pages: 566-575
Number of pages: 10