Random forests regression for soft interval data

Cited by: 0
Authors
Gaona-Partida, Paul [1]
Yeh, Chih-Ching [2 ]
Sun, Yan [3 ]
Cutler, Adele [3 ]
Affiliations
[1] Univ Calif Irvine, Dept Stat, Irvine, CA USA
[2] PricewaterhouseCoopers LLP, Pharmaceut & Life Sci Res & Dev, Salt Lake City, UT USA
[3] Utah State Univ, Dept Math & Stat, Logan, UT 84322 USA
Keywords
L2 dissimilarity measure; Distance; Nonlinearity; Nonparametric; Regression tree; Linear regression; Models
DOI
10.1080/03610918.2024.2396401
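Chinese Library Classification (CLC) codes
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract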
Analyzing soft interval data for uncertainty quantification has attracted much attention recently, and regression methods for interval data have been studied extensively. Most existing work focuses on linear models, yet many practical problems are nonlinear, so nonlinear regression tools for interval data are needed. This paper proposes an interval-valued random forests model whose splitting criterion is variance reduction defined through an L2-type metric on the space of compact intervals. The model simultaneously considers the centers and ranges of the interval data as well as their possible interactions. Unlike most linear models, which require additional constraints to ensure mathematical coherence, the proposed random forests model estimates the regression function nonparametrically, so the predicted interval length is naturally nonnegative without any constraints. Simulation studies show that the new method outperforms typical existing regression methods across various linear, semi-linear, and nonlinear data archetypes and under different error measures. To demonstrate its applicability, a real data example analyzes the price range data of the Dow Jones Industrial Average index and its component stocks.
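The splitting idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the squared distance between two intervals is assumed here to be the squared difference of centers plus the squared difference of half-ranges, the feature is assumed to be a single scalar rather than an interval, and the function names `interval_sse`, `best_split`, and `node_prediction` are hypothetical.

```python
from statistics import fmean


def _centers_radii(intervals):
    """Convert (lower, upper) pairs into lists of centers and half-ranges."""
    centers = [(lo + hi) / 2 for lo, hi in intervals]
    radii = [(hi - lo) / 2 for lo, hi in intervals]
    return centers, radii


def interval_sse(intervals):
    """Sum of squared distances from each interval to the node's mean interval.

    ASSUMPTION: the squared distance is (center diff)^2 + (half-range diff)^2,
    one common L2-type choice; the paper's exact metric may differ.
    """
    if not intervals:
        return 0.0
    centers, radii = _centers_radii(intervals)
    c_bar, r_bar = fmean(centers), fmean(radii)
    return sum((c - c_bar) ** 2 + (r - r_bar) ** 2
               for c, r in zip(centers, radii))


def best_split(x, y):
    """Find the threshold on scalar feature x that most reduces interval_sse.

    Returns (threshold, reduction); threshold is None if no split helps.
    """
    parent = interval_sse(y)
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (None, 0.0)
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        if x[left[-1]] == x[right[0]]:
            continue  # cannot split between identical feature values
        threshold = (x[left[-1]] + x[right[0]]) / 2
        reduction = (parent
                     - interval_sse([y[i] for i in left])
                     - interval_sse([y[i] for i in right]))
        if reduction > best[1]:
            best = (threshold, reduction)
    return best


def node_prediction(intervals):
    """Predict the mean interval of a leaf as (lower, upper)."""
    centers, radii = _centers_radii(intervals)
    c_bar, r_bar = fmean(centers), fmean(radii)
    return (c_bar - r_bar, c_bar + r_bar)
```

Because a leaf predicts the average of observed half-ranges, which are all nonnegative, the predicted interval length cannot be negative under this sketch, which mirrors the property the abstract attributes to the nonparametric approach. A forest would repeat this split search on bootstrap samples with random feature subsets and average the leaf predictions.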
Pages: 20