Prediction intervals;
regression;
model validation;
data and knowledge visualization;
methodologies and tools;
RELIABILITY;
DOI:
10.3233/IDA-140673
Chinese Library Classification number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
In this article we compare and test two families of non-parametric approaches to constructing prediction intervals for arbitrary regression models in the supervised learning framework. The errors are often assumed to be independent and identically distributed, but we focus on the general case in which the errors may be input dependent. The first family of approaches is based on the idea of explaining the total prediction error as the sum of the model's error and the error caused by noise inherent in the data; the two are estimated independently. The second family is based on the assumption of similarity of the data, and these approaches estimate the prediction intervals of the target regression variable by using a sample's nearest neighbors. Results on a large set of artificial and real-world datasets show that one method from the second family is superior to the other methods. Approaches from the first family always form valid, yet not necessarily confirmatory prediction intervals, whereas approaches from the second family prove to be more time efficient.
Affiliation:
Virginia Commonwealth Univ, Ctr Biomarker Res & Personalized Med, Richmond, VA 23284 USA