Overcoming Long Inference Time of Nearest Neighbors Analysis in Regression and Uncertainty Prediction

Cited by: 0
Authors
Koutenský F. [1 ]
Šimánek P. [1 ]
Čepek M. [1 ]
Kovalenko A. [1 ]
Affiliations
[1] Faculty of Information Technology, Czech Technical University in Prague, Thakurova 9, Prague
Keywords
Automated appraisal; Inference time; Nearest neighbors; Probabilistic estimation; Uncertainty;
DOI
10.1007/s42979-024-02670-2
Abstract
The intuitive approach of comparing like with like forms the basis of so-called nearest neighbor analysis, which is central to many machine learning algorithms. Nearest neighbor analysis is easy to interpret, analyze, and reason about. It is widely used in advanced techniques such as uncertainty estimation in regression models, as well as in the renowned k-nearest neighbor-based algorithms. Nevertheless, its high inference time complexity, which depends on the dataset size even for its faster approximated versions, restricts its applications and can considerably inflate application cost. In this paper, we address the problem of high inference time complexity. By using gradient-boosted regression trees as a predictor of the labels obtained from nearest neighbor analysis, we demonstrate an inference speedup of several orders of magnitude. We validate the effectiveness of our approach on a real-world European Car Pricing Dataset with approximately 4.2×10⁶ rows, for both residual cost and price uncertainty prediction. Moreover, we assess our method's performance on the most commonly used tabular benchmark datasets to demonstrate its scalability. The code is available at: https://github.com/koutefra/uncertainty_experiments. © The Author(s) 2024.
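The core idea described in the abstract, distilling expensive nearest-neighbor labels into a fast gradient-boosted model, can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' implementation; the choice of k, the uncertainty label (standard deviation of neighbor targets), and all hyperparameters are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic tabular data standing in for real features and targets.
X = rng.normal(size=(2000, 5))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5 + np.abs(X[:, 1]), size=2000)

# Step 1: nearest neighbor analysis produces a label per training row;
# here, an uncertainty label: the std of the k nearest neighbors' targets.
k = 20
nn = NearestNeighbors(n_neighbors=k).fit(X)
_, idx = nn.kneighbors(X)
nn_uncertainty = y[idx].std(axis=1)

# Step 2: train gradient-boosted regression trees to predict the
# nearest-neighbor labels directly from the features.
gbt = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
gbt.fit(X, nn_uncertainty)

# At inference time only the trees are evaluated; no neighbor search over
# the training set is needed, so cost no longer grows with dataset size.
pred = gbt.predict(X[:5])
print(pred.shape)
```

The speedup comes from the final step: evaluating a fixed set of shallow trees is independent of the number of training rows, whereas even approximate neighbor search scales with dataset size.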