p-adic distance and k-Nearest Neighbor classification

Cited by: 4
Authors
Kartal, Elif [1]
Caliskan, Fatma [2]
Eskisehirli, Beyaz Basak [3]
Ozen, Zeki [1]
Affiliations
[1] Istanbul Univ, Fac Econ, Dept Management Informat Syst, Istanbul, Turkiye
[2] Istanbul Univ, Fac Sci, Dept Math, Algebra & Number Theory Div, Istanbul, Turkiye
[3] Istanbul Univ, Dept Math, Anal & Theory Funct Div, Istanbul, Turkiye
Keywords
Classification; Metric; k-NN; The p-adic distance; Machine learning
DOI
10.1016/j.neucom.2024.127400
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The k-Nearest Neighbor (k-NN) algorithm is a well-known supervised learning algorithm whose performance depends strongly on the distance used. According to Ostrowski's theorem, every nontrivial absolute value on the field of rational numbers, Q, is equivalent either to the usual absolute value or to the p-adic absolute value for some prime p. This theorem motivates using the p-adic absolute value to compute a p-adic distance between two samples in the k-NN algorithm. In this study, the p-adic distance on Q was coupled with the k-NN algorithm and applied to 10 well-known public datasets containing categorical, numerical, and mixed (both categorical and numerical) predictive attributes. Its performance was compared with that of the Euclidean, Manhattan, Chebyshev, and Cosine distances. The p-adic distance achieved the highest average accuracy on 5 of the 10 datasets and, in particular, gave better results than the other distances on the mixed datasets. For r = 1, 2, 3, the effect of using the r-decimal values of the numbers in the p-adic calculation was examined on the numerical and mixed datasets. In addition, the parameter p of the p-adic distance was tested with the prime numbers less than 29, and the average accuracies obtained for the different values of p were very close to one another, especially on the categorical and mixed datasets. The results also suggest that k-NN with the p-adic distance may be more suitable for binary classification than for multi-class classification.
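As a rough illustration of the idea in the abstract, the sketch below computes the standard p-adic absolute value |x|_p = p^(-v_p(x)) on the rationals, where v_p(x) is the exponent of p in the factorization of x, and plugs a distance built from it into a bare-bones k-NN vote. Several choices here are assumptions made only for illustration and are not taken from the paper: the function names, summing the per-feature p-adic distances into one sample distance, and reading "r-decimal values" as rounding each feature to r decimal places before converting it to an exact rational.

```python
from collections import Counter
from fractions import Fraction


def p_adic_valuation(x: Fraction, p: int) -> int:
    """Return v such that x = p**v * (a/b) with p dividing neither a nor b."""
    if x == 0:
        raise ValueError("the p-adic valuation of 0 is +infinity")
    num, den = abs(x.numerator), x.denominator
    v = 0
    while num % p == 0:  # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # powers of p in the denominator lower it
        den //= p
        v -= 1
    return v


def p_adic_abs(x: Fraction, p: int) -> Fraction:
    """p-adic absolute value: |0|_p = 0, otherwise p**(-v_p(x))."""
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-p_adic_valuation(x, p))


def p_adic_distance(a, b, p=2, r=2):
    """Hypothetical sample distance: sum of per-feature |a_i - b_i|_p.

    Assumption: each feature is rounded to r decimals and treated as the
    exact rational round(x * 10**r) / 10**r.
    """
    scale = 10 ** r
    total = Fraction(0)
    for x, y in zip(a, b):
        fx = Fraction(round(x * scale), scale)
        fy = Fraction(round(y * scale), scale)
        total += p_adic_abs(fx - fy, p)
    return total


def knn_predict(train_X, train_y, query, k=3, p=2, r=2):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: p_adic_distance(train_X[i], query, p, r),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With p = 2, for example, the numbers 1 and 17 are p-adically close (their difference 16 has absolute value 1/16), while 2 and 17 are far apart (their difference 15 has absolute value 1), which is the opposite of their ordering under the usual absolute value.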
Pages: 7