p-adic distance and k-Nearest Neighbor classification

Cited by: 4
Authors
Kartal, Elif [1 ]
Caliskan, Fatma [2 ]
Eskisehirli, Beyaz Basak [3 ]
Ozen, Zeki [1 ]
Affiliations
[1] Istanbul Univ, Fac Econ, Dept Management Informat Syst, Istanbul, Turkiye
[2] Istanbul Univ, Fac Sci, Dept Math, Algebra & Number Theory Div, Istanbul, Turkiye
[3] Istanbul Univ, Dept Math, Anal & Theory Funct Div, Istanbul, Turkiye
Keywords
Classification; Metric; k-NN; The p-adic distance; Machine learning
DOI
10.1016/j.neucom.2024.127400
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The k-Nearest Neighbor (k-NN) algorithm is a well-known supervised learning method, and the distance measure used in the analysis has a strong effect on its performance. By Ostrowski's theorem, every nontrivial absolute value on the field of rational numbers Q is equivalent either to the usual absolute value or to the p-adic absolute value for some prime p. In view of this theorem, the p-adic absolute value motivates the use of the p-adic distance between two samples in the k-NN algorithm. In this study, the p-adic distance on Q was coupled with the k-NN algorithm and applied to 10 well-known public datasets containing categorical, numerical, and mixed (both categorical and numerical) predictive attributes. The performance of the p-adic distance was compared with that of the Euclidean, Manhattan, Chebyshev, and Cosine distances. The average accuracy obtained with the p-adic distance ranked first in 5 of the 10 datasets; in particular, on the mixed datasets the p-adic distance gave better results than the other distances. For r = 1, 2, 3, the effect of using the first r decimal digits of each numerical value in the p-adic calculation was examined on the numerical and mixed datasets. In addition, the parameter p of the p-adic distance was tested with the prime numbers less than 29, and the average accuracies obtained for the different values of p were very close to each other, especially on the categorical and mixed datasets. Finally, the results suggest that k-NN with the p-adic distance may be more suitable for binary classification than for multi-class classification.
Pages: 7
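As a concrete illustration of the approach described in the abstract, the sketch below computes the p-adic absolute value |x|_p = p^(-v_p(x)) on Q and uses the resulting distance inside a plain k-NN majority vote. This is a minimal sketch, not the authors' implementation: the function names (p_adic_valuation, p_adic_distance, knn_predict), the rounding of numerical attributes to r decimals before scaling them to integers, and the summing of per-attribute p-adic distances are all illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction


def p_adic_valuation(x: Fraction, p: int) -> float:
    """v_p(x): exponent of the prime p in x; by convention v_p(0) = +infinity."""
    if x == 0:
        return float("inf")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def p_adic_distance(a: Fraction, b: Fraction, p: int) -> float:
    """d_p(a, b) = |a - b|_p = p**(-v_p(a - b)), with d_p(a, a) = 0."""
    v = p_adic_valuation(a - b, p)
    return 0.0 if v == float("inf") else p ** (-v)


def knn_predict(train_X, train_y, query, p=2, k=3, r=2):
    """Majority vote among the k training samples closest to `query`,
    where closeness is the sum of per-attribute p-adic distances.
    Rounding numerical attributes to r decimals and scaling them to
    integers is an assumed preprocessing step, not taken from the paper."""
    def to_frac(value):
        return Fraction(round(value * 10 ** r))

    scored = []
    for xi, yi in zip(train_X, train_y):
        d = sum(p_adic_distance(to_frac(a), to_frac(b), p)
                for a, b in zip(xi, query))
        scored.append((d, yi))
    scored.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy usage: classify one query point with p = 3, k = 3, r = 2.
X = [(1.25, 3.50), (1.30, 3.45), (8.00, 0.10), (8.05, 0.20)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, query=(1.28, 3.48), p=3, k=3, r=2))
```

Note that, unlike the Euclidean distance, the p-adic distance reflects divisibility by p rather than magnitude, which is why the scaling of the decimal part to integers matters before the valuation is taken.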