Semantic-k-NN algorithm: An enhanced version of traditional k-NN algorithm

Cited by: 58
Authors
Ali, Munwar [1 ]
Jung, Low Tang [2 ]
Abdel-Aty, Abdel-Haleem [3 ,4 ]
Abubakar, Mustapha Y. [5 ]
Elhoseny, Mohamed [6 ]
Ali, Irfan [7 ]
Affiliations
[1] Shaheed Benazir Bhutto Univ, Dept Informat Technol, Shaheed Benazirabad, Sindh, Pakistan
[2] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar, Perak, Malaysia
[3] Univ Bisha, Coll Sci, Dept Phys, POB 344, Bisha 61922, Saudi Arabia
[4] Al Azhar Univ, Fac Sci, Phys Dept, Assiut 71524, Egypt
[5] Kano State Polytech, Sch Technol, Computer Sci Dept, Kano, Nigeria
[6] Mansoura Univ, Fac Comp & Informat, Dept Informat Syst, Mansoura, Egypt
[7] MUET Jamshoro, Dept Comp Syst Engn, Sindh, Pakistan
Keywords
Semantic itemization; Bigram model; Big data analytics; Semantic-kNN; Machine learning; CLASSIFICATION;
DOI
10.1016/j.eswa.2020.113374
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The k-NN algorithm is one of the most renowned machine learning (ML) algorithms and is widely used in data classification research. With the emergence of big data, the performance and efficiency of the traditional k-NN algorithm are fast becoming a critical issue. The traditional k-NN algorithm is inefficient at handling high-volume, multi-categorical training datasets. It is limited in its ability to filter the training dataset down to the training data most relevant to the intended or targeted test dataset/file: it has to scan through all the training dataset categories to classify the targeted data. As such, traditional k-NN is considered not intelligent and consequently suffers from poor accuracy and high computational complexity. A Semantic-kNN (Sk-NN) algorithm for ML is thus proposed in this paper to address the limitations of the traditional k-NN. The proposed Sk-NN leverages semantic itemization and a bigram model to filter the training dataset in accordance with the relevant information in the test dataset. It is aimed at general security applications, such as finding the confidentiality level of data when the algorithm is trained with multiple training categories during the data classification phase. Ultimately, Sk-NN is intended to elevate ML performance in pattern extraction and labeling in the big data context. (C) 2020 Elsevier Ltd. All rights reserved.
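The filter-then-classify idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Sk-NN implementation: the abstract does not specify how semantic itemization works, so the sketch assumes a plain word-bigram overlap as the filter and Jaccard similarity over bigrams as the k-NN distance. The function names (`bigrams`, `filter_categories`, `classify`) and the `min_shared` threshold are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return the set of adjacent word pairs in a lowercased text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def filter_categories(training, test_text, min_shared=1):
    """Keep only the categories that share at least `min_shared` bigrams
    with the test text (assumed stand-in for the paper's filtering step)."""
    test_bg = bigrams(test_text)
    kept = {}
    for label, docs in training.items():
        category_bg = set().union(*(bigrams(d) for d in docs))
        if len(category_bg & test_bg) >= min_shared:
            kept[label] = docs
    return kept


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(training, test_text, k=3):
    """Majority vote among the k most similar documents, searching only
    the filtered categories (falls back to all categories if none match)."""
    candidates = filter_categories(training, test_text) or training
    test_bg = bigrams(test_text)
    scored = [
        (jaccard(bigrams(doc), test_bg), label)
        for label, docs in candidates.items()
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    training = {
        "confidential": ["quarterly salary report for staff",
                         "staff salary report and bonus"],
        "public": ["press release about new product",
                   "new product launch event"],
    }
    print(classify(training, "draft salary report for managers", k=3))
```

In this sketch the filter discards whole categories before any neighbor search, so the k-NN step compares the test text against fewer documents than a scan over every category would.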
Pages: 18