Attribute reduction based on overlap degree and k-nearest-neighbor rough sets in decision information systems

Times cited: 43

Authors:

Hu, Meng [1]

Tsang, Eric C. C. [1]

Guo, Yanting [1]

Chen, Degang [2]

Xu, Weihua [3]

Affiliations:

[1] Macau Univ Sci & Technol, Fac Informat Technol, Taipa, Macao, Peoples R China

[2] North China Elect Power Univ, Dept Math & Phys, Beijing 102206, Peoples R China

[3] Southwest Univ, Coll Artificial Intelligence, Chongqing 400715, Peoples R China

Source:

INFORMATION SCIENCES | 2022 / Vol. 584

Funding:

National Natural Science Foundation of China;

Keywords:

k-nearest-neighbor rough sets; Attribute reduction; Overlap degree; Neighborhood rough sets; FEATURE-SELECTION; FUZZY-SETS;

DOI:

10.1016/j.ins.2021.10.063

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

The k-nearest-neighbor rule is a popular classification technique, and rough set theory is an effective mathematical tool for dealing with the uncertainty of data. Rough set models based on k-nearest-neighbor relations have a strong ability to approximate decisions, but their computation is very time-consuming. In this paper, we model the overlap degree of objects from different categories in advance to accelerate attribute reduction and improve the classification performance of the selected attributes. First, we define the coincidence degree (CD) and the distance (DIS) of objects from different categories to measure the coverage of, and the distance between, objects of different classes. Second, we combine CD and DIS to define the overlap degree (OD), which is used to pre-sort attributes, and then use k-nearest-neighbor rough sets to filter out inconsistent and redundant attributes. The OD-based pre-sort greatly reduces the number of attribute searches and ensures that attributes with high separability are selected first. Furthermore, we design a fast reduction algorithm (OD&KNN) that obtains a reduct which approximates decisions as well as the original attribute set but has a lower OD. A comparison of experimental results and time complexity with state-of-the-art algorithms shows that OD&KNN is more efficient on high-dimensional data while maintaining classification accuracy. (c) 2021 Elsevier Inc. All rights reserved.
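The abstract describes the method only in outline, and the paper's formal definitions of CD, DIS and OD are not given in this record. The Python below is therefore a hypothetical sketch of the general pattern it names: pre-sort attributes by an overlap-style score, make a single forward pass guided by a k-nearest-neighbor rough-set dependency, then prune redundant attributes. All function names are invented for illustration, the k-NN positive region used here is an assumed simplification, and `overlap_score` is a stand-in based on 1-nearest-neighbor label disagreement, not the paper's OD.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the general idea only; they are NOT the
# paper's definitions of CD, DIS, OD or of its k-NN rough set model.


def knn_indices(data: Sequence[Sequence[float]], i: int,
                attrs: Sequence[int], k: int) -> List[int]:
    """k nearest neighbours of object i (itself excluded), using Euclidean
    distance on the attribute subset attrs; ties are broken by index."""
    dists = []
    for j, row in enumerate(data):
        if j != i:
            d = math.sqrt(sum((data[i][a] - row[a]) ** 2 for a in attrs))
            dists.append((d, j))
    dists.sort()
    return [j for _, j in dists[:k]]


def dependency(data, labels, attrs, k):
    """Share of objects whose k nearest neighbours under attrs all carry the
    object's own label (an assumed k-NN style positive region)."""
    if not attrs:
        return 0.0
    pos = sum(
        all(labels[j] == labels[i] for j in knn_indices(data, i, attrs, k))
        for i in range(len(data))
    )
    return pos / len(data)


def overlap_score(data, labels, a):
    """Stand-in overlap measure for attribute a: share of objects whose single
    nearest neighbour on a alone belongs to another class (lower is better)."""
    return 1.0 - dependency(data, labels, [a], 1)


def reduce_attributes(data, labels, k=3):
    """Pre-sorted single-pass forward selection plus backward pruning."""
    m = len(data[0])
    full = dependency(data, labels, list(range(m)), k)
    # Pre-sort: least-overlapping (most separable) attributes come first.
    order = sorted(range(m), key=lambda a: (overlap_score(data, labels, a), a))
    reduct: List[int] = []
    current = 0.0
    # Single forward pass: at most m dependency evaluations.
    for a in order:
        if current >= full:
            break
        score = dependency(data, labels, reduct + [a], k)
        if score > current:
            reduct.append(a)
            current = score
    # Backward pass: drop attributes whose removal keeps the dependency level.
    for a in list(reduct):
        rest = [b for b in reduct if b != a]
        if rest and dependency(data, labels, rest, k) >= current:
            reduct = rest
    return sorted(reduct)


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes, 1 is noise, 2 is constant.
    data = [[0.0, 0.5, 0.3], [0.1, 0.9, 0.3], [0.2, 0.1, 0.3],
            [1.0, 0.4, 0.3], [1.1, 0.8, 0.3], [1.2, 0.2, 0.3]]
    labels = [0, 0, 0, 1, 1, 1]
    print(reduce_attributes(data, labels, k=2))  # prints [0]
```

Because the attributes are visited once in pre-sorted order, the forward pass evaluates the dependency at most m times, whereas a classical greedy forward search that rescans every remaining attribute in each round needs on the order of m^2 evaluations. This is the kind of saving the abstract attributes to pre-sorting, although the real OD&KNN algorithm may differ in its details.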

Pages: 301-324

Number of pages: 24