Improving Instance Selection Methods for Big Data Classification

Cited by: 0

Authors:

Malhat, Mohamed [1]

El Menshawy, Mohamed [1]

Mousa, Hamdy [1]

El Sisi, Ashraf [1]

Affiliations:

[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt

Source:

2017 13TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO) | 2017

Keywords:

Big data; Data Mining; Data Reduction; Instance Selection; REDUCTION;

DOI:

Not available

CLC number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The explosion of data in many application domains has given rise to the term big data. As data volume rapidly grows beyond the capacity and processing capabilities of existing data mining algorithms, those algorithms become ineffective. Instance selection therefore becomes a mandatory step before applying data mining algorithms. Instance selection methods scale a training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, instance selection methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying instance selection to each subset individually. However, these adapted methods vary in the reduction rate and classification accuracy they achieve. In this work, we propose an operational and unified framework that balances reduction rate against classification accuracy. It first splits a training set into class-balanced subsets, so that the impact of the splitting process on reduction rate and classification accuracy can be analyzed. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With random splitting as the benchmark, the results indicate that class-balanced splitting is preferable with respect to classification accuracy. The results also show that combining two instance selection methods markedly reduces the performance variability.
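The two stages described in the abstract (class-balanced splitting, then two instance selection methods per subset) can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the paper: the abstract does not name the two methods or say how they are combined, so this sketch uses Wilson's Edited Nearest Neighbor (ENN) followed by Hart's Condensed Nearest Neighbor (CNN), chained in that order. The function names `class_balanced_split`, `enn_keep_mask`, `cnn_keep_indices`, and `select_instances` are hypothetical.

```python
import numpy as np


def class_balanced_split(y, n_subsets, seed=0):
    """Partition indices into disjoint subsets that each keep roughly the
    class proportions of the full training set (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    subsets = [[] for _ in range(n_subsets)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        # Split this class's shuffled indices as evenly as possible.
        for k, chunk in enumerate(np.array_split(idx, n_subsets)):
            subsets[k].extend(chunk.tolist())
    return [np.array(s, dtype=int) for s in subsets]


def enn_keep_mask(X, y, k=3):
    """Wilson's ENN (assumed method 1): keep an instance only if its label
    matches the majority label of its k nearest neighbors."""
    n = len(X)
    if n < 2:
        return np.ones(n, dtype=bool)
    # Pairwise distances; O(n^2) memory, acceptable for small subsets only.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # an instance is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :min(k, n - 1)]
    keep = np.empty(n, dtype=bool)
    for i in range(n):
        labels, counts = np.unique(y[nn[i]], return_counts=True)
        keep[i] = labels[np.argmax(counts)] == y[i]
    return keep


def cnn_keep_indices(X, y):
    """Hart's CNN (assumed method 2): grow a store with every instance that
    the current store misclassifies under 1-NN, until a pass adds nothing."""
    store = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            d = np.linalg.norm(X[store] - X[i], axis=1)
            if y[store[int(np.argmin(d))]] != y[i]:
                store.append(i)
                changed = True
    return np.array(sorted(store), dtype=int)


def select_instances(X, y, n_subsets=4, k=3, seed=0):
    """Apply ENN then CNN to each class-balanced subset and merge the
    surviving indices (the chaining order is an assumption)."""
    selected = []
    for idx in class_balanced_split(y, n_subsets, seed):
        if len(idx) == 0:
            continue
        idx = idx[enn_keep_mask(X[idx], y[idx], k)]
        if len(idx) == 0:
            continue
        idx = idx[cnn_keep_indices(X[idx], y[idx])]
        selected.append(idx)
    if not selected:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(selected))
```

Calling `select_instances(X, y)` on a NumPy feature matrix `X` and label vector `y` returns the indices of the retained instances; the reduction rate is then `1 - len(selected) / len(y)`. Replacing `class_balanced_split` with a plain random partition would give the random-splitting benchmark the abstract compares against.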


Pages: 213-218

Page count: 6
