Improving Instance Selection Methods for Big Data Classification

Authors
Malhat, Mohamed [1]
El Menshawy, Mohamed [1]
Mousa, Hamdy [1]
El Sisi, Ashraf [1]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt
Source
2017 13TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO) | 2017
Keywords
Big Data; Data Mining; Data Reduction; Instance Selection; Reduction
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The explosion of data in many application domains has given rise to the term big data. As data volume rapidly outgrows the capacity and processing capabilities of existing data mining algorithms, those algorithms become ineffective. Instance selection therefore becomes a mandatory step prior to applying data mining algorithms. Instance selection methods scale a training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, instance selection methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying an instance selection method to each subset. However, these adapted methods vary in performance, in terms of both reduction rate and classification accuracy. In this work, we propose an operational and unified framework that balances reduction rate against classification accuracy. It first splits the training set into class-balanced subsets, which lets us analyze the impact of the splitting process on reduction rate and classification accuracy. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With the random splitting process as a benchmark, the results show that the class-balanced splitting process is preferable with respect to classification accuracy. The results also indicate that combining two instance selection methods markedly reduces performance variability.
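The two steps named in the abstract (class-balanced splitting, then two instance selection methods per subset) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which two methods are used or how they are combined, so the sketch assumes edited nearest neighbour (noise removal) followed by condensed nearest neighbour (redundancy removal), applied in sequence. All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict


def class_balanced_split(labels, n_subsets, seed=0):
    """Partition instance indices into disjoint subsets with near-equal class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every subset receives its share.
        for pos, idx in enumerate(indices):
            subsets[pos % n_subsets].append(idx)
    return subsets


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(X, candidates, query):
    """Index among candidates (excluding query itself) closest to X[query]."""
    return min((c for c in candidates if c != query),
               key=lambda c: sq_dist(X[c], X[query]))


def edited_nn(X, y, subset):
    """Assumed method 1: drop instances whose nearest neighbour has another label."""
    if len(subset) < 2:
        return list(subset)
    return [i for i in subset if y[nearest(X, subset, i)] == y[i]]


def condensed_nn(X, y, subset):
    """Assumed method 2: keep only instances needed for 1-NN to label the rest."""
    if not subset:
        return []
    kept = [subset[0]]
    changed = True
    while changed:
        changed = False
        for i in subset:
            if i not in kept and y[nearest(X, kept, i)] != y[i]:
                kept.append(i)  # misclassified by the current kept set
                changed = True
    return kept


def select_instances(X, y, n_subsets=2, seed=0):
    """Split class-balanced, run both methods per subset, merge the survivors."""
    selected = []
    for subset in class_balanced_split(y, n_subsets, seed):
        selected.extend(condensed_nn(X, y, edited_nn(X, y, subset)))
    return sorted(selected)


# Toy data: two well-separated classes of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
subsets = class_balanced_split(y, n_subsets=2)
selected = select_instances(X, y, n_subsets=2)
reduction_rate = 1 - len(selected) / len(X)
```

Replacing `class_balanced_split` with a plain shuffled split would give the random-splitting benchmark that the abstract compares against. The reduction rate here is the fraction of training instances discarded.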
Pages: 213-218
Page count: 6