Improving Instance Selection Methods for Big Data Classification

Authors
Malhat, Mohamed [1]
El Menshawy, Mohamed [1]
Mousa, Hamdy [1]
El Sisi, Ashraf [1]
Affiliations
[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt
Source
2017 13th International Computer Engineering Conference (ICENCO), 2017
Keywords
Big Data; Data Mining; Data Reduction; Instance Selection; Reduction
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The explosion of data in many application domains has given rise to the term big data. As the volume of big data grows rapidly, it exceeds the capacity and processing capabilities of existing data mining algorithms, which become ineffective. Instance selection methods therefore become a mandatory step prior to applying data mining algorithms. They scale a training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, instance selection methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying instance selection to each subset individually. However, these adapted methods vary in performance with respect to reduction rate and classification accuracy. In this work, we propose an operational and unified framework that balances reduction rate against classification accuracy. It first splits a training set into class-balanced subsets, in order to analyze the impact of the splitting process on reduction rate and classification accuracy. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With random splitting as the baseline, the results show that class-balanced splitting is preferable in terms of classification accuracy. The results also show that combining two instance selection methods markedly reduces the performance variability.
Pages: 213-218
Number of pages: 6
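
The two steps named in the abstract (class-balanced splitting into disjoint subsets, then two instance selection methods per subset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not name the two methods or say how they are combined, so this sketch substitutes edited nearest neighbour (k=1) and condensed nearest neighbour and chains them in sequence. The function names class_balanced_split, enn_1nn, cnn, and reduce_training_set are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def class_balanced_split(labels, n_subsets, seed=0):
    """Partition instance indices into disjoint subsets with similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin; the running offset keeps subset sizes even.
        for i, idx in enumerate(indices):
            subsets[(offset + i) % n_subsets].append(idx)
        offset += len(indices)
    return subsets


def enn_1nn(X, y, indices):
    """Assumed stand-in: drop instances whose nearest neighbour has another label."""
    kept = []
    for i in indices:
        others = [j for j in indices if j != i]
        if not others:
            kept.append(i)
            continue
        nearest = min(others, key=lambda j: sq_dist(X[i], X[j]))
        if y[nearest] == y[i]:
            kept.append(i)
    return kept


def cnn(X, y, indices):
    """Assumed stand-in: keep only instances the current store misclassifies (1-NN)."""
    if not indices:
        return []
    store = [indices[0]]
    changed = True
    while changed:
        changed = False
        for i in indices:
            if i in store:
                continue
            nearest = min(store, key=lambda j: sq_dist(X[i], X[j]))
            if y[nearest] != y[i]:
                store.append(i)
                changed = True
    return store


def reduce_training_set(X, y, n_subsets, selectors, seed=0):
    """Split class-balanced, run the selectors in sequence on each subset, merge."""
    selected = []
    for subset in class_balanced_split(y, n_subsets, seed):
        for select in selectors:
            subset = select(X, y, subset)
        selected.extend(subset)
    return sorted(selected)
```

A call such as reduce_training_set(X, y, n_subsets=4, selectors=[enn_1nn, cnn]) returns the indices of the retained instances. Replacing class_balanced_split with a plain random partition gives the random-splitting baseline that the abstract compares against.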