Improvement in Hadoop performance using integrated feature extraction and machine learning algorithms

Cited by: 7
Authors
Sarumathiy, C. K. [1]
Geetha, K. [1]
Rajan, C. [2]
Affiliations
[1] Excel Engn Coll, Dept CSE, Komarapalayam, Tamil Nadu, India
[2] KS Rangasamy Coll Technol, Dept IT, Tiruchengode, Tamil Nadu, India
Keywords
Big Data; Hadoop system; MapReduce; Feature selection; Correlation-based feature selection (CFS); Mutual information (MI); AdaBoost and support vector machine (SVM); CLASSIFIER;
DOI
10.1007/s00500-019-04453-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big Data refers to datasets so large and complex that traditional data-processing technologies are inadequate. Big Data can transform many aspects of society, but collecting and managing such data is challenging and complex. Hadoop was designed to process large amounts of unstructured and complex data. It provides large-scale storage along with the ability to handle virtually unlimited concurrent tasks or jobs. Feature selection is a powerful technique for dimensionality reduction and one of the most important steps in machine learning applications. In recent decades, data have grown progressively in the number of instances and of features, which makes feature selection increasingly hard. Coping with this era of Big Data requires new techniques that address the problem more efficiently. At the same time, the algorithms currently in use may not be suitable, especially when the data exceed hundreds of gigabytes. In this work, correlation-based feature selection and mutual information-based feature selection were used to improve performance. AdaBoost and support vector machine classifiers were used to improve accuracy. The experimental results show that the proposed method achieved better performance than the other methods.
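The two feature selection criteria named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation (the record does not show their code or their MapReduce formulation); the function names, the toy features, and the labels below are hypothetical. `mutual_information` scores a discrete feature against the class label, and `cfs_merit` uses the commonly cited CFS merit heuristic, k·r_cf / sqrt(k + k(k−1)·r_ff), with the average correlations supplied by the caller.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs only; unobserved pairs contribute zero.
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a feature subset.

    feature_class_corr: one feature-to-class correlation per selected feature.
    feature_feature_corr: pairwise correlations among the selected features
    (empty when the subset holds a single feature).
    """
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    r_ff = (
        sum(feature_feature_corr) / len(feature_feature_corr)
        if feature_feature_corr
        else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: a binary label and three discrete features.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "f_copy": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to the label
    "f_indep": [0, 1, 0, 1, 0, 0, 1, 1],  # statistically independent of it
    "f_const": [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
}

# Rank features by mutual information with the label, highest first.
ranked = sorted(
    ((name, mutual_information(col, labels)) for name, col in features.items()),
    key=lambda item: item[1],
    reverse=True,
)

if __name__ == "__main__":
    for name, score in ranked:
        print(f"{name}: {score:.3f} bits")
```

In this sketch a fully redundant second feature (pairwise correlation 1.0) leaves the CFS merit unchanged, while an uncorrelated one raises it, which is the trade-off between relevance and redundancy that the criterion is meant to capture. The selected features would then be passed to a classifier such as AdaBoost or an SVM.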
Pages: 627-636
Page count: 10