A MAPREDUCE BASED FRAMEWORK TO PERFORM FULL MODEL SELECTION IN VERY LARGE DATASETS

Times cited: 0
Authors
Diaz Pacheco, Angel [1]
Gonzalez-Bernal, Jesus A. [1]
Reyes-Garcia, Carlos A. [1]
Affiliations
[1] INAOE, Comp Sci Dept, Luis Enrique Erro 1, Puebla 72840, Mexico
Source
IADIS-INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS | 2018, Vol. 13, No. 1
Keywords
Model Selection; MapReduce; Big Data
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The analysis of large amounts of data has become an important task in science and business, and it has led to the emergence of the Big Data paradigm. This paradigm owes its name to data objects too large to be processed by standard hardware and algorithms. Many data analysis tasks involve the use of machine learning techniques. The goal of a predictive model is to achieve the highest possible accuracy when predicting new samples, and for this reason there is high interest in selecting the most suitable algorithm for a specific dataset. Selecting the most suitable algorithm together with feature selection and data preparation techniques constitutes the Full Model Selection paradigm, which has been widely studied on datasets of common size but is poorly explored in the Big Data context. As an effort to explore this direction, this work proposes a framework, adaptable to any population-based meta-heuristic method, for performing model selection under the MapReduce paradigm.
Pages: 1 / 13
Page count: 13