Fast and simple dataset selection for machine learning

Cited by: 5
Authors
Peter, Timm J. [1 ]
Nelles, Oliver [1 ]
Affiliations
[1] Univ Siegen, Inst Mechan & Regelungstech Mechatron, Dept Maschinenbau, Paul Bonatz Str 9-11, D-57068 Siegen, Germany
Keywords
machine learning; dataset selection; design of experiments; space-filling design; domain adaptation;
DOI
10.1515/auto-2019-0010
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
The task of data reduction is discussed, and a novel selection approach is proposed that allows the point distribution of the selected data subset to be controlled. The proposed approach relies on the estimation of probability density functions (pdfs). Owing to its structure, the new method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The new strategy evaluates the estimated pdfs solely on the selected data points, resulting in a simple and efficient algorithm with low computational and memory demand. The performance of the new approach is investigated for two different scenarios. For representative subset selection of a dataset, the new approach is compared to a recently proposed, more complex method and shows comparable results. To demonstrate the capability of matching a target pdf, a uniform distribution is chosen as an example. Here the new method is compared to strategies for space-filling design of experiments and shows convincing results.
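To illustrate the general idea of selecting a subset whose estimated density follows a target pdf, the following is a minimal sketch of a generic greedy heuristic. It is not the authors' algorithm: the abstract does not give the procedure, so the function names (`kernel_sum`, `greedy_select`), the Gaussian kernel, the fixed `bandwidth`, and the selection rule (pick the candidate where the subset's estimated density is lowest relative to the target) are all assumptions made for illustration. Note also that this sketch evaluates the density estimate at every candidate point, whereas the abstract states that the proposed method evaluates the pdfs only at the selected points.

```python
import numpy as np


def kernel_sum(points, centers, bandwidth):
    """Unnormalised Gaussian kernel sum of `centers`, evaluated at `points`."""
    diff = points[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1)


def greedy_select(data, n_select, bandwidth=0.1, target_pdf=None):
    """Greedily pick `n_select` row indices of `data` (hypothetical helper).

    `target_pdf` is assumed to be a vectorised callable mapping an (n, d)
    array to n positive values; None means a uniform target.
    """
    n = data.shape[0]
    target = np.ones(n) if target_pdf is None else np.asarray(target_pdf(data))
    density = np.zeros(n)            # subset density estimate at each candidate
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_select):
        # Candidate that is most under-represented relative to the target.
        score = np.where(available, density / target, np.inf)
        idx = int(np.argmin(score))
        chosen.append(idx)
        available[idx] = False
        # Incremental update: add the kernel of the newly selected point.
        density += kernel_sum(data, data[idx:idx + 1], bandwidth)
    return np.array(chosen)


# Demo on synthetic data: 950 points in a small cluster, 50 spread out.
rng = np.random.default_rng(0)
cluster = rng.uniform(0.0, 0.1, size=(950, 2))
spread = rng.uniform(0.0, 1.0, size=(50, 2))
data = np.vstack([cluster, spread])

selected = greedy_select(data, n_select=20, bandwidth=0.1)
cluster_count = int(np.sum(selected < 950))  # picks taken from the cluster
```

With a uniform target, the heuristic avoids the over-represented cluster, so the selected subset is spread more evenly than the raw data. Passing a different `target_pdf` shifts the selection toward regions where that target is large.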
Pages: 833-842
Page count: 10
Related papers
50 in total
  • [41] Application of machine learning in stock selection
    Li, Pengfei
    Xu, Jungang
    Al-Hamami, Mohammad
    APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2022, : 2413 - 2424
  • [42] Probabilistic Feature Selection in Machine Learning
    Ghosh, Indrajit
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 623 - 632
  • [43] ModelSet: a dataset for machine learning in model-driven engineering
    Hernandez Lopez, Jose Antonio
    Canovas Izquierdo, Javier Luis
    Sanchez Cuadrado, Jesus
    SOFTWARE AND SYSTEMS MODELING, 2022, 21 (03) : 967 - 986
  • [44] Automatic dataset builder for Machine Learning applications to satellite imagery
    Sebastianelli, Alessandro
    Del Rosso, Maria Pia
    Ullo, Silvia Liberata
    SOFTWAREX, 2021, 15
  • [45] High Quality Dataset for Machine Learning in the Business Intelligence Domain
    Franchina, Luisa
    Sergiani, Federico
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2020, 1037 : 391 - 401
  • [46] Fuzzy machine learning logic utilization on hormonal imbalance dataset
    Khushal R.
    Fatima U.
    Computers in Biology and Medicine, 2024, 174
  • [47] ModelSet: a dataset for machine learning in model-driven engineering
    José Antonio Hernández López
    Javier Luis Cánovas Izquierdo
    Jesús Sánchez Cuadrado
    Software and Systems Modeling, 2022, 21 : 967 - 986
  • [48] Discrepancies in Stroke Distribution and Dataset Origin in Machine Learning for Stroke
    Velagapudi, Lohit
    Mouchtouris, Nikolaos
    Baldassari, Michael P.
    Nauheim, David
    Khanna, Omaditya
    Al Saiegh, Fadi
    Herial, Nabeel
    Gooch, M. Reid
    Tjoumakaris, Stavropoula
    Rosenwasser, Robert H.
    Jabbour, Pascal
    JOURNAL OF STROKE & CEREBROVASCULAR DISEASES, 2021, 30 (07)
  • [49] Date grading using machine learning techniques on a novel dataset
    Raissouli H.
    Aljabri A.A.
    Aljudaibi S.M.
    Haron F.
    Alharbi G.
    International Journal of Advanced Computer Science and Applications, 2020, 11 (08) : 758 - 765
  • [50] Applying machine learning to the dynamic selection of replenishment policies in fast-changing supply chain environments
    Priore, Paolo
    Ponte, Borja
    Rosillo, Rafael
    de la Fuente, David
    INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2019, 57 (11) : 3663 - 3677