Fast and simple dataset selection for machine learning

Times cited: 5
Authors
Peter, Timm J. [1]
Nelles, Oliver [1]
Affiliations
[1] Univ Siegen, Inst Mechan & Regelungstech Mechatron, Dept Maschinenbau, Paul Bonatz Str 9-11, D-57068 Siegen, Germany
Keywords
machine learning; dataset selection; design of experiments; space-filling design; domain adaptation
DOI
10.1515/auto-2019-0010
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The task of data reduction is discussed, and a novel selection approach is proposed that allows the point distribution of the selected data subset to be controlled. The proposed approach relies on estimating probability density functions (pdfs). Owing to its structure, the new method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The new strategy evaluates the estimated pdfs solely on the selected data points, resulting in a simple and efficient algorithm with low computational and memory demand. The performance of the new approach is investigated in two scenarios. For representative subset selection from a dataset, the new approach is compared with a recently proposed, more complex method and shows comparable results. To demonstrate the capability of matching a target pdf, a uniform distribution is chosen as an example. Here the new method is compared with strategies for space-filling design of experiments and shows convincing results.
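The idea of steering a subset toward a target pdf can be illustrated with a small sketch. The code below is not the authors' algorithm, which the abstract does not specify in detail. It is a simplified greedy heuristic under stated assumptions: a Gaussian kernel with a fixed, hand-picked bandwidth, and a density estimate of the selected subset that is evaluated at every candidate point (the paper instead evaluates pdfs only on the selected points). The names `gaussian_kernel_sum`, `select_subset`, `target_pdf`, and `bandwidth` are hypothetical.

```python
import numpy as np


def gaussian_kernel_sum(points, centers, bandwidth):
    """Unnormalized Gaussian kernel density of `centers`, evaluated at `points`."""
    # Squared Euclidean distances between every point and every center.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)


def select_subset(data, n_select, target_pdf=None, bandwidth=0.1):
    """Greedily pick `n_select` row indices of `data` (illustrative heuristic).

    target_pdf: optional callable mapping an (n, d) array to (n,) desired
    densities; None is treated as a uniform target (assumption).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if target_pdf is None:
        target = np.ones(n)
    else:
        target = np.asarray(target_pdf(data), dtype=float)
    target = np.maximum(target, 1e-12)  # guard against division by zero

    density = np.zeros(n)               # kernel sum of the subset at each candidate
    available = np.ones(n, dtype=bool)  # candidates not yet selected
    selected = []
    for _ in range(n_select):
        # Lowest density-to-target ratio marks the most under-represented region.
        score = np.where(available, density / target, np.inf)
        idx = int(np.argmin(score))
        selected.append(idx)
        available[idx] = False
        # Update the running density with the newly selected point only.
        density += gaussian_kernel_sum(data, data[idx:idx + 1], bandwidth)
    return np.array(selected)
```

With `target_pdf=None` the heuristic spreads points out, which loosely corresponds to the space-filling scenario in the abstract. Passing a density estimate of the full dataset as `target_pdf` would loosely correspond to the representative-subset scenario. The bandwidth strongly affects the result and would need tuning for real data.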
Pages: 833-842
Number of pages: 10