HIGHLY ROBUST METHODS IN DATA MINING

Cited by: 11
Authors
Kalina, Jan [1]
Affiliation
[1] Acad Sci Czech Republ, Inst Comp Sci, Vodarenskou Vezi 2, Prague 18207 8, Czech Republic
Keywords
Data mining; Robust statistics; High-dimensional data; Cluster analysis; Logistic regression; Neural networks
DOI
10.5937/sjm8-3226
Chinese Library Classification
C93 [Management Science];
Discipline classification codes
12 ; 1201 ; 1202 ; 120202 ;
Abstract
This paper is devoted to highly robust methods for information extraction from data, with special attention paid to methods suitable for management applications. The sensitivity of available data mining methods to outlying measurements in the observed data is discussed as a major drawback of these methods. The paper proposes several new highly robust methods for data mining, which are based on the idea of implicit weighting of individual data values. In particular, it proposes a novel robust method of hierarchical cluster analysis, a popular data mining method of unsupervised learning. Further, a robust method for estimating parameters in logistic regression is proposed. This idea is extended to a robust multinomial logistic classification analysis. Finally, the sensitivity of neural networks to noise and outlying measurements in the data is discussed. A method for robust training of neural networks for the task of function approximation is proposed, which has the form of a robust estimator in nonlinear regression.
Pages: 9-24
Page count: 16