A Genetic Algorithm Based Modification on the LTS Algorithm for Large Data Sets

Cited by: 1
Authors
Satman, M. Hakan [1]
Affiliation
[1] Istanbul Univ, Dept Econometr, TR-34 Istanbul, Turkey
Keywords
C-steps; Genetic algorithms; Least trimmed squares regression; Outliers; Robust regression
DOI
10.1080/03610918.2011.598989
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
The authors introduce an algorithm for estimating least trimmed squares (LTS) regression parameters in large data sets. The algorithm uses a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing many independent basic subsets and iterating C-steps from each of them to minimize the LTS criterion. The authors' algorithm instead constructs a genetic algorithm to form a basic subset and iterates C-steps to calculate the cost value of the LTS criterion. Genetic algorithms are successful methods for optimizing nonlinear objective functions, but they are slower in many cases. The genetic algorithm configuration used here can be kept simple because only a small number of observations are searched from the data. An R package is prepared to perform Monte Carlo simulations on the algorithm. Simulation results show that the performance of the algorithm is suitable even for large data sets because a small number of trials is always performed.
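The C-step iteration mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or their R package: it assumes a single predictor with an intercept, takes the basic subset as given instead of finding it by genetic search, and every name in it (`ols_fit`, `lts_cost`, `c_step`, `iterate_c_steps`) is hypothetical.

```python
def ols_fit(xs, ys):
    """Closed-form least squares for y = a + b*x (assumes distinct x values)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def lts_cost(a, b, xs, ys, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    squared = sorted((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return sum(squared[:h])


def c_step(subset, xs, ys, h):
    """One concentration step: fit on the subset, keep the h best-fitting points."""
    a, b = ols_fit([xs[i] for i in subset], [ys[i] for i in subset])
    order = sorted(range(len(xs)), key=lambda i: (ys[i] - a - b * xs[i]) ** 2)
    return order[:h]


def iterate_c_steps(start, xs, ys, h, max_iter=50):
    """Repeat C-steps from a basic subset until the LTS cost stops decreasing."""
    subset = list(start)
    previous = float("inf")
    for _ in range(max_iter):
        subset = c_step(subset, xs, ys, h)
        a, b = ols_fit([xs[i] for i in subset], [ys[i] for i in subset])
        cost = lts_cost(a, b, xs, ys, h)
        if cost >= previous - 1e-12:  # no further improvement
            break
        previous = cost
    return a, b, cost


# Illustrative data (made up for this sketch): y = 2 + 3x, last four points are outliers.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
for i in range(16, 20):
    ys[i] = -50.0

# A clean two-point basic subset stands in for the result of the genetic search.
a, b, cost = iterate_c_steps([0, 1], xs, ys, h=15)
print(a, b, cost)
```

In this sketch the quality of the final fit depends entirely on the starting subset, which is why the abstract emphasizes forming a basic subset that is unlikely to contain outliers.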
Pages: 644-652
Number of pages: 9
Related papers
50 in total
  • [1] A genetic algorithm for clustering on very large data sets
    Gasvoda, J
    Ding, Q
    COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, 2003, : 163 - 167
  • [2] A Genetic Algorithm Approach for Clustering Large Data Sets
    Luchi, Diego
    Rodrigues, Alexandre
    Varejao, Flavio Miguel
    Santos, Willian
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 570 - 576
  • [3] A Data Allocation Strategy Algorithm for Large Databases based on Genetic Algorithm
    Li, Xiaofeng
    Tang, Yuanxin
    Gong, Cuicui
    COMPUTATIONAL MATERIALS SCIENCE, PTS 1-3, 2011, 268-270 : 898 - +
  • [4] Computing LTS Regression for Large Data Sets
    PETER J. ROUSSEEUW
    KATRIEN VAN DRIESSEN
    Data Mining and Knowledge Discovery, 2006, 12 : 29 - 45
  • [5] Computing LTS regression for large data sets
    Rousseeuw, PJ
    Van Driessen, K
    DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 12 (01) : 29 - 45
  • [6] Handling imbalanced data sets with a modification of Decorate algorithm
    Kotsiantis, Sotiris B.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2008, 33 (2-3) : 91 - 98
  • [7] FCM-based clustering algorithm ensemble for large data sets
    Li, Jie
    Gao, Xinbo
    Tian, Chunna
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 559 - 567
  • [8] DESCRY: A density based clustering algorithm for very large data sets
    Angiulli, F
    Pizzuti, C
    Ruffolo, M
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 203 - 210
  • [9] Knowledge Reduction Algorithm for Rough Sets based on Adaptive Genetic Algorithm
    Hou Ruidong
    Zhang Xiaohui
    Pan Wei
    Mao Ning
    2008 CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-11, 2008, : 5162 - 5166
  • [10] AN ALGORITHM FOR THE PRINCIPAL COMPONENT ANALYSIS OF LARGE DATA SETS
    Halko, Nathan
    Martinsson, Per-Gunnar
    Shkolnisky, Yoel
    Tygert, Mark
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2011, 33 (05): : 2580 - 2594