A Genetic Algorithm Based Modification on the LTS Algorithm for Large Data Sets

Cited by: 1
Authors
Satman, M. Hakan [1]
Affiliations
[1] Istanbul Univ, Dept Econometr, TR-34 Istanbul, Turkey
Keywords
C-steps; Genetic algorithms; Least trimmed squares regression; Outliers; Robust regression
DOI
10.1080/03610918.2011.598989
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
The authors introduce an algorithm for estimating the least trimmed squares (LTS) parameters in large data sets. The algorithm performs a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing independent basic subsets and iterating C-steps many times to minimize the LTS criterion. The authors' algorithm constructs a genetic algorithm to form a basic subset and iterates C-steps to calculate the cost value of the LTS criterion. Genetic algorithms are successful methods for optimizing nonlinear objective functions, but they are slower in many cases. The genetic algorithm configuration in the algorithm can be kept simple because a small number of observations are searched from the data. An R package is prepared to perform Monte Carlo simulations on the algorithm. Simulation results show that the performance of the algorithm is suitable for even large data sets because a small number of trials is always performed.
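The C-step idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or their R package, and it leaves out the genetic algorithm search entirely. It assumes a single predictor with an intercept, and all names (`ols_fit`, `lts_cost`, `c_step`, `lts_from_subset`) and the toy data are hypothetical. Starting from a basic subset, each step refits ordinary least squares on the h observations with the smallest squared residuals and stops once the LTS cost no longer decreases.

```python
def ols_fit(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lts_cost(points, coef, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    a, b = coef
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def c_step(points, coef, h):
    """One concentration step: refit on the h best-fitting observations."""
    a, b = coef
    closest = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
    return ols_fit(closest)


def lts_from_subset(points, subset, h, max_steps=50):
    """Iterate C-steps from a basic subset until the cost stops decreasing."""
    coef = ols_fit(subset)
    cost = lts_cost(points, coef, h)
    for _ in range(max_steps):
        new_coef = c_step(points, coef, h)
        new_cost = lts_cost(points, new_coef, h)
        if new_cost >= cost - 1e-12:
            break
        coef, cost = new_coef, new_cost
    return coef, cost


# Toy data (assumed for illustration): 16 points on y = 2 + 3x, 4 outliers.
points = [(float(x), 2.0 + 3.0 * x) for x in range(16)]
points += [(float(x), 0.0) for x in range(16, 20)]
h = 15

# A basic subset that happens to be outlier-free.
clean_subset = [points[0], points[5]]
coef, cost = lts_from_subset(points, clean_subset, h)

# A basic subset that contains an outlier; the cost can only go down from here.
mixed_subset = [points[0], points[19]]
start_cost = lts_cost(points, ols_fit(mixed_subset), h)
mixed_coef, mixed_cost = lts_from_subset(points, mixed_subset, h)
```

In this sketch the quality of the result depends on the starting subset, which is the part the abstract says the genetic algorithm search is meant to supply.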
Pages: 644-652
Number of pages: 9