Towards Ultrahigh Dimensional Feature Selection for Big Data

Authors
Tan, Mingkui [1]
Tsang, Ivor W. [2]
Wang, Li [3]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Broadway, NSW 2007, Australia
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Funding
Australian Research Council
Keywords
big data; ultrahigh dimensionality; feature selection; nonlinear feature selection; multiple kernel learning; feature generation; MULTIPLE; CLASSIFICATION; OPTIMIZATION; CONVERGENCE; ONLINE; CANCER; LASSO
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Building on this optimization scheme, we also develop efficient caching techniques. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets with tens of millions of data points and O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
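The "activate a group of features, then solve a subproblem on the active features only" loop described in the abstract can be illustrated with a much simplified sketch. This is not the authors' algorithm: it assumes a squared loss, scores features by their correlation with the current residual as a stand-in for the paper's activation step, and replaces the MKL subproblem with an ordinary least-squares refit. The function name `feature_generating_sketch` and its parameters are hypothetical.

```python
import numpy as np

def feature_generating_sketch(X, y, group_size=2, max_iters=5, tol=1e-8):
    """Toy active-set loop: activate a few features, refit on active ones only.

    Assumptions (not from the paper): squared loss, residual-correlation
    scoring, and a least-squares refit in place of the MKL subproblem.
    """
    n, d = X.shape
    active = []                 # indices of features activated so far
    w = np.zeros(d)             # full weight vector, zero outside the active set
    residual = y.astype(float).copy()
    for _ in range(max_iters):
        # Score every feature by |x_j^T residual|; skip already active ones.
        scores = np.abs(X.T @ residual)
        if active:
            scores[active] = -np.inf
        order = np.argsort(scores)[::-1][:group_size]
        new = [int(j) for j in order if np.isfinite(scores[j])]
        if not new:
            break               # every feature is already active
        active.extend(new)
        # Stand-in subproblem: least squares restricted to the active features.
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        w = np.zeros(d)
        w[active] = coef
        residual = y - X @ w
        if residual @ residual <= tol:
            break               # the active features already explain y
    return sorted(active), w
```

Each iteration performs one pass over all features for scoring, while the fitting step only ever involves the columns in `active`. That is the property that makes this style of method attractive when the total number of features is far larger than the number that matter.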
Pages: 1371-1429
Number of pages: 59