Towards Ultrahigh Dimensional Feature Selection for Big Data

Authors
Tan, Mingkui [1]
Tsang, Ivor W. [2]
Wang, Li [3]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Broadway, NSW 2007, Australia
[3] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Funding
Australian Research Council
Keywords
big data; ultrahigh dimensionality; feature selection; nonlinear feature selection; multiple kernel learning; feature generation; MULTIPLE; CLASSIFICATION; OPTIMIZATION; CONVERGENCE; ONLINE; CANCER; LASSO;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that optimize over all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Building on this optimization scheme, we also develop efficient caching techniques. The feature generating paradigm is guaranteed to converge globally under mild conditions and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets, with tens of millions of data points and $O(10^{14})$ features, demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
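The "activate a group of features, then solve a subproblem on the active set" loop described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a squared loss, scores inactive features by their absolute correlation with the current residual, and replaces the MKL subproblem and accelerated proximal gradient solver with plain coordinate descent on a lightly ridge-regularized least-squares problem. The function name `feature_generating_sketch` and all of its parameters are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def feature_generating_sketch(X, y, group_size=2, max_rounds=10,
                              ridge=1e-8, tol=1e-6, sweeps=50):
    """Toy feature-generating loop (illustrative assumption, squared loss).

    X is a list of rows, y a list of targets. Returns a dict that maps
    each activated feature index to its fitted weight.
    """
    n, d = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(d)]
    weights = {}         # active feature index -> weight
    residual = list(y)   # y - X w, starting from w = 0

    for _round in range(max_rounds):
        # Step 1: score every inactive feature against the residual and
        # activate the group_size highest-scoring ones.
        scored = sorted(
            ((abs(dot(cols[j], residual)), j)
             for j in range(d) if j not in weights),
            reverse=True,
        )
        new = [j for score, j in scored[:group_size] if score > tol]
        if not new:
            break  # no inactive feature still explains the residual
        for j in new:
            weights[j] = 0.0

        # Step 2: solve the subproblem restricted to the active features.
        # Coordinate descent stands in for the paper's MKL/APG solver.
        for _sweep in range(sweeps):
            for j in weights:
                c = cols[j]
                # Residual with feature j's current contribution added back.
                partial = [r + cj * weights[j] for r, cj in zip(residual, c)]
                weights[j] = dot(c, partial) / (dot(c, c) + ridge)
                residual = [p - cj * weights[j] for p, cj in zip(partial, c)]
    return weights


if __name__ == "__main__":
    # Six orthogonal features; only features 0 and 3 carry signal.
    X = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    y = [2.0, 0.0, 0.0, -3.0, 0.0, 0.0]
    print(sorted(feature_generating_sketch(X, y).items()))
```

On this orthogonal toy design the loop activates only features 0 and 3 and then stops, because no remaining feature correlates with the residual. The point of the pattern is that the expensive subproblem is only ever solved over the small active set, while the full feature set is touched only in the scoring step.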
Pages: 1371-1429
Page count: 59
Related papers
50 in total
  • [1] Towards ultrahigh dimensional feature selection for big data
    Tan, Mingkui
    Tsang, Ivor W.
    Wang, Li
    Journal of Machine Learning Research, 2014, 15 : 1371 - 1429
  • [2] Towards Scalable and Accurate Online Feature Selection for Big Data
    Yu, Kui
    Wu, Xindong
    Ding, Wei
    Pei, Jian
    2014 IEEE International Conference on Data Mining (ICDM), 2014, : 660 - 669
  • [3] Ultrahigh dimensional time course feature selection
    Xu, Peirong
    Zhu, Lixing
    Li, Yi
    Biometrics, 2014, 70 (02) : 356 - 365
  • [4] A STUDY ON FEATURE SELECTION IN BIG DATA
    Manikandan, R. P. S.
    Kalpana, A. M.
    2017 International Conference on Computer Communication and Informatics (ICCCI), 2017
  • [5] Ultrahigh Dimensional Feature Selection: Beyond the Linear Model
    Fan, Jianqing
    Samworth, Richard
    Wu, Yichao
    Journal of Machine Learning Research, 2009, 10 : 2013 - 2038
  • [6] Stable feature screening for ultrahigh dimensional data
    Lai, Peng
    Song, Fengli
    Gao, Yufei
    Journal of the Korean Statistical Society, 2019, 48 (02) : 221 - 232
  • [7] Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data
    Yamada, Makoto
    Tang, Jiliang
    Lugo-Martinez, Jose
    Hodzic, Ermin
    Shrestha, Raunak
    Saha, Avishek
    Ouyang, Hua
    Yin, Dawei
    Mamitsuka, Hiroshi
    Sahinalp, Cenk
    Radivojac, Predrag
    Menczer, Filippo
    Chang, Yi
    IEEE Transactions on Knowledge and Data Engineering, 2018, 30 (07) : 1352 - 1365
  • [8] Feature screening for ultrahigh dimensional binary data
    Guan, Guoyu
    Shan, Na
    Guo, Jianhua
    Statistics and Its Interface, 2018, 11 (01) : 41 - 50