Scalable Model-Free Feature Screening via Sliced-Wasserstein Dependency

Cited by: 1
Authors
Li, Tao [1]
Yu, Jun [2]
Meng, Cheng [3]
Affiliations
[1] Renmin Univ China, Inst Stat & Big Data, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
[3] Renmin Univ China, Inst Stat & Big Data, Ctr Appl Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multivariate response model; Nonlinear model; Optimal transport; Sure screening; Variable selection; SIMULTANEOUS DIMENSION REDUCTION; FEATURE-SELECTION; DISTANCE CORRELATION; VARIABLE SELECTION; MICROARRAY DATA; REGRESSION;
DOI
10.1080/10618600.2023.2183213
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
We consider the model-free feature screening problem, which aims to discard non-informative features before downstream analysis. Most existing feature screening approaches have at least quadratic computational cost with respect to the sample size n and may therefore suffer from a heavy computational burden when n is large. To alleviate this burden, we propose a scalable model-free sure independence screening approach. This approach is based on the so-called sliced-Wasserstein dependency, a novel metric that measures the dependence between two random variables. Specifically, we quantify the dependence between two random variables by measuring the sliced-Wasserstein distance between their joint distribution and the product of their marginal distributions. For a predictor matrix of size n × d, the computational cost of the proposed algorithm is of order O(nd log n), even when the response variable is multivariate. Theoretically, we show that the proposed method enjoys both sure screening and rank consistency properties under mild regularity conditions. Numerical studies on various synthetic and real-world datasets demonstrate the superior performance of the proposed method in comparison with mainstream competitors, while requiring significantly less computational time. Supplementary materials for this article are available online.
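The idea in the abstract (compare the joint distribution with the product of the marginals through one-dimensional projections) can be illustrated with a rough Python sketch. This is not the authors' estimator: the function names `sliced_wasserstein_dependency` and `screen_features`, the use of a random permutation of the response to imitate a sample from the product of the marginals, the number of projections, and the order-2 Wasserstein distance are all assumptions made for illustration. Each projection only requires a sort, which is where a cost of roughly n log n per feature would come from.

```python
import numpy as np


def sliced_wasserstein_dependency(x, y, n_projections=50, seed=0):
    """Rough Monte Carlo estimate of a sliced-Wasserstein dependence score.

    Assumption: the product of the marginals is imitated by permuting y,
    which breaks the pairing between x and y. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    joint = np.hstack([x, y])                               # sample from the joint
    product = np.hstack([x, y[rng.permutation(len(y))]])    # pseudo-independent sample

    # Random unit directions on the sphere, one per column.
    dirs = rng.standard_normal((joint.shape[1], n_projections))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)

    # In one dimension, the order-2 Wasserstein distance between two
    # equal-size samples is obtained by matching sorted values.
    proj_joint = np.sort(joint @ dirs, axis=0)
    proj_product = np.sort(product @ dirs, axis=0)
    return float(np.sqrt(np.mean((proj_joint - proj_product) ** 2)))


def screen_features(X, y, top_k, **kwargs):
    """Rank the columns of X by their dependence score with y; keep top_k."""
    scores = np.array([
        sliced_wasserstein_dependency(X[:, j], y, **kwargs)
        for j in range(X.shape[1])
    ])
    return np.argsort(-scores)[:top_k], scores
```

A call such as `screen_features(X, y, top_k=10)` returns the indices of the ten highest-scoring columns together with all scores; `y` may be a vector or an n-by-q matrix, since the response is simply stacked next to the feature before projecting.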
Pages: 1501-1511
Number of pages: 11
Related papers
50 in total
  • [1] Model-Free Feature Screening and FDR Control With Knockoff Features
    Liu, Wanjun
    Ke, Yuan
    Liu, Jingyuan
    Li, Runze
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (537) : 428 - 443
  • [2] Robust model-free feature screening via quantile correlation
    Ma, Xuejun
    Zhang, Jingxiao
    JOURNAL OF MULTIVARIATE ANALYSIS, 2016, 143 : 472 - 480
  • [3] Hyperbolic Sliced-Wasserstein via Geodesic and Horospherical Projections
    Bonet, Clement
    Chapel, Laetitia
    Drumetz, Lucas
    Courty, Nicolas
    TOPOLOGICAL, ALGEBRAIC AND GEOMETRIC LEARNING WORKSHOPS 2023, VOL 221, 2023, 221
  • [4] Model-free sure screening via maximum correlation
    Huang, Qiming
    Zhu, Yu
    JOURNAL OF MULTIVARIATE ANALYSIS, 2016, 148 : 89 - 106
  • [5] Model-free feature screening for ultrahigh dimensional classification
    Sheng, Ying
    Wang, Qihua
    JOURNAL OF MULTIVARIATE ANALYSIS, 2020, 178
  • [6] Model-free feature screening via distance correlation for ultrahigh dimensional survival data
    Zhang, Jing
    Liu, Yanyan
    Cui, Hengjian
    STATISTICAL PAPERS, 2021, 62 (06) : 2711 - 2738
  • [7] Model-free conditional screening via conditional distance correlation
    Lu, Jun
    Lin, Lu
    STATISTICAL PAPERS, 2020, 61 (01) : 225 - 244
  • [8] Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation
    Zhao, Shaofei
    Fu, Guifang
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 192
  • [9] Model-Free Feature Screening for Ultrahigh-Dimensional Data
    Zhu, Li-Ping
    Li, Lexin
    Li, Runze
    Zhu, Li-Xing
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2011, 106 (496) : 1464 - 1475
  • [10] Model-Free Forward Screening Via Cumulative Divergence
    Zhou, Tingyou
    Zhu, Liping
    Xu, Chen
    Li, Runze
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 115 (531) : 1393 - 1405