Generalized Jaccard feature screening for ultra-high dimensional survival data

Cited by: 0
Authors
Liu, Renqing [1]
Deng, Guangming [1,2]
He, Hanji [3]
Affiliations
[1] Guilin Univ Technol, Sch Math & Stat, Guilin 541004, Peoples R China
[2] Guangxi Coll & Univ, Key Lab Appl Stat, Guilin 541004, Peoples R China
[3] South China Univ Technol, Sch Econ & Finance, Guangzhou 510006, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / No. 10
Keywords
generalized Jaccard coefficient; ultra-high dimensional survival data; model-free; variable selection; linear models
DOI
10.3934/math.20241341
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Feature screening methods play a vital role in identifying the critical genomes that influence a cancer patient's survival time. Most current research relies on a fixed survival function model, which limits its applicability in practice. In this paper, we propose the generalized Jaccard coefficient (GJAC), which extends the traditional Jaccard coefficient from measuring the similarity of binary vectors to measuring the correlation between general vectors. The larger the GJAC value, the higher the sample similarity. Using the GJAC, we introduce a novel model-free screening method for selecting the active set of covariates in ultra-high dimensional survival data. In Monte Carlo simulations, GJAC-Sure Independence Screening (GJAC-SIS) shows higher accuracy, lower errors, and excellent applicability across different types of survival data compared with other existing model-free feature screening methods. Additionally, on the real cancer dataset (DLBCL), GJAC-SIS identifies two additional important genomes that have been verified in real biomedical experiments, whereas the other five methods do not. As a result, GJAC-SIS achieves high screening precision, delivers a more effective screening outcome, and offers better utility and universality.
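The abstract does not state the GJAC formula or how censored observations enter it, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the Tanimoto (extended Jaccard) similarity, a standard generalization that reduces to the ordinary Jaccard coefficient on binary vectors, as a stand-in for GJAC, and it ranks covariates against an uncensored response. The function names `tanimoto` and `screen` and the parameter `keep` are hypothetical and do not come from the paper.

```python
def tanimoto(x, y):
    """Tanimoto (extended Jaccard) similarity between two real vectors.

    Assumption: a stand-in for the paper's GJAC, whose exact form
    is not given in the abstract.
    """
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # The denominator is zero only when both vectors are all zeros.
    return dot / denom if denom else 0.0


def screen(columns, response, keep):
    """Marginal screening: score each covariate column by its absolute
    similarity to the response and return the indices of the top `keep`.

    Assumption: the response is treated as fully observed; censoring,
    which the paper's method must handle, is ignored here.
    """
    scores = [abs(tanimoto(col, response)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:keep]


if __name__ == "__main__":
    # On binary vectors the similarity equals |A ∩ B| / |A ∪ B|.
    print(tanimoto([1, 1, 0, 1], [1, 0, 0, 1]))

    # Three toy covariates; keep the two most similar to the response.
    covariates = [[1, 2, 3, 4], [4, -1, 0, 2], [2, 4, 6, 8.5]]
    print(screen(covariates, [1, 2, 3, 4], keep=2))
```

Ranking each covariate separately and keeping a fixed number of top scores mirrors the sure independence screening idea named in the abstract; the choice of `keep` is left to the user in this sketch.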
Pages: 27607-27626
Page count: 20