Robust and flexible learning of a high-dimensional classification rule using auxiliary outcomes

Cited by: 0
Authors
Liang, Muxuan [1]
Park, Jaeyoung [2]
Lu, Qing [1]
Zhong, Xiang [3]
Affiliations
[1] Univ Florida, Dept Biostat, 2004 Mowry Rd, 5th Floor CTRB, Gainesville, FL 32611 USA
[2] Univ Cent Florida, Sch Global Hlth Management & Informat, Orlando, FL 32816 USA
[3] Univ Florida, Dept Ind & Syst Engn, Gainesville, FL 32611 USA
Keywords
auxiliary outcomes; classification; high-dimensional data; multi-task learning; transfer learning; MULTITASK; ALGORITHMS; PREDICT;
DOI
10.1093/biomtc/ujae144
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Correlated outcomes are common in many practical problems. In some settings, one outcome is of particular interest, and the others are auxiliary. To leverage information shared by all the outcomes, traditional multi-task learning (MTL) minimizes an averaged loss function over all the outcomes, which may lead to biased estimation for the target outcome, especially when the MTL model is misspecified. In this work, based on a decomposition of estimation bias into two types, within-subspace and against-subspace, we develop a robust transfer learning approach to estimating a high-dimensional linear decision rule for the outcome of interest in the presence of auxiliary outcomes. The proposed method includes an MTL step using all outcomes to gain efficiency and a subsequent calibration step using only the outcome of interest to correct both types of bias. We show that the final estimator can achieve a lower estimation error than the one using only the single outcome of interest. Simulations and real data analysis are conducted to demonstrate the advantages of the proposed method.
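The two-step idea in the abstract (an MTL step on all outcomes, then a calibration step on the target outcome only) can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes binary outcomes, an L1-penalized logistic loss, and a simple sparse additive correction in place of the paper's within-subspace and against-subspace bias corrections. The function names `l1_logistic` and `two_step_rule` and the penalty levels `lam_mtl` and `lam_cal` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def l1_logistic(X, Y, lam, offset=None, n_iter=500):
    """Proximal gradient descent for an L1-penalized logistic loss.

    Y may have several columns; the loss is then averaged over the
    columns, which mimics a pooled MTL objective with one shared
    coefficient vector (an assumption of this sketch).
    """
    n, p = X.shape
    Y = Y.reshape(n, -1)
    off = np.zeros(n) if offset is None else offset
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = sigmoid(X @ beta + off)
        grad = X.T @ (prob[:, None] - Y).mean(axis=1) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def two_step_rule(X, y_target, Y_aux, lam_mtl=0.02, lam_cal=0.05):
    # Step 1 (MTL): pool the target and auxiliary outcomes.
    Y_all = np.column_stack([y_target, Y_aux])
    beta_mtl = l1_logistic(X, Y_all, lam_mtl)
    # Step 2 (calibration): using the target outcome only, fit a sparse
    # correction with the step-1 linear predictor held fixed as an offset.
    delta = l1_logistic(X, y_target, lam_cal, offset=X @ beta_mtl)
    return beta_mtl + delta, beta_mtl
```

The resulting linear decision rule would classify a new observation `x` by the sign of `x @ beta`. When `lam_cal` is very large the correction is shrunk to zero and the rule reduces to the pooled MTL fit; smaller values let the target outcome pull the estimate away from the pooled direction.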
Pages: 9