Deep latent variable models for generating knockoffs

Cited by: 7
Authors
Liu, Ying [1]
Zheng, Cheng [2]
Affiliations
[1] Columbia Univ, Dept Psychiat, Irving Med Ctr, Mental Hlth Data Sci, 722 168th St, New York, NY 10032 USA
[2] Univ Wisconsin, Joseph J Zilber Sch Publ Hlth, Milwaukee, WI 53211 USA
Source
STAT | 2019 / Vol. 8 / Issue 1
Keywords
deep generative model; FDR control; latent variable model; model-X knockoff; FALSE DISCOVERY RATE; INFERENCE
DOI
10.1002/sta4.260
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Selective inference is an emerging field in big data analytics; it aims to perform variable selection and provide statistical inference at the same time. Among various selective inference frameworks, the model-X framework offers the most flexible tool to equip almost any machine learning method with the ability for false discovery rate (FDR) controlled variable selection. This paper provides a practical and flexible approach to generate knockoffs. We propose to fit a latent variable model for generating knockoffs. Under general conditions, the knockoffs can be generated by approximate inference of a latent variable that captures all the correlation among the predictors. We propose an algorithm based on recent advances in stochastic variational inference to approximately reconstruct the distribution of the data via the latent variables. We demonstrate that our proposed method can achieve FDR control and better power than existing knockoff generation methods in various simulated settings and in a real data example for finding mutations associated with drug resistance in human immunodeficiency virus type 1 patients.
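The latent-variable idea in the abstract can be illustrated with a small sketch. This is not the authors' deep generative model or their stochastic variational inference algorithm: it swaps in a linear-Gaussian factor model, where the posterior of the latent variable is available in closed form, and it assumes the loadings `W` and noise variances `psi` are already known rather than fitted. The function name `sample_factor_knockoffs` and all values below are hypothetical. The two steps are: draw the latent variable given the observed predictors, then draw a fresh copy of the predictors given that latent variable.

```python
import numpy as np

def sample_factor_knockoffs(X, W, psi, rng):
    """Sketch: knockoffs from an assumed factor model X = Z W^T + eps.

    X: (n, p) predictors; W: (p, k) loadings; psi: (p,) noise variances.
    Z ~ N(0, I_k) and eps ~ N(0, diag(psi)) are modelling assumptions.
    """
    n, p = X.shape
    k = W.shape[1]
    # Exact Gaussian posterior of Z given X (stands in for approximate inference).
    Wt_psi_inv = W.T / psi                         # W^T Psi^{-1}, shape (k, p)
    S = np.linalg.inv(np.eye(k) + Wt_psi_inv @ W)  # posterior covariance
    mean = X @ Wt_psi_inv.T @ S                    # posterior means, shape (n, k)
    # Step 1: sample the latent variable from its posterior.
    Z = mean + rng.standard_normal((n, k)) @ np.linalg.cholesky(S).T
    # Step 2: sample new predictors given Z, with fresh independent noise.
    return Z @ W.T + rng.standard_normal((n, p)) * np.sqrt(psi)

# Hypothetical toy setting: 5 predictors driven by 2 latent factors.
rng = np.random.default_rng(0)
n, p = 20000, 5
W = np.array([[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]])
psi = np.full(p, 0.5)
X = rng.standard_normal((n, 2)) @ W.T + rng.standard_normal((n, p)) * np.sqrt(psi)
X_knock = sample_factor_knockoffs(X, W, psi, rng)
```

In this toy model the predictors are independent given the latent variable, so the knockoff copy should share the covariance of `X`, and the cross-covariance between `X` and `X_knock` should match it off the diagonal. A deep latent variable model would replace the closed-form posterior with a learned approximate one.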
Pages: 10