Deep latent variable models for generating knockoffs

Cited by: 7
Authors
Liu, Ying [1]
Zheng, Cheng [2]
Affiliations
[1] Columbia Univ, Dept Psychiat, Irving Med Ctr, Mental Hlth Data Sci, 722 168th St, New York, NY 10032 USA
[2] Univ Wisconsin, Joseph J Zilber Sch Publ Hlth, Milwaukee, WI 53211 USA
Source
STAT | 2019 / Vol. 8 / No. 1
Keywords
deep generative model; FDR control; latent variable model; model-X knockoff; FALSE DISCOVERY RATE; INFERENCE
DOI
10.1002/sta4.260
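Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract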
Selective inference is an emerging field in big data analytics; it aims to perform variable selection and provide statistical inference at the same time. Among the various selective inference frameworks, the model-X framework offers the most flexible tool for equipping almost any machine learning method with false discovery rate (FDR) controlled variable selection. This paper provides a practical and flexible approach to generating knockoffs. We propose fitting a latent variable model to generate them. Under general conditions, the knockoffs can be generated by approximate inference of a latent variable that captures all of the correlation among the predictors. We propose an algorithm, based on recent advances in stochastic variational inference, that approximately reconstructs the distribution of the data via the latent variables. We demonstrate that the proposed method achieves FDR control and better power than existing knockoff generation methods in various simulated settings and in a real data example that identifies mutations associated with drug resistance in patients with human immunodeficiency virus type 1.
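
The generation step described in the abstract (infer a latent variable from the observed predictors, then redraw the predictors from that latent variable) can be illustrated with a much simpler model than the one the paper fits. The sketch below is an assumption-laden stand-in: it swaps the deep latent variable model and stochastic variational inference for probabilistic PCA, a linear-Gaussian latent variable model whose posterior is available in closed form. The function names `fit_ppca` and `sample_knockoffs` are hypothetical and do not come from the paper.

```python
import numpy as np


def fit_ppca(X, n_latent):
    """Closed-form fit of probabilistic PCA: x = mu + W z + noise, noise ~ N(0, sigma2 * I).

    Assumption: a linear-Gaussian model stands in for the deep model in the paper.
    Requires n_latent < number of columns of X.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)          # sample covariance of the predictors
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[n_latent:].mean()          # noise variance from discarded directions
    scale = np.sqrt(np.maximum(eigvals[:n_latent] - sigma2, 0.0))
    W = eigvecs[:, :n_latent] * scale           # loading matrix, shape (p, n_latent)
    return mu, W, sigma2


def sample_knockoffs(X, mu, W, sigma2, rng):
    """Draw Z from p(Z | X), then draw the knockoff copy from p(X | Z)."""
    n, p = X.shape
    q = W.shape[1]
    M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
    post_mean = (X - mu) @ W @ M_inv            # posterior mean of Z for every row of X
    post_chol = np.linalg.cholesky(sigma2 * M_inv)
    Z = post_mean + rng.standard_normal((n, q)) @ post_chol.T
    # Given Z the predictors are independent, so fresh noise yields the knockoff copy.
    return mu + Z @ W.T + np.sqrt(sigma2) * rng.standard_normal((n, p))


# Toy data simulated from a 2-factor model (illustrative values only).
rng = np.random.default_rng(0)
n, p, q = 5000, 10, 2
W_true = rng.standard_normal((p, q))
X = rng.standard_normal((n, q)) @ W_true.T + 0.5 * rng.standard_normal((n, p))

mu, W, sigma2 = fit_ppca(X, n_latent=q)
X_knock = sample_knockoffs(X, mu, W, sigma2, rng)
```

In this toy setting the knockoff matrix has the same shape as `X` and, when the fitted model is adequate, a similar correlation structure. A knockoff filter would then compare importance statistics for each original column against its knockoff; that step is not shown here.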
Pages: 10