DISCOVERING INFLUENTIAL VARIABLES: A METHOD OF PARTITIONS

Cited by: 25
Authors
Chernoff, Herman [1]
Lo, Shaw-Hwa [2]
Zheng, Tian [2]
Affiliations
[1] Harvard Univ, Dept Stat, Ctr Sci, Cambridge, MA 02138 USA
[2] Columbia Univ, Dept Stat, New York, NY 10027 USA
Keywords
Partition; variable selection; influence; marginal influence; retention; impostor; resuscitation; RHEUMATOID-ARTHRITIS; LINKAGE ANALYSIS; GENOME; ASSOCIATION; GENES; POLYMORPHISMS; FAMILIES; DISEASE; SCAN
DOI
10.1214/09-AOAS265
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
A trend in all scientific disciplines, based on advances in technology, is the increasing availability of high-dimensional data in which important information is buried. A current urgent challenge to statisticians is to develop effective methods of finding the useful information in the vast amounts of messy and noisy data available, most of which are noninformative. This paper presents a general computer-intensive approach, based on a method pioneered by Lo and Zheng, for detecting which of many potential explanatory variables have an influence on a dependent variable Y. This approach is suited to detecting influential variables whose causal effects depend on the confluence of values of several variables. It has the advantage of avoiding a difficult direct analysis, involving possibly thousands of variables, by dealing with many randomly selected small subsets from which smaller subsets are selected, guided by a measure of influence I. The main objective is to discover the influential variables, rather than to measure their effects. Once they are detected, the problem of dealing with a much smaller group of influential variables should be amenable to appropriate analysis. In a sense, we are confining our attention to locating a few needles in a haystack.
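The screening loop described in the abstract (draw many small random subsets, then shrink each one under the guidance of an influence measure) can be illustrated with a minimal sketch. The abstract does not define I, so the score used here, the sum over partition cells of n_j^2 (mean_j - overall mean)^2, is an assumed stand-in, and the function names `influence`, `backward_reduce`, and `screen` are hypothetical. The toy data, in which Y depends only on the joint values of two of four binary variables, is likewise invented for illustration.

```python
import random
from collections import defaultdict
from itertools import product


def influence(X, y, subset):
    """Assumed stand-in for I: sum over partition cells of
    n_j^2 * (cell mean of y - overall mean of y)^2."""
    overall = sum(y) / len(y)
    cells = defaultdict(list)
    # Rows that share the same values on `subset` fall in the same cell.
    for row, target in zip(X, y):
        cells[tuple(row[v] for v in subset)].append(target)
    return sum(len(c) ** 2 * (sum(c) / len(c) - overall) ** 2
               for c in cells.values())


def backward_reduce(X, y, subset):
    """Greedily drop one variable at a time while the score strictly improves."""
    current = list(subset)
    score = influence(X, y, current)
    while len(current) > 1:
        candidates = [[v for v in current if v != drop] for drop in current]
        best = max(candidates, key=lambda s: influence(X, y, s))
        best_score = influence(X, y, best)
        if best_score <= score:
            break  # no single removal helps; keep the current subset
        current, score = best, best_score
    return sorted(current), score


def screen(X, y, subset_size, n_draws, seed=0):
    """Reduce many random starting subsets; return (score, retained) pairs,
    highest score first."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    results = []
    for _ in range(n_draws):
        start = rng.sample(range(n_vars), subset_size)
        kept, score = backward_reduce(X, y, start)
        results.append((score, tuple(kept)))
    return sorted(results, reverse=True)


# Toy data (hypothetical): four binary variables, y depends only on the
# joint values of variables 0 and 1, with no marginal effect of either.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]
```

In this toy setting neither variable 0 nor variable 1 shifts the mean of Y on its own, so a one-variable-at-a-time scan would miss both, while the partition defined by the pair separates the responses completely. That is the kind of joint effect the abstract says the approach is suited to detect.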
Pages: 1335-1369
Number of pages: 35