Detecting Novel Associations in Large Data Sets

Cited by: 2635
Authors
Reshef, David N. [1, 2, 3]
Reshef, Yakir A. [2, 4]
Finucane, Hilary K. [5]
Grossman, Sharon R. [2, 6]
McVean, Gilean [3, 7]
Turnbaugh, Peter J. [6]
Lander, Eric S. [2, 8, 9]
Mitzenmacher, Michael [10]
Sabeti, Pardis C. [2, 6]
Affiliations
[1] MIT, Dept Comp Sci, Cambridge, MA 02139 USA
[2] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Harvard Univ, Dept Math, Cambridge, MA 02138 USA
[5] Weizmann Inst Sci, Dept Comp Sci & Appl Math, IL-76100 Rehovot, Israel
[6] Harvard Univ, Dept Organism & Evolutionary Biol, Ctr Syst Biol, Cambridge, MA 02138 USA
[7] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[8] MIT, Dept Biol, Cambridge, MA 02139 USA
[9] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
[10] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA); European Research Council
Keywords
PRINCIPAL CURVES; REGRESSION; INFORMATION; MICROBIOME; HEALTH; CYCLE;
DOI
10.1126/science.1205438
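Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract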
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
Pages: 1518-1524
Number of pages: 7
Related papers
33 in total
[1]   Robust detection of periodic time series measured from biological systems (art. no. 117) [J].
Ahdesmäki, M ;
Lähdesmäki, H ;
Pearson, R ;
Huttunen, H ;
Yli-Harja, O .
BMC BIOINFORMATICS, 2005, 6 (1)
[2]   Mo-total organic carbon covariation in modern anoxic marine environments: Implications for analysis of paleoredox and paleohydrographic conditions [J].
Algeo, TJ ;
Lyons, TW .
PALEOCEANOGRAPHY, 2006, 21 (01)
[3]  
[Anonymous], 2011, SCIENCE, DOI 10.1126/SCIENCE.331.6018.692
[4]  
[Anonymous], 2009, WORLD FACTB 2009
[5]  
Baseball Prospectus Statistics Reports, 2009, BAS PROSP STAT REP
[6]  
BREIMAN L, 1985, J AM STAT ASSOC, V80, P580, DOI 10.2307/2288473
[7]   Influence of life stress on depression: Moderation by a polymorphism in the 5-HTT gene [J].
Caspi, A ;
Sugden, K ;
Moffitt, TE ;
Taylor, A ;
Craig, IW ;
Harrington, H ;
McClay, J ;
Mill, J ;
Martin, J ;
Braithwaite, A ;
Poulton, R .
SCIENCE, 2003, 301 (5631) :386-389
[8]   Human resources for health: overcoming the crisis [J].
Chen, L ;
Evans, T ;
Anand, S ;
Boufford, JI ;
Brown, H ;
Chowdhury, M ;
Cueto, M ;
Dare, L ;
Dussault, G ;
Elzinga, G ;
Fee, E ;
Habte, D ;
Hanvoravongchai, P ;
Jacobs, M ;
Kurowski, C ;
Michael, S ;
Pablos-Mendez, A ;
Sewankambo, N ;
Solimano, G ;
Stilwell, B ;
de Waal, A ;
Wibulpolprasert, S .
LANCET, 2004, 364 (9449) :1984-1990
[9]   Oxygen isotope studies of achondrites [J].
Clayton, RN ;
Mayeda, TK .
GEOCHIMICA ET COSMOCHIMICA ACTA, 1996, 60 (11) :1999-2017
[10]   LOCALLY WEIGHTED REGRESSION - AN APPROACH TO REGRESSION-ANALYSIS BY LOCAL FITTING [J].
CLEVELAND, WS ;
DEVLIN, SJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1988, 83 (403) :596-610