Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis

Cited by: 233
Authors
McDermott, Josh H. [1, 2]
Simoncelli, Eero P. [1, 2, 3]
Affiliations
[1] NYU, Howard Hughes Med Inst, New York, NY 10003 USA
[2] NYU, Ctr Neural Sci, New York, NY 10003 USA
[3] NYU, Courant Inst Math Sci, New York, NY 10003 USA
Keywords
COCKTAIL PARTY; AMPLITUDE-MODULATION; RECEPTIVE-FIELDS; DISCRIMINATION; SPEECH; MECHANISMS; FREQUENCY; RESPONSES; MASKING; CONTEXT
DOI
10.1016/j.neuron.2011.06.032
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Subject classification code
071006
Abstract
Rainstorms, insect swarms, and galloping horses produce "sound textures," the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures; however, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation.
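The measurement stage described in the abstract (split a sound into frequency channels, then time-average statistics of each channel and of pairs of channels) can be illustrated with a minimal sketch. This is not the authors' auditory model; it rests on several assumptions: brick-wall FFT bands with log-spaced edges stand in for cochlear filters, the modulation filterbank mentioned in the abstract is omitted, envelope kurtosis is used as the sparsity measure, and the synthesis step (adjusting a new sound until its statistics match) is not shown. The helper names `bandpass_fft`, `hilbert_envelope`, `texture_stats`, and `mean_offdiag` are hypothetical.

```python
import numpy as np

# Minimal sketch, NOT the authors' model: brick-wall FFT bands stand in for
# cochlear filters, and the modulation filterbank is omitted entirely.


def bandpass_fft(x, sr, lo, hi):
    """Zero all FFT bins outside [lo, hi) Hz (hypothetical helper)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def hilbert_envelope(x):
    """Magnitude of the analytic signal (negative frequencies zeroed)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def texture_stats(x, sr, n_bands=16, fmin=50.0, eps=1e-12):
    """Time-averaged subband-envelope statistics (hypothetical helper)."""
    edges = np.geomspace(fmin, sr / 2.0, n_bands + 1)  # log-spaced band edges
    envs = np.array([hilbert_envelope(bandpass_fft(x, sr, lo, hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])
    mean = envs.mean(axis=1)                   # per-band level (power proxy)
    centered = envs - mean[:, None]
    var = (centered ** 2).mean(axis=1) + eps   # eps avoids division by zero below
    skew = (centered ** 3).mean(axis=1) / var ** 1.5
    kurtosis = (centered ** 4).mean(axis=1) / var ** 2  # assumed sparsity proxy
    corr = np.corrcoef(envs)                   # cross-band envelope correlations
    return {"mean": mean, "var": var, "skew": skew,
            "kurtosis": kurtosis, "corr": corr}


def mean_offdiag(c):
    """Average correlation over distinct band pairs."""
    return c[~np.eye(len(c), dtype=bool)].mean()


# Toy signals: 1 s at 16 kHz.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
dense = rng.standard_normal(sr)                  # steady white noise
gate = np.zeros(sr)
gate[2000:3600] = 1.0                            # two 100 ms bursts
gate[10000:11600] = 1.0
sparse = dense * gate
comod = dense * (1.0 + 0.9 * np.sin(2 * np.pi * 4 * t))  # shared 4 Hz modulation

dense_stats = texture_stats(dense, sr)
sparse_stats = texture_stats(sparse, sr)
comod_stats = texture_stats(comod, sr)
print("mean envelope kurtosis, dense vs sparse:",
      dense_stats["kurtosis"].mean(), sparse_stats["kurtosis"].mean())
print("mean cross-band correlation, dense vs comodulated:",
      mean_offdiag(dense_stats["corr"]), mean_offdiag(comod_stats["corr"]))
```

In this toy setup, gating noise into bursts should raise the per-channel envelope kurtosis, while imposing one slow modulation on all channels should raise the cross-band correlations. The two manipulations roughly mirror the abstract's distinction between statistics of individual channels and correlations between channels.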
Pages: 926-940
Page count: 15