The Visual Microphone: Passive Recovery of Sound from Video

Cited by: 237
Authors
Davis, Abe [1]
Rubinstein, Michael [1, 2]
Wadhwa, Neal [1]
Mysore, Gautham J. [3]
Durand, Fredo [1]
Freeman, William T. [1]
Affiliations
[1] MIT CSAIL, Cambridge, MA 02139 USA
[2] Microsoft Res, New York, NY 11728 USA
[3] Adobe Res, Bangalore, Karnataka, India
Source
ACM TRANSACTIONS ON GRAPHICS | 2014 / Vol. 33 / Issue 04
Funding
National Science Foundation (USA);
Keywords
remote sound acquisition; sound from video; visual acoustics; SPEECH;
DOI
10.1145/2601097.2601119
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
When sound hits an object, it causes small vibrations of the object's surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects (a glass of water, a potted plant, a box of tissues, or a bag of chips) into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object's surface, which we can use to recover the vibration modes of an object.
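The abstract mentions SNR as one of the quality metrics. As a rough illustration only, the sketch below computes an SNR in decibels between a reference signal and a recovered one. It is not taken from the paper: the function name `snr_db`, the least-squares gain alignment, and the synthetic test signal are all assumptions made for the example, and the paper's exact SNR definition may differ.

```python
import numpy as np


def snr_db(reference, recovered):
    """SNR in dB, treating the scaled residual (recovered - reference) as noise.

    Assumption (not from the paper): the recovered signal has an arbitrary
    scale, so it is first matched to the reference with a least-squares gain.
    """
    reference = np.asarray(reference, dtype=float)
    recovered = np.asarray(recovered, dtype=float)
    if reference.shape != recovered.shape:
        raise ValueError("signals must have the same shape")

    # Gain g that minimises ||g * recovered - reference||^2.
    gain = np.dot(recovered, reference) / np.dot(recovered, recovered)
    noise = gain * recovered - reference

    noise_power = np.sum(noise ** 2)
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * np.log10(np.sum(reference ** 2) / noise_power)


# Synthetic stand-in for an input sound and its noisy, attenuated recovery.
rng = np.random.default_rng(0)
t = np.arange(2000) / 2000.0
clean = np.sin(2 * np.pi * 100 * t)
noisy = 0.5 * clean + 0.05 * rng.standard_normal(t.size)
print(f"SNR: {snr_db(clean, noisy):.1f} dB")
```

The gain step matters because a signal recovered from video has no physically meaningful amplitude; without it, a perfectly shaped but rescaled recovery would score poorly.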
Pages: 10