Visually Indicated Sounds

Citations: 213
Authors
Owens, Andrew [1]
Torralba, Antonio [1, 2]
Isola, Phillip [1]
Adelson, Edward H. [1]
McDermott, Josh [1]
Freeman, William T. [1, 3]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] Google Res, Mountain View, CA USA
Source
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2016
Funding
U.S. National Science Foundation
DOI
10.1109/CVPR.2016.264
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
Pages: 2405-2413
Page count: 9