Machine learning in acoustics: Theory and applications

Cited by: 409
Authors
Bianco, Michael J. [1]
Gerstoft, Peter [1]
Traer, James [2]
Ozanich, Emma [1]
Roch, Marie A. [3]
Gannot, Sharon [4]
Deledalle, Charles-Alban [5]
Affiliations
[1] Univ Calif San Diego, Scripps Inst Oceanog, La Jolla, CA 92093 USA
[2] MIT, Dept Brain & Cognit Sci, E25-618, Cambridge, MA 02139 USA
[3] San Diego State Univ, Dept Comp Sci, San Diego, CA 92182 USA
[4] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
[5] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
Keywords
EXPECTATION-MAXIMIZATION ALGORITHM; NONNEGATIVE MATRIX FACTORIZATION; WAVE-GUIDE INVARIANT; SOURCE LOCALIZATION; SPEECH DEREVERBERATION; NEURAL-NETWORK; SOUND-SPEED; SOURCE SEPARATION; ONLINE DEREVERBERATION; NOISE-REDUCTION
DOI
10.1121/1.5133944
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of techniques, which are often based in statistics, for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features and desired labels or actions, or between features themselves. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing with compelling results and significant future promise. We first introduce ML, then highlight ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes. © 2019 Acoustical Society of America.
Pages: 3590-3628
Page count: 39