Machine learning-based self-powered acoustic sensor for speaker recognition

Cited by: 139
Authors
Han, Jae Hyun [1 ]
Bae, Kang Min [2 ]
Hong, Seong Kwang [1 ]
Park, Hyunsin [2 ]
Kwak, Jun-Hyuk [4 ]
Wang, Hee Seung [1 ]
Joe, Daniel Juhyung [1 ]
Park, Jung Hwan [1 ]
Jung, Young Hoon [1 ]
Hur, Shin [3 ]
Yoo, Chang D. [2 ]
Lee, Keon Jae [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Mat Sci & Engn, 291 Daehak Ro, Daejeon 34141, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Elect Engn, 291 Daehak Ro, Daejeon 34141, South Korea
[3] KIMM, Dept Nat Inspired Nanoconvergence Syst, 156 Gajeongbuk Ro, Daejeon 34103, South Korea
[4] CAMM, 156 Gajeongbuk Ro, Daejeon 34103, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Flexible piezoelectric; Self-powered; Acoustic sensor; Machine learning algorithm; Speaker recognition; HUMAN VOICE; NANOSENSORS;
DOI
10.1016/j.nanoen.2018.09.030
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline codes
070304 ; 081704 ;
Abstract
Herein, we report a new platform for machine learning-based speaker recognition via a flexible piezoelectric acoustic sensor (f-PAS) with a highly sensitive multi-resonant frequency band. The resonant self-powered f-PAS was fabricated by mimicking the operating mechanism of the basilar membrane in the human cochlea. The f-PAS acquired abundant voice information from the multi-channel sound inputs. The standard TIDIGITS dataset was recorded by the f-PAS and converted to frequency components using the Fast Fourier Transform (FFT) and the Short-Time Fourier Transform (STFT). A machine learning-based Gaussian Mixture Model (GMM) was designed using the highest- and second-highest-sensitivity data among the multi-channel outputs, exhibiting an outstanding speaker recognition rate of 97.5% with a 75% error rate reduction compared to that of a reference MEMS microphone.
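The recognition pipeline described in the abstract (STFT features per channel, then a per-speaker generative model scored by likelihood) can be sketched as follows. This is a minimal illustration, not the authors' code: the signals are synthetic, and a single diagonal Gaussian per speaker stands in for the full GMM used in the paper.

```python
import numpy as np

def stft_features(signal, frame_len=256, hop=128):
    """Log-magnitude STFT frames as simple spectral features."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def fit_diag_gaussian(feats):
    """Per-speaker diagonal Gaussian (one-component stand-in for a GMM)."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def log_likelihood(feats, model):
    """Total log-likelihood of all frames under a speaker model."""
    mean, var = model
    return np.sum(-0.5 * (np.log(2 * np.pi * var) + (feats - mean) ** 2 / var))

# Toy enrollment: two "speakers" as noisy sinusoids at different pitches.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
spk_a = np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(t.size)
spk_b = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(t.size)
models = {"A": fit_diag_gaussian(stft_features(spk_a)),
          "B": fit_diag_gaussian(stft_features(spk_b))}

# Recognition: pick the enrolled speaker whose model scores highest.
test = np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(t.size)
feats = stft_features(test)
best = max(models, key=lambda s: log_likelihood(feats, models[s]))
print(best)  # "A"
```

In the paper's setting, each f-PAS channel would contribute its own feature stream, with the two most sensitive channels feeding a multi-component GMM (e.g., scikit-learn's `GaussianMixture`) rather than the single Gaussian used here.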
Pages: 658-665 (8 pages)