Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Times cited: 39

Authors:

Mathur, Akhil [1, 2]

Isopoussu, Anton [1]

Kawsar, Fahim [1]

Berthouze, Nadia [2]

Lane, Nicholas D. [3]

Affiliations:

[1] Nokia Bell Labs, Murray Hill, NJ 07974 USA

[2] UCL, London, England

[3] Univ Oxford, Oxford, England

Source:

IPSN '19: Proceedings of the 2019 International Conference on Information Processing in Sensor Networks | 2019

Keywords:

GAN; speech models; microphone variability; robustness

DOI:

10.1145/3302506.3310398

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

Mobile and embedded devices increasingly use microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is guaranteeing their accuracy and robustness in the real world. Besides many environmental dynamics, a primary factor affecting the robustness of audio models is microphone variability. In this work, we propose Mic2Mic, a machine-learned system component that resides in the inference pipeline of audio models and reduces, in real time, the variability in audio data caused by microphone-specific factors. Two key considerations shaped the design of Mic2Mic: a) to decouple the problem of microphone variability from the audio task, and b) to place minimal burden on end users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic from unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% and 89% of the accuracy lost due to microphone variability for two common audio tasks.
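The abstract's central idea, learning a mapping between two microphones from unlabeled, unpaired recordings, rests on the cycle-consistency constraint introduced with CycleGANs: translating audio from microphone A to B and back should return the original. Below is a minimal Python sketch of that constraint only, not the Mic2Mic implementation. The generators `G` (A to B) and `F` (B to A), the per-frequency-bin gain model standing in for a neural network, and all helper names are hypothetical illustrations, and the adversarial terms of the full CycleGAN objective are deliberately omitted.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one spectrogram frame: a magnitude per frequency bin
Mapper = Callable[[Frame], Frame]


def make_gain_mapper(gains: Sequence[float]) -> Mapper:
    """Hypothetical stand-in for a learned generator: scales each frequency bin.

    A real CycleGAN generator is a neural network; a per-bin gain keeps this
    example small and dependency-free.
    """
    def mapper(frame: Frame) -> Frame:
        return [g * v for g, v in zip(gains, frame)]
    return mapper


def l1(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    G: Mapper, F: Mapper, frames_a: Sequence[Frame], frames_b: Sequence[Frame]
) -> float:
    """L_cyc = mean_a |F(G(a)) - a|_1 + mean_b |G(F(b)) - b|_1.

    frames_a and frames_b are unpaired: frames_a[i] and frames_b[i] need not
    be recordings of the same sound, so no aligned data is required.
    """
    forward = sum(l1(F(G(a)), a) for a in frames_a) / len(frames_a)
    backward = sum(l1(G(F(b)), b) for b in frames_b) / len(frames_b)
    return forward + backward


if __name__ == "__main__":
    gains = [0.5, 1.0, 2.0, 4.0]                      # hypothetical mic A -> B response
    G = make_gain_mapper(gains)
    F_inverse = make_gain_mapper([1.0 / g for g in gains])
    F_identity = make_gain_mapper([1.0] * len(gains))
    mic_a = [[1.0, 2.0, 3.0, 4.0]]                     # unpaired toy frames
    mic_b = [[4.0, 3.0, 2.0, 1.0]]
    print(cycle_consistency_loss(G, F_inverse, mic_a, mic_b))   # → 0.0
    print(cycle_consistency_loss(G, F_identity, mic_a, mic_b))  # → 5.625
```

Note that setting both `G` and `F` to the identity also drives this term to zero. That is why CycleGAN pairs the cycle loss with adversarial losses that push `G(a)` toward the distribution of microphone-B data (and `F(b)` toward microphone-A data); the cycle term only rules out mappings that discard information.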

Pages:

169-180

Page count:

12
