A Light-Weight Replay Detection Framework For Voice Controlled IoT Devices

Cited by: 34

Authors:

Malik, Khalid Mahmood [1]

Javed, Ali [1]

Malik, Hafiz [2]

Irtaza, Aun [2]

Affiliations:

[1] Oakland Univ, Dept Comp Sci & Engn, Rochester, MI 48309 USA

[2] Univ Michigan, Elect & Comp Engn Dept, Dearborn, MI 48128 USA

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2020 / Vol. 14 / No. 5

Funding:

U.S. National Science Foundation

Keywords:

Feature extraction; Google; Mel frequency cepstral coefficient; Deep learning; Internet of Things; Acoustic ternary patterns; audio replay detection; audio spoofing dataset; gammatone cepstral coefficients; voice-controlled devices; CEPSTRAL COEFFICIENTS; CLASSIFICATION; FEATURES; ATTACK;

DOI:

10.1109/JSTSP.2020.2999828

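Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract: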

The growing number of voice-controlled devices (VCDs), e.g., Google Home and Amazon Alexa, has resulted in the automation of home appliances, smart gadgets, and next-generation vehicles. However, VCDs and voice-activated services such as chatbots are vulnerable to audio replay attacks. Our vulnerability analysis of VCDs shows that these replays could be exploited in multi-hop scenarios to maliciously access the devices/nodes attached to the Internet of Things. To protect these VCDs and voice-activated services, there is an urgent need to develop reliable and computationally efficient solutions to detect replay attacks. This paper models replay attacks as a nonlinear process that introduces higher-order harmonic distortions. To detect these harmonic distortions, we propose the acoustic ternary patterns-gammatone cepstral coefficient (ATP-GTCC) features, which are capable of capturing distortions due to replay attacks. An error-correcting output codes model is used to train a multi-class SVM classifier on the proposed ATP-GTCC feature space, and the classifier is tested for voice replay attack detection. The performance of the proposed framework is evaluated on the ASVspoof 2019 dataset and on our own voice spoofing detection corpus (VSDC), which consists of bona fide, first-order replay (replayed once), and second-order replay (replayed twice) audio recordings. Experimental results indicate that the proposed audio replay detection framework reliably detects both first- and second-order replay attacks and can be used in resource-constrained devices.
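
The abstract names acoustic ternary patterns as one component of the ATP-GTCC feature but does not give a formula. As a rough illustration only, the sketch below assumes the pattern follows the general local-ternary-pattern idea applied to a 1-D audio frame: each neighbor of a center sample is coded +1, 0, or -1 against a tolerance band, and the ternary code is split into an "upper" and a "lower" binary pattern whose histograms could serve as features. The function names, the `threshold` and `radius` parameters, and the example frame are hypothetical and are not taken from the paper; the GTCC and SVM stages are not shown.

```python
from collections import Counter


def ternary_codes(frame, threshold, radius=4):
    """Return (upper, lower) binary pattern codes for each interior sample.

    Assumed scheme (not the paper's exact definition): a neighbor is
    +1 if it exceeds center + threshold, -1 if it falls below
    center - threshold, and 0 otherwise.
    """
    frame = list(frame)
    upper_codes, lower_codes = [], []
    for i in range(radius, len(frame) - radius):
        center = frame[i]
        neighbors = frame[i - radius:i] + frame[i + 1:i + radius + 1]
        upper = lower = 0
        for bit, value in enumerate(neighbors):
            if value > center + threshold:      # ternary +1 -> upper pattern
                upper |= 1 << bit
            elif value < center - threshold:    # ternary -1 -> lower pattern
                lower |= 1 << bit
            # ternary 0 sets no bit in either pattern
        upper_codes.append(upper)
        lower_codes.append(lower)
    return upper_codes, lower_codes


def pattern_histogram(codes, n_bits):
    """Normalized histogram over all 2**n_bits possible pattern codes."""
    counts = Counter(codes)
    total = len(codes) or 1
    return [counts.get(code, 0) / total for code in range(2 ** n_bits)]


# Hypothetical toy frame; real input would be a windowed audio signal.
frame = [0.0, 0.5, 0.1, 0.0, -0.5, 0.0, 0.1]
upper, lower = ternary_codes(frame, threshold=0.2, radius=1)
features = pattern_histogram(upper, 2) + pattern_histogram(lower, 2)
```

With `radius=1` each sample has two neighbors, so each histogram has four bins and the concatenated vector has eight entries. A larger radius grows the histogram exponentially, which is one reason practical ternary-pattern features usually restrict the set of codes they count.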


Pages: 982-996

Page count: 15
