Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks

Cited by: 0

Authors:

Buker, Aykut [1]

Hanilci, Cemal [1]

Affiliations:

[1] Bursa Tech Univ, Dept Elect & Elect Engn, TR-16310 Bursa, Turkiye

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2024, Vol. 43, No. 7

Keywords:

Audio forensics; Wideband AMR codec; Double compressed AMR detection; Deep neural networks

DOI:

10.1007/s00034-024-02668-4

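Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract: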

Detecting double compressed (DC) speech signals is an important audio forensics task because it bears directly on the integrity and authenticity of a recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique optimized specifically for speech signals, and it is a standard audio recording format on the vast majority of smartphones. All previous studies addressing the detection of DC AMR signals report their findings for speech signals compressed using the narrowband AMR codec (AMR-NB). Meanwhile, the wideband AMR codec (AMR-WB) has been adopted by several mobile phone manufacturers, but DC AMR-WB speech signal detection performance remains unknown. To the best of our knowledge, this is the first study focusing on detecting DC signals compressed using the AMR-WB speech codec. To this end, we propose three different deep neural network-based DC AMR-WB signal detection systems in which spectrogram representations of the speech signals are used as the input features. Experiments conducted on the TIMIT database provide several important findings regarding DC AMR-WB speech detection. Firstly, DC AMR-WB detection is found to be a more challenging task than detecting DC AMR-NB signals. For example, a convolutional neural network (CNN)-based system yields 74.83% and 99.93% detection rates on AMR-WB and AMR-NB coded signals, respectively. Secondly, capturing temporal information with a long short-term memory (LSTM) network, which reaches a DC AMR-WB signal detection accuracy of 86.25%, is found to be superior to the CNN system. Thirdly, combining the deep feature representations learned by the CNN and LSTM networks further improves the performance. Fourthly, the detection rates are found to deteriorate when the signals are first encoded using different audio codecs prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the three proposed systems generally improves the detection rates.
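
The abstract distinguishes score-level fusion from decision-level fusion without giving details. The following is a minimal sketch of the general difference between the two, not the authors' method: it assumes each detector outputs a score in [0, 1], that score-level fusion is a plain average, and that decision-level fusion is a majority vote. The function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
from statistics import mean


def score_level_fusion(scores, threshold=0.5):
    """Average the per-system scores, then apply one threshold (assumed rule)."""
    return mean(scores) >= threshold


def decision_level_fusion(scores, threshold=0.5):
    """Threshold each system separately, then take a majority vote (assumed rule)."""
    votes = [s >= threshold for s in scores]
    return sum(votes) > len(votes) / 2


# Hypothetical scores from three detectors for a single utterance;
# these numbers are illustrative and do not come from the paper.
scores = [0.45, 0.40, 0.95]

print(score_level_fusion(scores))     # the averaged score is compared to the threshold
print(decision_level_fusion(scores))  # each detector votes, majority decides
```

With these illustrative scores the two rules disagree: one very confident detector pulls the average above the threshold, while the majority vote follows the two detectors that fall below it.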

Pages: 4528-4546

Page count: 19
