Packet loss concealment method based on hidden Markov model and decision tree for AMR-WB codec

Cited by: 3

Authors:

Gueham, Tarek [1]

Merazka, Fatiha [1]

Affiliation:

[1] USTHB Univ, Telecommun Dept, LIS Lab, Algiers, Algeria

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 04

Keywords:

VoIP; Packet loss concealment; HMM; Decision tree; HMDT model; Machine learning; WB-PESQ; EMBSD; MUSHRA; VOICE QUALITY; SPEECH; RECONSTRUCTION; RECOVERY;

DOI:

10.1007/s11042-023-15914-9

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Packet loss concealment (PLC) techniques are utilized to improve the quality of Voice over IP (VoIP) communications by reconstructing missing speech packets. Hidden Markov Model (HMM)-based PLC methods have shown a significant improvement over traditional methods by tracking the statistical evolution of speech signals. However, these methods may result in perceptually disturbing artifacts that arise from the structure of the HMM model. In this study, we introduce HMM-based PLC methods and investigate the impact of Markovian assumptions on their performance, specifically pitch fluctuation and gain mismatching. We then propose a new PLC method implemented on the G.722.2 codec, which incorporates the HMM and Decision Tree (DT) architecture. Our proposed architecture avoids the HMM's emissions dependencies assumption by using a DT layer, resulting in more accurate speech packet regeneration and natural transitions between the synthesized and concealed speech signals. The proposed method uses HMM and DT to track the statistical evolution of speech signals and accurately predict/estimate lost speech packets by exploiting the surrounding received speech packets. The proposed method is evaluated using mathematical proofs, objective, and subjective metrics, with results showing a considerable enhancement in speech quality compared to conventional PLC methods, achieving a Perceptual Evaluation of Speech Quality (PESQ) score higher than 3 at a 20% packet loss ratio.

Pages: 11261-11297

Page count: 37

