Speech Enhancement Using LinkNet Architecture

Cited by: 1

Authors:

Patel, Anuj [2]

Prasad, G. Satya [1]

Chandra, Sabyasachi [1]

Bharati, Puja [1]

Das Mandal, Shyamal Kumar [1]

Affiliations:

[1] Indian Inst Technol, Kharagpur, W Bengal, India

[2] Pandit Deendayal Energy Univ, Gandhinagar, India

Source:

SPEECH AND COMPUTER, SPECOM 2023, PT I | 2023 / Vol. 14338

Keywords:

Speech enhancement; Deep learning; Convolutional neural networks; Auto encoder; UNET; LinkNet-speech;

DOI:

10.1007/978-3-031-48309-7_21

Chinese Library Classification (CLC):

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speech enhancement techniques play a vital role in enhancing the clarity and overall quality of audio signals, addressing issues like background noise, reverberation, and channel impairments that often degrade speech intelligibility. Neural network models, including DNNs, CNNs, RNNs, and VAEs, have demonstrated their effectiveness in improving speech quality by decoding noisy speech inputs, capturing intricate patterns, and extracting relevant information. Evaluation metrics like PESQ and STOI are commonly employed to assess the performance of speech enhancement algorithms. STOI measures the understandability of enhanced speech using short-time spectral information, while PESQ evaluates the subjective quality of enhanced speech compared to the original clean speech. Moreover, recent advancements in speech enhancement research have shown that employing LinkNet, a specific neural network architecture, can significantly surpass the efficiency of other models. LinkNet has demonstrated superior performance in enhancing speech signals by effectively mitigating noise, reducing artifacts, and enhancing the overall intelligibility of the output. Its architecture incorporates innovative techniques that facilitate the extraction of meaningful features from noisy speech inputs, leading to remarkable results in terms of speech quality improvement. By leveraging LinkNet, researchers and practitioners can further advance the field of speech enhancement and achieve outstanding outcomes in terms of audio clarity and intelligibility.


Pages: 245-257

Page count: 13
