Spoken Language Identification Using Rhythmic Categorization: Syllable-Timed and Stress-Timed

Citations: 0

Authors:

Dey, Spandan [1]

Saha, Goutam [1]

Affiliations:

[1] Indian Inst Technol Kharagpur, Dept E&ECE, Kharagpur, W Bengal, India

Source:

2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024 | 2024

Keywords:

Spoken language identification; syllable-timed language; stress-timed language; wav2vec 2.0; VoxLingua107

DOI:

10.1109/SPCOM60851.2024.10631595

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

Recent spoken language identification (LID) systems have relied on acoustic features or self-supervised representations. These features are also useful for other speech-based classification tasks, such as speaker and emotion recognition; they are therefore not specific to LID and may not capture all of the information relevant to distinguishing between languages. Our study accordingly seeks to improve LID performance by incorporating language-specific features. Specifically, we investigate the rhythmic categorization of world languages into syllable-timed and stress-timed classes. We choose seven syllable-timed and seven stress-timed languages across different language families from the VoxLingua107 corpus. We then extract acoustic correlates of speech rhythm that can distinguish between syllable-timed and stress-timed languages. To first develop a pre-stage language rhythm classifier, we use mel-frequency cepstral coefficient (MFCC) features and representations extracted from wav2vec 2.0 XLSR with the ECAPA-TDNN architecture, together with post-pooling embedding fusion using the proposed rhythm metrics. The embeddings derived from this classifier are then used during training of the 14-class language classifier. Our results demonstrate that effective integration of linguistic information yields a substantial LID performance improvement, reducing the equal error rate (EER) to 0.89%.
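
The abstract reports performance as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of the generic EER definition, not the authors' scoring code. The function name `equal_error_rate` and the split of trial scores into target and non-target lists are assumptions made for illustration; how the paper pools scores across its 14 languages is not stated in this record.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from two sets of detection scores.

    Hypothetical helper: target_scores are scores of trials where the
    hypothesized language is correct, nontarget_scores are the rest.
    Higher scores mean stronger acceptance.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    eer = equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.75, 0.4])
    print(f"EER = {100 * eer:.2f}%")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; interpolating along the error curve is a common alternative.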


Pages: 5
