COLOMBIAN DIALECT RECOGNITION BASED ON INFORMATION EXTRACTED FROM SPEECH AND TEXT SIGNALS

Cited by: 4

Authors:

Escobar-Grisales, D. [1]

Rios-Urrego, C. D. [1]

Lopez-Santander, D. A. [1]

Gallo-Aristizabal, J. D. [1]

Vasquez-Correa, J. C. [1, 2, 3]

Noeth, E. [2]

Orozco-Arroyave, J. R. [1, 2]

Affiliations:

[1] Univ Antioquia UdeA, Fac Engn, GITA Lab, Medellin, Colombia

[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Nurnberg, Germany

[3] Pratech Grp, Medellin, Colombia

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

Dialect classification; Speech; Text; Customer Service; Acoustics; Language processing; LANGUAGE;

DOI:

10.1109/ASRU51503.2021.9687890

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Dialect recognition is useful in many industrial sectors, particularly for enabling better interaction between customers and providers. The core idea is to improve or customize marketing and customer service strategies depending on geographic location, birthplace, and culture. This study proposes different models to automatically discriminate between two Colombian dialects: "Antioqueño" and "Bogotano". To the best of our knowledge, this is the first work on Colombian dialect recognition based on real conversations from customer service centers. The proposed strategy consists of independent analyses, using information from speech recordings and their corresponding transliterations. On the one hand, classical approaches are used to model speech, including prosody features, Mel-frequency cepstral coefficients, and mean Hilbert envelope coefficients. For text models, Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) embeddings are considered. On the other hand, a deep learning approach is applied by considering convolutional neural networks, which are trained using spectrograms and embedding matrices for speech and text, respectively. The implemented deep learning models seem to be more promising than the classical ones for the addressed problem. Further experiments will be considered to validate this claim in a wider spectrum of methods.


Pages: 556-563

Page count: 8
