Colombian Dialect Recognition from Call-Center Conversations Using Fusion Strategies

Cited by: 1
Authors
Escobar-Grisales, D. [1]
Rios-Urrego, C. D. [1]
Gallo-Aristizabal, J. D. [1]
Lopez-Santander, D. A. [1]
Calvo-Ariza, N. R. [1]
Noth, Elmar [2]
Orozco-Arroyave, J. R. [1, 2]
Affiliations
[1] Univ Antioquia UdeA, GITA Lab, Fac Engn, Medellin, Colombia
[2] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Erlangen, Germany
Source
APPLIED COMPUTER SCIENCES IN ENGINEERING, WEA 2022 | 2022 / Vol. 1685
Keywords
Dialect recognition; Customer service; Speech analysis; Text analysis; Fusion strategies; SPEECH; CLASSIFICATION
DOI
10.1007/978-3-031-20611-5_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic dialect recognition is a challenging problem with many applications in different fields. In customer service in particular, it is used to improve the interaction between customers and providers or to segment offers so that products and services are shown to the group with the greatest interest. This study proposes a bi-modal analysis to discriminate between two Colombian dialects ("Antioqueno" and "Bogotano") using text and speech signals generated in real call-center conversations. First, we evaluated uni-modal strategies considering classical and deep approaches to analyze speech recordings and their corresponding transliterations (text). Then, different fusion strategies were considered to combine the information from speech and text at different stages of the methodology. In early fusion, the uni-modal feature vectors are concatenated and used as input to generate a new model. In late fusion, the scores resulting from the uni-modal classifications are concatenated to form the feature vectors used to train a support vector machine that predicts the dialect. Furthermore, a joint fusion strategy was tested to enhance the characterization process of one mode based on the information from the other one. The results indicate that bi-modal approaches using the late fusion strategy outperform uni-modal approaches by up to 9%. To the best of our knowledge, this is the first work on Colombian dialect recognition based on call-center conversations using a bi-modal approach. Future experiments will consider other fusion strategies where information from text and speech is synchronously merged to improve the dialect classification.
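The early and late fusion inputs described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, vector sizes, and numeric values are hypothetical, and the support vector machine that the abstract says is trained on the late-fusion vectors is left out so the sketch stays dependency-free.

```python
from typing import List, Sequence


def early_fusion(speech_features: Sequence[float],
                 text_features: Sequence[float]) -> List[float]:
    """Concatenate the uni-modal feature vectors into one bi-modal vector."""
    return list(speech_features) + list(text_features)


def late_fusion(speech_score: float, text_score: float) -> List[float]:
    """Stack the uni-modal classifier scores into a two-dimensional vector."""
    return [speech_score, text_score]


# Hypothetical toy values for a single call (not taken from the paper).
speech_features = [0.12, -0.40, 0.88]   # stand-in for a speech feature vector
text_features = [0.05, 0.31]            # stand-in for a text feature vector
speech_score, text_score = 0.7, -0.2    # stand-ins for uni-modal classifier scores

early_input = early_fusion(speech_features, text_features)
late_input = late_fusion(speech_score, text_score)

print(len(early_input))  # 5
print(late_input)        # [0.7, -0.2]
```

In early fusion the new model sees every feature from both modes, so its input dimension is the sum of the two uni-modal dimensions. In late fusion the second-stage classifier sees only one score per mode.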
Pages: 54-65
Page count: 12