Intent Identification in Unattended Customer Queries Using an Unsupervised Approach

Cited by: 1
Authors
Rebelo, Hugo D. [1 ]
de Oliveira, Lucas A. F. [2 ]
Almeida, Gustavo M. [3 ]
Sotomayor, Cesar A. M. [1 ]
Rochocz, Geraldo L. [1 ]
Melo, Willian E. D. [4 ]
Affiliations
[1] Passeio Corp, Radix Engn & Software, R Passeio 38,Tower 2, BR-20021290 Rio De Janeiro, RJ, Brazil
[2] Radix Engn & Software, R Santa Rita Durao 444, BR-30140110 Belo Horizonte, MG, Brazil
[3] Univ Fed Minas Gerais, Sch Engn, Dept Chem Engn, Av Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil
[4] Cemig Distribut SA, Av Barbacena 1200, BR-30190924 Belo Horizonte, MG, Brazil
Keywords
Customer behaviour; customer intent; unsupervised learning; information system; text analytics; ML; LATENT; BUSINESS; REVIEWS; QUALITY;
DOI
10.1142/S0219649221500374
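Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Subject classification code
1205 ; 120501 ;
Abstract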
Customer satisfaction is crucial for companies worldwide. An integrated strategy comprises omnichannel communication systems, in which chatbots are widely used. Such a system is supervised, and the key point is that the required training data are originally unlabelled. Labelling data manually is infeasible, mainly because of today's considerable data volume. Moreover, customer behaviour is often hidden in the data, even for experts. This work proposes a methodology to find unknown entities and intents automatically using unsupervised learning. It is based on natural language processing (NLP) for text data preparation and on machine learning (ML) for clustering model identification. Several combinations of preprocessing, vectorisation, dimensionality reduction and clustering techniques were investigated. The case study refers to a Brazilian electric energy company, with a data set of failed customer queries, that is, queries the company did not meet for any reason. They correspond to about 30% (4,044 queries) of the original data set. The best identified intent model employed stemming for preprocessing, word frequency analysis for vectorisation, latent Dirichlet allocation (LDA) for dimensionality reduction, and mini-batch k-means for clustering. This system was able to allocate 62% of the failed queries to one of the seven intents found. For instance, these newly labelled data can be used to train NLP-based chatbots, contributing to a greater generalisation capacity and, ultimately, to increased customer satisfaction.
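
The pipeline named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses only the Python standard library, covers three of the four stages (stemming, word-frequency vectorisation, mini-batch k-means) and omits the LDA dimensionality-reduction step entirely. The toy queries, the crude_stem helper and all parameter values (two clusters, batch size, iteration count) are assumptions made for illustration; the paper reports seven intents on 4,044 real queries.

```python
import random
from collections import Counter

# Toy queries invented for illustration; not taken from the paper's data set.
QUERIES = [
    "power outage in my street",
    "reporting a power outage again",
    "outage since this morning",
    "second copy of my bill",
    "need a copy of the billing statement",
    "bill copy request",
]


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vectorise(queries):
    """Build word-frequency vectors over the stemmed vocabulary."""
    docs = [Counter(crude_stem(w) for w in q.lower().split()) for q in queries]
    vocab = sorted(set().union(*docs))
    return [[doc[term] for term in vocab] for doc in docs], vocab


def nearest(point, centres):
    """Index of the centre with the smallest squared Euclidean distance."""
    return min(
        range(len(centres)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centres[c])),
    )


def mini_batch_kmeans(vectors, k, batch_size=3, n_iter=50, seed=0):
    """Mini-batch k-means: update centres from small random batches."""
    rng = random.Random(seed)
    centres = [list(map(float, v)) for v in rng.sample(vectors, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(vectors, batch_size)
        # Assign the whole batch first, then move the centres.
        assigned = [nearest(x, centres) for x in batch]
        for x, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-centre learning rate
            centres[c] = [(1 - eta) * m + eta * xi for m, xi in zip(centres[c], x)]
    return [nearest(v, centres) for v in vectors]


vectors, vocab = vectorise(QUERIES)
labels = mini_batch_kmeans(vectors, k=2)
for query, label in zip(QUERIES, labels):
    print(label, query)
```

Each printed line pairs a query with a cluster index, which plays the role of a candidate intent label. On a real corpus a proper stemmer for the query language and a topic-model reduction step would sit between vectorisation and clustering, as the abstract describes.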
Pages: 26