Automatic Stop List Generation for Clustering Recognition Results of Call Center Recordings

Authors
Popova, Svetlana [1, 3]
Krivosheeva, Tatiana [2 ]
Korenevsky, Maxim [2 ]
Affiliations
[1] St Petersburg State Univ, St Petersburg 199034, Russia
[2] STC innovat Ltd, Petersburg, VA USA
[3] Scrol, St Petersburg, Russia
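Source
SPEECH AND COMPUTER | 2014 / Vol. 8773
Keywords
clustering; stop words; stop list generation; ASR
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract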
The paper addresses automatic stop list generation for processing recognition results of call center recordings, in particular for clustering. We propose and test a supervised, domain-dependent method of automatic stop list generation. The method is based on finding words whose removal increases the dissimilarity between documents in different clusters and decreases the dissimilarity between documents within the same cluster. This approach is shown to be efficient for clustering recognition results of recordings of varying quality, both on datasets that contain the same topics as the training dataset and on datasets containing other topics.
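The selection criterion stated in the abstract can be illustrated with a minimal sketch. This record does not give the paper's dissimilarity measure or search procedure, so everything below is an assumption made for illustration: documents are token lists with known cluster labels, dissimilarity is one minus the Jaccard similarity of word sets, each word is tested on its own, and the function names (`jaccard_dissimilarity`, `mean_dissimilarities`, `generate_stop_list`) are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """Return 1 - |a & b| / |a | b| for two word sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def mean_dissimilarities(docs, labels, removed=frozenset()):
    """Return (mean within-cluster, mean between-cluster) dissimilarity.

    Words in `removed` are dropped from every document first.
    """
    sets = [set(doc) - removed for doc in docs]
    within, between = [], []
    for i, j in combinations(range(len(sets)), 2):
        d = jaccard_dissimilarity(sets[i], sets[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return _mean(within), _mean(between)


def generate_stop_list(docs, labels):
    """Keep words whose individual removal lowers the within-cluster mean
    and raises the between-cluster mean (assumed reading of the criterion)."""
    base_within, base_between = mean_dissimilarities(docs, labels)
    vocabulary = sorted({word for doc in docs for word in doc})
    stop_list = []
    for word in vocabulary:
        within, between = mean_dissimilarities(docs, labels, frozenset([word]))
        if within < base_within and between > base_between:
            stop_list.append(word)
    return stop_list


# Toy labeled data (invented for illustration): "uh" is noise shared across topics.
docs = [
    ["billing", "invoice", "uh"],
    ["billing", "invoice"],
    ["router", "modem", "uh"],
    ["router", "modem"],
]
labels = ["billing", "billing", "tech", "tech"]
print(generate_stop_list(docs, labels))  # ['uh']
```

In this toy data, removing "uh" makes documents in the same cluster identical and removes the only overlap between clusters, so it is the only word selected. Removing a topical word such as "billing" raises the within-cluster dissimilarity, so it is kept.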
Pages: 137-144
Page count: 8