Sounds of speech based spoken document categorization: A subword representation method

Authors
Qu, WD [1]
Shirai, K [1]
Affiliation
[1] Waseda Univ, Sch Sci & Engn, Tokyo 1698555, Japan
Source
IEICE Transactions on Information and Systems
Keywords
spoken document categorization; subword unit representations; sleeping experts algorithms; machine learning
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we explore a method for spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, subword unit representations are used as an alternative to word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). An advantage of using subword acoustic unit representations for spoken document categorization is that they require no prior knowledge about the contents of the spoken documents and address the out-of-vocabulary (OOV) problem. Moreover, the method relies on the sounds of speech rather than on exact orthography. Using subword units instead of words allows approximate matching on inaccurate transcriptions and makes "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn from the confusion characteristics of the recognizer and the pronunciation variants of words, improving the robustness of the whole system. Our experiments on both artificial and real corrupted data sets show that the proposed method is more effective and robust than the word-based method.
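The idea of approximate matching over subword units can be illustrated with a minimal sketch. The abstract does not say which subword units or which matching scheme the authors use, so everything here is an assumption made for illustration: overlapping phone trigrams as the subword units, a simple multiset overlap as the match score, and hypothetical phone strings for one word. It is not the paper's classifier.

```python
from collections import Counter


def phone_ngrams(phones, n=3):
    """Return a bag (Counter) of overlapping n-grams over a phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def overlap(a, b):
    """Fraction of n-gram occurrences in `a` that also occur in `b`."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    return sum((a & b).values()) / total


# Hypothetical phone strings for one word; the second simulates a
# recognizer error with a single substitution (z -> s).
reference = "k ae t ax g ax r ax z ey sh ax n".split()
errorful = "k ae t ax g ax r ax s ey sh ax n".split()

ref_bag = phone_ngrams(reference)
err_bag = phone_ngrams(errorful)
score = overlap(ref_bag, err_bag)

# A word-level comparison treats the two transcriptions as different tokens.
word_match = reference == errorful

print(f"shared trigram fraction: {score:.2f}, exact word match: {word_match}")
```

In this toy case a single phone substitution disturbs only the three trigrams that span it, so most subword features survive, whereas an exact word-level comparison fails outright. This is the kind of tolerance to inaccurate transcriptions that the abstract attributes to subword units.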
Pages: 1175-1184
Number of pages: 10
Citation
Qu, Weidong; Shirai, Katsuhiko. Sounds of speech based spoken document categorization: A subword representation method. IEICE Transactions on Information and Systems, 2004, E87-D (05): 1175-1184.