Sounds of speech based spoken document categorization: A subword representation method

Cited by: 0

Authors:

Qu, Weidong [1]

Shirai, Katsuhiko [1]

Affiliation:

[1] Waseda Univ, Sch Sci & Engn, Tokyo 1698555, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2004 / Vol. E87D / No. 05

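Keywords:

spoken document categorization; subword unit representations; sleeping experts algorithms; machine learning;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: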

In this paper, we explore a method for spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, we use subword unit representations as an alternative to word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). An advantage of using subword acoustic unit representations for spoken document categorization is that it requires no prior knowledge about the contents of the spoken documents and addresses the out-of-vocabulary (OOV) problem. Moreover, the method relies on the sounds of speech rather than on exact orthography. Using subword units instead of words allows approximate matching on inaccurate transcriptions and makes "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn from the confusion characteristics of the recognizer and the pronunciation variants of words, improving the robustness of the whole system. Our experiments on both artificially corrupted and real corrupted data sets show that the proposed method is more effective and robust than the word-based method.
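The approximate-matching idea in the abstract can be illustrated with a minimal sketch. It assumes the subword units are overlapping phone n-grams, which the abstract does not specify; the phone strings, the helper names `phone_ngrams` and `overlap`, and the choice of n = 3 are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def phone_ngrams(phones, n=3):
    """Return a bag (Counter) of overlapping phone n-grams.

    Assumption: n-grams stand in for the paper's subword units.
    """
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def overlap(a, b):
    """Fraction of the n-grams in bag `a` that also occur in bag `b`."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    return sum((a & b).values()) / total


# Hypothetical phone transcriptions of one word; the second contains a
# single substitution error (z -> s), as a recognizer might produce.
reference = "k ae t ax g ax r ax z ey sh ax n".split()
errorful = "k ae t ax g ax r ax s ey sh ax n".split()

score = overlap(phone_ngrams(reference), phone_ngrams(errorful))
print(f"shared trigram fraction: {score:.2f}")
```

In this toy case the single substitution disturbs only the three trigrams that span it, so 8 of the 11 trigrams still match, whereas an exact word-level comparison of the two transcriptions would count as a complete mismatch.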

Citation

Pages: 1175-1184

Page count: 10
