Spoken document retrieval using both word-based and syllable-based document spaces with latent semantic indexing

Times cited: 0
Authors
Ichikawa, Ken [1]
Tsuge, Satoru [2]
Kitaoka, Norihide [1]
Takeda, Kazuya [1]
Kita, Kenji [3]
Affiliations
[1] Nagoya Univ, Nagoya, Aichi 4648601, Japan
[2] Daido Univ, Nagoya, Aichi, Japan
[3] Univ Tokushima, Tokushima, Japan
Source
2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2013
Funding
Japan Science and Technology Agency (JST)
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
In this paper, we propose a spoken document retrieval method that uses vector space models in multiple document spaces. First, we construct document vector spaces from two sources: continuous-word speech recognition results and continuous-syllable speech recognition results. Query expansion is also applied to the word-based document space. We propose applying latent semantic indexing (LSI) not only to the word-based space but also to the syllable-based space, in order to reduce the dimensionality of the spaces by means of implicitly defined semantics. Finally, we rank the documents by computing the distance between the query and each document in the various spaces and combining these distances. In this procedure, we propose modeling each document as a hyperplane. To evaluate the proposed method, we conducted spoken document retrieval experiments on the NTCIR-9 SpokenDoc data set. The results showed that combining the distances and applying LSI to the syllable-based document space improved retrieval performance.
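The abstract does not specify the term weighting, the LSI rank, or how the per-space distances are combined. As an illustration only, here is a minimal pure-Python sketch of the general pipeline it describes, under these assumptions: each document space (word tokens, syllable tokens) uses TF-IDF vectors; LSI is a rank-k truncated SVD A ≈ U_k S_k V_k^T of the term-document matrix, obtained here from a Jacobi eigendecomposition of the small document Gram matrix A^T A; the query is folded in as S_k^-1 U_k^T q; and the cosine distances from the different spaces are combined by a weighted sum. The function names (`lsi_distances`, `rank_documents`), the weights, and the toy corpus are hypothetical, and the paper's hyperplane document model and query expansion are not reproduced.

```python
import math
from collections import Counter

# Hypothetical sketch. Assumed, not taken from the paper: TF-IDF weighting,
# LSI through an eigendecomposition of the document Gram matrix, cosine distance
# and a weighted sum across spaces. Hyperplane modeling and query expansion are omitted.


def _dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def _cosine(x, y):
    """Cosine similarity of two dense lists; 0.0 if either has zero norm."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def _tfidf(docs, query):
    """TF-IDF vectors with smoothed IDF; query terms unseen in docs are dropped."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    def vec(tokens):
        return {t: f * idf[t] for t, f in Counter(tokens).items() if t in idf}
    return [vec(d) for d in docs], vec(query)


def _jacobi_eigh(g, sweeps=100, tol=1e-20):
    """Cyclic Jacobi eigendecomposition of a small symmetric matrix; column i of V pairs with eigenvalue i."""
    n = len(g)
    a = [row[:] for row in g]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes 0.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def lsi_distances(docs, query, k):
    """Cosine distance (1 - cos) from the query to each document in a rank-k LSI space."""
    doc_vecs, q_vec = _tfidf(docs, query)
    n = len(doc_vecs)
    # Gram matrix A^T A of the term-document matrix A: its eigenvectors are the
    # right singular vectors V of A, and its eigenvalues are sigma^2.
    gram = [[_dot(doc_vecs[i], doc_vecs[j]) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = _jacobi_eigh(gram)
    order = sorted(range(n), key=lambda i: eigvals[i], reverse=True)
    kept = [i for i in order[:k] if eigvals[i] > 1e-10]
    doc_hat = [[eigvecs[j][i] for i in kept] for j in range(n)]  # row j of V_k
    # Query fold-in S_k^-1 U_k^T q, computed as V_k^T (A^T q) / sigma^2.
    atq = [_dot(d, q_vec) for d in doc_vecs]
    q_hat = [sum(eigvecs[j][i] * atq[j] for j in range(n)) / eigvals[i] for i in kept]
    return [1.0 - _cosine(q_hat, d) for d in doc_hat]


def rank_documents(spaces, weights, k):
    """spaces: one (docs, query) pair per document space, e.g. word and syllable.
    Returns [(doc_index, combined_distance), ...], closest document first."""
    n = len(spaces[0][0])
    combined = [0.0] * n
    for (docs, query), w in zip(spaces, weights):
        for j, dist in enumerate(lsi_distances(docs, query, k)):
            combined[j] += w * dist
    return sorted(enumerate(combined), key=lambda pair: pair[1])


# Toy corpus: word tokens and romanized syllable tokens for the same 3 documents.
word_docs = [["speech", "retrieval", "index"],
             ["image", "database", "query"],
             ["speech", "recognition", "syllable"]]
syll_docs = [["ke", "n", "sa", "ku"], ["ga", "zo", "u"], ["o", "n", "se", "i"]]
spaces = [(word_docs, ["speech", "retrieval", "index"]), (syll_docs, ["ke", "n", "sa", "ku"])]
ranking = rank_documents(spaces, weights=[0.5, 0.5], k=3)  # hypothetical equal weights
print(ranking)  # document 0 is ranked first, with a combined distance close to 0
```

With k = 3 (the number of documents) nothing is truncated, so the toy run only checks that a query repeating document 0 ranks that document first. With a smaller k, documents that share terms can collapse onto the same LSI direction (with k = 2, the two "speech" documents in the word space become indistinguishable), which is the implicit semantic grouping LSI relies on. For realistic vocabularies, a sparse truncated SVD would replace the Gram-matrix eigendecomposition.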
Pages: 5
Related papers
  • [1] Using Latent Semantic Indexing for Morph-based Spoken Document Retrieval
    Turunen, Ville T.
    Kurimo, Mikko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 341-344
  • [2] Syllable-based Chinese text/spoken document retrieval using text/speech queries
    Bai, BR
    Chen, BL
    Wang, HM
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2000, 14 (05): 603-616
  • [3] Framework for document retrieval using latent semantic indexing
    Phadnis, Neelam
    Gadge, Jayant
    International Journal of Computers and Applications, 2014, 94 (14): 37-41
  • [4] Spoken Document Retrieval Based on Confusion Network with Syllable Fragments
    Lei, Zhang
    Gotoh, Yoshihiko
    Khan, Muhammad Usman Ghani
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2012, 9
  • [5] Mandarin spoken document retrieval based on syllable lattice matching
    Wang, HM
    PATTERN RECOGNITION LETTERS, 2000, 21 (6-7): 615-624
  • [6] Document Classification Method based on Latent Semantic Indexing
    Kim, Jeong-Joon
    Lee, Yong-Soo
    Moon, Jin-Yong
    Park, Jeong-Min
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2018, 11 (04): 97-112
  • [7] Word image based latent semantic indexing for conceptual querying in document image databases
    Banerjee, Sameek
    Harit, Gaurav
    Chaudhury, Santanu
    ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007: 1208-1212
  • [8] Posterior probability based indexing method for Chinese spoken document retrieval
    Zheng, Tie-Ran
    Han, Ji-Qing
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2009, 41 (08): 97-102
  • [9] A New Syllable-lattice Based Approach for Mandarin Spoken Document Retrieval
    Zhang, Lei
    Gao, Yunxia
    Xiang, Xuezhi
    Lu, Dong
    2009 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP 2009), 2009: 1175-1178
  • [10] Chinese spoken document retrieval based on syllable neighbor posterior probability matrix
    Zheng, Tieran
    Han, Jiqing
    2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2008: 1209-1213