Combination of Key Information Extracting with Spoken Document Classification Based on Lattice

Cited by: 0
Authors
Zhang, Lei [1]
Zhang, Zhuo [1]
Xiang, Xue-zhi [1]
Affiliations
[1] Harbin Engn Univ, Informat & Commun Engn Coll, Harbin, Peoples R China
Source
COMPUTER SCIENCE FOR ENVIRONMENTAL ENGINEERING AND ECOINFORMATICS, PT 2 | 2011 / Vol. 159
Keywords
Spoken Document Classification; Key Information Extraction; Lattice; Retrieval
DOI
Not available
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Traditionally, the query words used in spoken document classification are generated manually. Here, key information extraction based on CHI, TFIDF and maximum posterior probability (MPP) features is combined with a spoken document classification system in which each class corresponds to a different topic. The extraction may assign the same key word a different weight in each topic. These weights, which reveal the relationship between the word and the topic, can be used in the spoken document classification system. Additionally, the classification system uses document length information when no query word is found. The whole classification system is based on lattices, which carry more information than the 1-best result of a speech recognition system. Among CHI, TFIDF and MPP, MPP performs slightly worse than the other two, and CHI performs slightly better than TFIDF as the number of key words increases. Experiments show that when the system combines weight and document length information, the best performance reaches a MAP of 0.769.
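As an illustration of how the same key word can receive a different weight in each topic, the following is a minimal Python sketch of two of the features named in the abstract, CHI and TFIDF. It is not the authors' implementation: it assumes the textbook chi-square statistic over a 2x2 word/topic contingency table and a plain TF-IDF over 1-best token lists, whereas the paper works on lattices (where counts would presumably be expected counts). The function names `chi_square` and `tfidf` and the toy corpus are hypothetical.

```python
import math


def chi_square(docs, labels, word, topic):
    """Textbook CHI statistic for a (word, topic) pair.

    Assumption: documents are plain token lists (1-best transcripts),
    not lattices as in the paper.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_word = word in tokens
        in_topic = label == topic
        if has_word and in_topic:
            a += 1          # word present, document in topic
        elif has_word:
            b += 1          # word present, document in another topic
        elif in_topic:
            c += 1          # word absent, document in topic
        else:
            d += 1          # word absent, document in another topic
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def tfidf(docs, labels, word, topic):
    """TF-IDF weight of `word` within the documents of `topic`.

    Assumption: term frequency is taken over all tokens of the topic,
    and IDF is log(N / document frequency) over the whole corpus.
    """
    topic_tokens = [t for tokens, label in zip(docs, labels)
                    if label == topic for t in tokens]
    if not topic_tokens:
        return 0.0
    tf = topic_tokens.count(word) / len(topic_tokens)
    df = sum(1 for tokens in docs if word in tokens)
    return 0.0 if df == 0 else tf * math.log(len(docs) / df)


# Hypothetical toy corpus: two topics, two documents each.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "bank", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

for topic in ("sports", "finance"):
    print(topic, "team",
          round(chi_square(docs, labels, "team", topic), 3),
          round(tfidf(docs, labels, "team", topic), 3))
```

In this toy corpus the word "team" occurs in both topics but gets a larger TF-IDF weight under "sports" than under "finance", which is the kind of per-topic weight the abstract describes feeding into the classifier. Note that the chi-square statistic is symmetric in a two-topic setting, so it does not by itself indicate which topic a word favors.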
Pages: 236-241
Page count: 6