Efficient Corpus Creation Method for NLU Using Interview with Probing Questions

Cited by: 0
Authors
Shima, Kazuaki [1]
Homma, Takeshi [2]
Motohashi, Masataka [1]
Ikeshita, Rintaro [2]
Kokubo, Hiroaki [2]
Obuchi, Yasunari [3]
She, Jinhua [3]
Affiliations
[1] Clarion Co Ltd, Chuo Ku, 7-2 Shintoshin, Saitama, Saitama 3300081, Japan
[2] Hitachi Ltd, Res & Dev Grp, 1-280 Higashi Koigakubo, Kokubunji, Tokyo 1858601, Japan
[3] Tokyo Univ Technol, 1404-1 Katakura, Hachioji, Tokyo 1920982, Japan
Keywords
interview; natural language understanding; corpus; probing; morpheme; INTERFACE;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an efficient method to build a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods involve a common cycle: a subject is given a specific situation where the subject operates a device by voice, and then the subject speaks one utterance to execute the task. In these methods, many subjects are required in order to build a large-scale corpus, which causes a problem of increasing lead time and financial cost. To solve this problem, we propose to incorporate a "probing question" into the cycle. Specifically, after a subject speaks one utterance, the subject is asked to think of alternative utterances to execute the same task. In this way, we obtain many utterances from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that the proposed method reduces the number of subjects by 41% while maintaining morphological diversity in a corpus and morphological coverage for user utterances spoken to commercial devices. It also shows that the proposed method reduces the total time for interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can be used to build a useful corpus while reducing lead time and financial cost.
Pages: 947-955
Page count: 9
Related papers
17 in total
  • [1] Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant
    Campagna, Giovanni
    Ramesh, Rakesh
    Xu, Silei
    Fischer, Michael
    Lam, Monica S.
    [J]. PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'17), 2017: 341-350
  • [2] Coucke A., 2018, ARXIV180510190 818
  • [3] Goto J, 2004, IEICE T INF SYST, VE87D, P1397
  • [4] Hemphill C. T., 1990, P WORKSH SPEECH NAT, P96
  • [5] HIRSCHMAN L, 1992, SPEECH AND NATURAL LANGUAGE, P7
  • [6] In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer
    Homma, Takeshi
    Obuchi, Yasunari
    Shima, Kazuaki
    Ikeshita, Rintaro
    Kokubo, Hiroaki
    Matsumoto, Takuya
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (12): 3123-3137
  • [7] Homma T, 2016, IEEE W SP LANG TECH, P369, DOI 10.1109/SLT.2016.7846291
  • [8] Kim YB, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P2214
  • [9] Kurata G., 2010, IEICE T INF SYST, VJ93-D, P2107
  • [10] Roulston K.J., 2012, The sage encyclopedia of qualitative research methods, P582, DOI 10.4135/9781412963909