When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems

Cited by: 3
Authors
Cortes, Eduardo G. [1]
Woloszyn, Vinicius [1]
Barone, Dante A. C. [1]
Affiliations
[1] Fed Univ Rio Grande Do Sul UFRGS, Inst Informat, PPGC, Caixa Postal 15-064, BR-91501970 Porto Alegre, RS, Brazil
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018 | 2018 / Vol. 11122
Keywords
Question answering; Question classification; Word embedding; CLASSIFICATION; CLEF;
DOI
10.1007/978-3-319-99722-3_14
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Question Answering Systems is a field of Information Retrieval and Natural Language Processing concerned with automatically answering questions posed by humans in natural language. One of the main steps in these systems is Question Classification, where the system tries to identify the type of question (e.g., whether it relates to a person, a time, or a location) to facilitate the generation of a precise answer. Machine learning techniques are commonly employed for this task, with the text represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manual annotation of datasets is labor-intensive. Normally, word embeddings perform relatively better on small training sets, while bag-of-words and TF-IDF give better results on large training sets. In this work, we propose a hybrid model that combines TF-IDF and word embeddings to predict the answer type of text questions using both small and large training sets. Our experiments on the Portuguese language, using several different training set sizes, showed that the proposed hybrid model statistically outperforms the bag-of-words, TF-IDF, and word embedding approaches.
Pages: 136-146
Number of pages: 11
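
The abstract describes combining TF-IDF and word embeddings into one question representation but does not say how the two are merged. The following is a minimal sketch of one plausible reading, assuming the two feature vectors are simply concatenated; this is an illustration, not necessarily the authors' method. The toy questions, the two-dimensional `embeddings` table, and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy training questions (hypothetical; not taken from the paper's dataset).
questions = [
    "quem descobriu o brasil",
    "quando o brasil foi descoberto",
    "onde fica porto alegre",
]

# Hypothetical 2-dimensional word embeddings, for illustration only.
embeddings = {
    "quem": [1.0, 0.0],
    "quando": [0.0, 1.0],
    "onde": [0.5, 0.5],
    "brasil": [0.2, 0.8],
}
EMB_DIM = 2


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, vocab, idf):
    """Relative term frequency times IDF, one entry per vocabulary term."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def mean_embedding(text):
    """Average the embeddings of known tokens; zeros if none are known."""
    vecs = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vecs:
        return [0.0] * EMB_DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def hybrid_features(text, vocab, idf):
    """Assumed combination: TF-IDF vector followed by the mean embedding."""
    return tfidf_vector(text, vocab, idf) + mean_embedding(text)


idf = build_idf(questions)
vocab = sorted(idf)
features = hybrid_features("quem descobriu o brasil", vocab, idf)
```

In this sketch, `features` would then be passed to any supervised classifier that predicts the answer type. The sparse TF-IDF part carries exact-term evidence, while the dense part lets a question share signal with similar words it has not seen in training.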