When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems

Cited by: 3

Authors:

Cortes, Eduardo G. [1]

Woloszyn, Vinicius [1]

Barone, Dante A. C. [1]

Affiliations:

[1] Fed Univ Rio Grande Do Sul UFRGS, Inst Informat, PPGC, Caixa Postal 15-064, BR-91501970 Porto Alegre, RS, Brazil

Source:

COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018 | 2018 / Vol. 11122

Keywords:

Question answering; Question classification; Word embedding; CLASSIFICATION; CLEF;

DOI:

10.1007/978-3-319-99722-3_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Question Answering is a field of Information Retrieval and Natural Language Processing concerned with systems that automatically answer questions posed by humans in natural language. One of the main steps in these systems is Question Classification, where the system tries to identify the type of question (e.g., whether it is related to a person, a time, or a location) to facilitate the generation of a precise answer. Machine learning techniques are commonly employed in such tasks, with the text represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manual annotation of datasets is labor-intensive. Normally, word embeddings perform relatively better on small training sets, while bag-of-words and TF-IDF perform better on large training sets. In this work, we propose a hybrid model that combines TF-IDF and word embeddings in order to predict the answer type of text questions using both small and large training sets. Our experiments on the Portuguese language, using several different training set sizes, showed that the proposed hybrid model outperforms the bag-of-words, TF-IDF, and word embedding approaches by a statistically significant margin.
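The idea of combining TF-IDF and word embeddings for question classification can be illustrated with a minimal sketch. The abstract does not say how the two representations are combined or which classifier is used, so the concatenation of the two vectors and the nearest-centroid classifier below are assumptions made for illustration only. The training questions, the labels, and the 3-dimensional `EMBEDDINGS` table are hypothetical toy data (in English for readability; the paper's experiments use Portuguese), and a real system would load pretrained word vectors instead.

```python
import math
from collections import Counter

# Toy training questions with answer-type labels (illustrative, not from the paper).
TRAIN = [
    ("who wrote the book", "PERSON"),
    ("who discovered brazil", "PERSON"),
    ("when was the city founded", "TIME"),
    ("when did the war end", "TIME"),
    ("where is the university", "LOCATION"),
    ("where was the author born", "LOCATION"),
]

# Hypothetical 3-dimensional word embeddings standing in for pretrained vectors.
EMBEDDINGS = {
    "who": [1.0, 0.0, 0.0], "wrote": [0.9, 0.1, 0.0],
    "discovered": [0.8, 0.1, 0.1], "author": [0.9, 0.0, 0.1],
    "when": [0.0, 1.0, 0.0], "founded": [0.1, 0.9, 0.0],
    "end": [0.0, 0.9, 0.1], "year": [0.0, 1.0, 0.0],
    "where": [0.0, 0.0, 1.0], "city": [0.1, 0.0, 0.9],
    "university": [0.0, 0.1, 0.9], "born": [0.1, 0.3, 0.6],
    "country": [0.0, 0.0, 1.0],
}
EMB_DIM = 3


def tokenize(text):
    return text.lower().split()


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_vector(text, idf, vocab):
    counts = Counter(tokenize(text))
    return l2_normalize([counts[tok] * idf[tok] for tok in vocab])


def embedding_vector(text):
    """Mean of the known word vectors; zeros if no token has an embedding."""
    vecs = [EMBEDDINGS[tok] for tok in tokenize(text) if tok in EMBEDDINGS]
    if not vecs:
        return [0.0] * EMB_DIM
    return l2_normalize([sum(col) / len(vecs) for col in zip(*vecs)])


def hybrid_vector(text, idf, vocab):
    # Assumed combination: concatenate the TF-IDF and embedding features.
    return tfidf_vector(text, idf, vocab) + embedding_vector(text)


def train_centroids(train, idf, vocab):
    """Average the hybrid vectors of each class (nearest-centroid classifier)."""
    sums, counts = {}, Counter()
    for text, label in train:
        vec = hybrid_vector(text, idf, vocab)
        acc = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(acc, vec)]
        counts[label] += 1
    return {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


IDF = fit_idf([text for text, _ in TRAIN])
VOCAB = sorted(IDF)
CENTROIDS = train_centroids(TRAIN, IDF, VOCAB)


def classify(text):
    vec = hybrid_vector(text, IDF, VOCAB)
    return max(CENTROIDS, key=lambda lab: cosine(vec, CENTROIDS[lab]))


if __name__ == "__main__":
    for question in ["who painted the picture", "which year", "in which country"]:
        print(question, "->", classify(question))
```

In this sketch, a question such as "which year" shares no token with the training vocabulary, so its TF-IDF part is all zeros and only the embedding part carries signal. That is one plausible reading of why embeddings help when training data is scarce, while the TF-IDF part contributes more as the training vocabulary grows.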

Citation

Pages: 136-146

Page count: 11
