Building a deep learning-based QA system from a CQA dataset

Cited by: 2

Authors:

Jin, Sol [1]

Lian, Xu [1]

Jung, Hanearl [1]

Park, Jinsoo [1]

Suh, Jihae [2]

Affiliations:

[1] Seoul Natl Univ, Coll Business Adm, Seoul, South Korea

[2] Seoul Natl Univ Sci & Technol, Coll Business Adm, Seoul, South Korea

Source:

DECISION SUPPORT SYSTEMS | 2023 / Volume 175

Keywords:

Question answering (QA) system; Community question answering (CQA); BERT; T5; DECISION-SUPPORT; QUESTION; ANSWERS;

DOI:

10.1016/j.dss.2023.114038

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A man-made machine-reading comprehension (MRC) dataset is necessary to train the answer extraction part of existing Question Answering (QA) systems. However, a high-quality and well-structured dataset with question-paragraph-answer pairs is not usually found in the real world. Furthermore, updating or building an MRC dataset is a challenging and costly affair. To address these shortcomings, we propose a QA system that uses a large-scale English Community Question Answering (CQA) dataset (i.e., Stack Exchange) composed of 3,081,834 question-answer pairs. The QA system adopts a classifier-retriever-summarizer structure design. The question classifier and the answer retriever parts are based on a Bidirectional Encoder Representations from Transformers (BERT) Natural Language Processing (NLP) model by Google, and the summarizer part introduces a deep learning-based Text-to-Text Transfer Transformer (T5) model to summarize the long answers. We instantiated the proposed QA system with 140 topics from the CQA dataset (including topics such as biology, law, politics, etc.) and conducted human and automatic evaluations. Our system presented encouraging results, considering that it provided high-quality answers to the questions in the test set and satisfied the requirements to develop a QA system without MRC datasets. Our results show the potential of building automatic and high-performance QA systems without being limited by man-made datasets, a significant step forward in the research of open-domain or specific-domain QA systems.
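
The classifier-retriever-summarizer structure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states that the classifier and retriever are BERT-based and the summarizer is T5-based, whereas the code below swaps in simple stand-ins (bag-of-words cosine similarity for the first two stages, leading-sentence extraction for the third) so that it runs with the standard library only. The corpus, the function names, and the `max_sentences` parameter are all hypothetical and exist only to show how the three stages could be chained.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (NOT the Stack Exchange data): topic -> (question, answer) pairs.
CORPUS = {
    "biology": [
        ("How do plants make food?",
         "Plants make food through photosynthesis. They use sunlight, water "
         "and carbon dioxide. Oxygen is released as a by-product."),
    ],
    "law": [
        ("What is a contract?",
         "A contract is a legally binding agreement between parties. It "
         "usually requires an offer, acceptance and consideration."),
    ],
}


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(question):
    """Stage 1 stand-in (assumption: replaces a BERT classifier).
    Picks the topic whose stored text is most similar to the question."""
    q = Counter(tokens(question))

    def score(topic):
        bag = Counter(t for ques, ans in CORPUS[topic]
                      for t in tokens(ques + " " + ans))
        return cosine(q, bag)

    return max(CORPUS, key=score)


def retrieve(question, topic):
    """Stage 2 stand-in (assumption: replaces a BERT retriever).
    Returns the answer whose paired question best matches the query."""
    q = Counter(tokens(question))
    best = max(CORPUS[topic], key=lambda qa: cosine(q, Counter(tokens(qa[0]))))
    return best[1]


def summarize(answer, max_sentences=1):
    """Stage 3 stand-in (assumption: replaces a T5 summarizer).
    Keeps only the leading sentences of a long answer."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return " ".join(sentences[:max_sentences])


def answer_question(question):
    """Chain the three stages: classify, then retrieve, then summarize."""
    topic = classify(question)
    return topic, summarize(retrieve(question, topic))
```

Calling `answer_question("How do plants produce their food?")` routes the query to the hypothetical `biology` topic and returns the first sentence of the stored answer. Restricting retrieval to the predicted topic is the point of the first stage: the retriever only has to rank answers within one topic instead of the whole collection.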


Pages: 12
