Building a deep learning-based QA system from a CQA dataset

Cited by: 2
Authors
Jin, Sol [1]
Lian, Xu [1]
Jung, Hanearl [1]
Park, Jinsoo [1]
Suh, Jihae [2]
Affiliations
[1] Seoul Natl Univ, Coll Business Adm, Seoul, South Korea
[2] Seoul Natl Univ Sci & Technol, Coll Business Adm, Seoul, South Korea
Keywords
Question answering (QA) system; Community question answering (CQA); BERT; T5; DECISION-SUPPORT; QUESTION; ANSWERS;
DOI
10.1016/j.dss.2023.114038
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A man-made machine-reading comprehension (MRC) dataset is necessary to train the answer extraction part of existing Question Answering (QA) systems. However, a high-quality and well-structured dataset with question-paragraph-answer pairs is rarely available in the real world. Furthermore, updating or building an MRC dataset is challenging and costly. To address these shortcomings, we propose a QA system that uses a large-scale English Community Question Answering (CQA) dataset (i.e., Stack Exchange) composed of 3,081,834 question-answer pairs. The QA system adopts a classifier-retriever-summarizer structure. The question classifier and the answer retriever parts are based on the Bidirectional Encoder Representations from Transformers (BERT) Natural Language Processing (NLP) model by Google, and the summarizer part uses a deep learning-based Text-to-Text Transfer Transformer (T5) model to summarize long answers. We instantiated the proposed QA system with 140 topics from the CQA dataset (including biology, law, and politics) and conducted human and automatic evaluations. Our system produced encouraging results: it provides high-quality answers to the questions in the test set and satisfies the requirement of developing a QA system without MRC datasets. Our results show the potential of building automatic, high-performance QA systems without being limited by man-made datasets, a significant step forward in research on open-domain and specific-domain QA systems.
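The classifier-retriever-summarizer flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's BERT-based classifier and retriever and its T5 summarizer are replaced here by simple bag-of-words and lead-sentence stand-ins so the example runs with the standard library alone. All names (`QAPair`, `classify_topic`, `retrieve_answer`, `summarize`, `answer_question`) and the three-entry corpus are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAPair:
    topic: str      # CQA topic (e.g. a Stack Exchange site)
    question: str
    answer: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_topic(question: str, corpus: list[QAPair]) -> str:
    """Stand-in for the BERT question classifier (assumption):
    pick the topic whose pooled questions best match the query."""
    query = Counter(tokenize(question))
    topic_bags: dict[str, Counter] = {}
    for pair in corpus:
        topic_bags.setdefault(pair.topic, Counter()).update(tokenize(pair.question))
    return max(topic_bags, key=lambda t: cosine(query, topic_bags[t]))


def retrieve_answer(question: str, topic: str, corpus: list[QAPair]) -> str:
    """Stand-in for the BERT answer retriever (assumption):
    return the answer of the most similar stored question in the topic."""
    query = Counter(tokenize(question))
    candidates = [p for p in corpus if p.topic == topic]
    best = max(candidates, key=lambda p: cosine(query, Counter(tokenize(p.question))))
    return best.answer


def summarize(answer: str, max_sentences: int = 1) -> str:
    """Stand-in for the T5 summarizer (assumption): keep the leading sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return " ".join(sentences[:max_sentences])


def answer_question(question: str, corpus: list[QAPair]) -> str:
    topic = classify_topic(question, corpus)            # 1. classify
    long_answer = retrieve_answer(question, topic, corpus)  # 2. retrieve
    return summarize(long_answer)                        # 3. summarize


# Hypothetical toy corpus; the paper uses 3,081,834 Stack Exchange pairs.
CORPUS = [
    QAPair("biology", "Why do leaves change color in autumn?",
           "Chlorophyll breaks down in autumn. This reveals other pigments such as carotenoids."),
    QAPair("biology", "How do cells produce energy?",
           "Mitochondria produce ATP through respiration. The process needs oxygen."),
    QAPair("law", "What is a contract breach?",
           "A breach occurs when a party fails to perform a contractual duty. Remedies include damages."),
]

if __name__ == "__main__":
    print(answer_question("Why do leaves turn color in the autumn?", CORPUS))
```

Classifying first narrows retrieval to a single topic, which keeps the candidate set small; in a real system each stand-in would presumably be swapped for a fine-tuned model behind the same function signature.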
Pages: 12