Building a benchmark dataset for the Kurdish news question answering

Cited by: 0
Author
Saeed, Ari M. [1 ]
Affiliation
[1] Univ Halabja, Coll Sci, Comp Sci Dept, Halabja, Kurdistan Region, Iraq
Source
DATA IN BRIEF | 2024 / Vol. 57
Keywords
Kurdish question answering system; Kurdish news dataset; Data mining; Text pre-processing; Machine learning;
DOI
10.1016/j.dib.2024.110916
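Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code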
07 ; 0710 ; 09 ;
Abstract
This article presents the Kurdish News Question Answering Dataset (KNQAD). The texts are collected from various Kurdish news websites. The ParseHub software is used to extract data from different fields of news, such as social news, religion, sports, science, and economy. The dataset consists of 15,002 news paragraphs with question-answer pairs. For each news paragraph, one or more question-answer pairs are manually created based on the content of the paragraphs. The dataset is pre-processed by cleaning and normalizing the data. During the cleaning process, special characters and stop words are removed, and stemming is used as a normalization step. The distribution of each question type is presented in the KNQAD. Moreover, the complexity of the QA problem is analyzed in the KNQAD by using lexical similarity techniques between questions and answers. © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
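The cleaning step and the question-answer lexical similarity analysis mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the stop-word set is a hypothetical English placeholder standing in for a Kurdish list, stemming is omitted because the stemmer is not named, and Jaccard token overlap is used as one possible lexical similarity measure rather than the measure the author necessarily applied.

```python
import re

# Hypothetical placeholder stop words (assumption); a real run would load
# a Kurdish stop-word list instead.
STOP_WORDS = {"the", "a", "of", "is", "in"}

def clean(text, stop_words=STOP_WORDS):
    """Lowercase, drop special characters, and remove stop words."""
    # Replace anything that is not a word character or whitespace with a space.
    # \w is Unicode-aware for str patterns, so non-Latin letters are kept.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in stop_words]

def jaccard(a, b):
    """Jaccard overlap between two token lists (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Illustrative English pair (not taken from KNQAD).
question = "Who won the match in Erbil?"
answer = "The home team won the match."

q_tokens = clean(question)
a_tokens = clean(answer)
score = jaccard(q_tokens, a_tokens)
print(q_tokens, a_tokens, round(score, 3))
```

A low overlap score between a question and its answer suggests that simple word matching is unlikely to locate the answer, which is one way to read the "complexity" analysis the abstract describes.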
Pages: 12
Related papers
50 in total
  • [41] JAPANESE NAMED ENTITY RECOGNITION FOR QUESTION ANSWERING SYSTEM
    Liu, Ye
    Ren, Fuji
    2011 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS, 2011, : 402 - 406
  • [42] Visualization Question Answering using Introspective Program Synthesis
    Chen, Yanju
    Yan, Xifeng
    Feng, Yu
    PROCEEDINGS OF THE 43RD ACM SIGPLAN INTERNATIONAL CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '22), 2022, : 137 - 151
  • [43] Parameterized Spatial SQL Translation for Geographic Question Answering
    Chen, Wei
    2014 IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2014, : 23 - 27
  • [44] A machine learning approach for Indonesian question answering system
    Purwarianti, Ayu
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007, : 537 - +
  • [45] Automatic Question Answering From Large ESG Reports
    Parikh, Pulkit
    Penfield, Julia
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2024, 20 (01)
  • [46] Predicting answer acceptability for question-answering system
    Roy, Pradeep Kumar
    INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2024, 25 (04) : 555 - 568
  • [47] Question Answering System using Machine Learning Techniques
    Dobrescu, Alexandra-Maria
    Radu, Serban
    VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE, 2019, : 10226 - 10237
  • [48] A 3D INDOOR-OUTDOOR BENCHMARK DATASET FOR LoD3 BUILDING POINT CLOUD SEMANTIC SEGMENTATION
    Cao, Y.
    Scaioni, M.
    2ND GEOBENCH WORKSHOP ON EVALUATION AND BENCHMARKING OF SENSORS, SYSTEMS AND GEOSPATIAL DATA IN PHOTOGRAMMETRY AND REMOTE SENSING, VOL. 48-1, 2023, : 31 - 37
  • [49] A Sesotho news headlines dataset for sentiment analysis
    Mokhosi, Refuoe
    Shivachi, Casper-Shikali
    Sethobane, Matello
    DATA IN BRIEF, 2024, 54
  • [50] Exploring the Ideal Depth of Neural Network when Predicting Question Deletion on Community Question Answering
    Ghosh, Souvick
    Ghosh, Satanu
    PROCEEDINGS OF THE 11TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION (FIRE 2019), 2019, : 52 - 55