Building a benchmark dataset for the Kurdish news question answering

Authors
Saeed, Ari M. [1]
Affiliations
[1] Univ Halabja, Coll Sci, Comp Sci Dept, Halabja, Kurdistan Regio, Iraq
Source
DATA IN BRIEF | 2024 / Vol. 57
Keywords
Kurdish question answering system; Kurdish news dataset; Data mining; Text pre-processing; Machine learning;
DOI
10.1016/j.dib.2024.110916
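Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code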
07 ; 0710 ; 09 ;
Abstract
This article presents the Kurdish News Question Answering Dataset (KNQAD). The texts are collected from various Kurdish news websites. The ParseHub software is used to extract data from different news categories, such as social news, religion, sports, science, and economy. The dataset consists of 15,002 news paragraphs with question-answer pairs. For each news paragraph, one or more question-answer pairs are manually created based on the content of the paragraph. The dataset is pre-processed by cleaning and normalizing the data. During cleaning, special characters and stop words are removed, and stemming is used as a normalization step. The distribution of each question type in the KNQAD is presented. Moreover, the complexity of the QA problem in the KNQAD is analyzed by using lexical similarity techniques between questions and answers. © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
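As a rough illustration of the cleaning step and the question-answer lexical similarity analysis mentioned in the abstract, the sketch below strips special characters, drops stop words, and computes a Jaccard overlap between the remaining tokens. The abstract does not name the similarity measure, so Jaccard is an assumption here, as are the function names, the tiny English stop-word list, and the example sentences. Stemming is left out because no Kurdish stemmer is assumed to be available.

```python
import re

# Hypothetical stop-word list for illustration only; it is NOT the list used for KNQAD.
STOP_WORDS = {"the", "a", "of", "in", "is", "was"}


def clean_tokens(text, stop_words=STOP_WORDS):
    """Lowercase the text, replace special characters with spaces, drop stop words."""
    # \w is Unicode-aware in Python 3, so letters in non-Latin scripts are kept.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in stop_words]


def jaccard_similarity(question, answer):
    """Jaccard overlap between the cleaned token sets of a question and an answer."""
    q = set(clean_tokens(question))
    a = set(clean_tokens(answer))
    if not q and not a:
        return 0.0  # both empty after cleaning: treat as no measurable overlap
    return len(q & a) / len(q | a)


# Toy English example (assumed data, not taken from the dataset).
score = jaccard_similarity(
    "Who won the match in Erbil?",
    "The Erbil team won the match.",
)
print(round(score, 2))
```

A low score under such a measure suggests the answer shares few surface words with the question, which is one way to read "complexity" in a QA benchmark; other measures (for example cosine similarity over term counts) would fit the same role.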
Pages: 12