QSST: A Quranic Semantic Search Tool based on word embedding

Cited by: 11
Authors
Mohamed, Ensaf Hussein [1]
Shokry, Eyad Mohamed [1]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Comp Sci Dept, Cairo, Egypt
Keywords
Information Retrieval; Word Embedding; Concept-based Search; Ontology; Semantic Search; Arabic Natural Language Processing; Holy Quran
DOI
10.1016/j.jksuci.2020.01.004
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Retrieving information from the Quran is an important task for Quran scholars and Arabic researchers. Quran search techniques are of two types: keyword-based and semantic (concept-based). Concept-based search is challenging, especially in a complex corpus such as the Quran. This paper presents a concept-based search tool, QSST, for the Holy Quran. It consists of four phases. In the first phase, the Quran dataset is built by manually annotating Quran verses based on the ontology of Mushaf Al-Tajweed. The second phase, word embedding, generates feature vectors for words by training a Continuous Bag of Words (CBOW) architecture on a large Quranic and Classical Arabic corpus. The third phase computes the feature vectors of both the input query and the Quranic topics. The fourth phase retrieves the most relevant verses by computing the cosine similarity between the topic and query vectors. The performance of QSST is measured by comparing its results against Mushaf Al-Tajweed; the resulting precision, recall, and F-score are 76.91%, 72.23%, and 69.28%, respectively. In addition, the results were evaluated by three Islamic experts, with an average precision of 91.95%. Finally, QSST's results are compared with those of recent existing tools, which QSST outperformed. © 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.
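The retrieval step described in the abstract (comparing a query vector with topic vectors by cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how word vectors are combined into a query or topic vector, so averaging is an assumption made here. The names `average_vector`, `rank_topics`, `TOY_EMBEDDINGS`, and `TOY_TOPICS` are hypothetical, and the three-dimensional English-word vectors are made-up stand-ins for CBOW vectors trained on an Arabic corpus.

```python
import math


def average_vector(tokens, embeddings):
    """Mean of the vectors of the tokens found in the vocabulary.

    Averaging is an assumption of this sketch, not a detail from the abstract.
    Returns None when no token is in the vocabulary.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(query_tokens, topics, embeddings):
    """Return (topic name, similarity) pairs, most similar first."""
    query_vec = average_vector(query_tokens, embeddings)
    if query_vec is None:
        return []
    scored = []
    for name, tokens in topics.items():
        topic_vec = average_vector(tokens, embeddings)
        if topic_vec is not None:
            scored.append((name, cosine_similarity(query_vec, topic_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: stand-ins for trained CBOW vectors and annotated topics.
TOY_EMBEDDINGS = {
    "prayer": [0.9, 0.1, 0.0],
    "worship": [0.8, 0.2, 0.1],
    "charity": [0.1, 0.9, 0.1],
    "alms": [0.2, 0.8, 0.0],
    "journey": [0.0, 0.1, 0.9],
}
TOY_TOPICS = {
    "Worship": ["prayer", "worship"],
    "Giving": ["charity", "alms"],
    "Travel": ["journey"],
}

ranking = rank_topics(["prayer"], TOY_TOPICS, TOY_EMBEDDINGS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a tool like the one described, the verses annotated with the top-ranked topics would then be returned; that mapping from topics to verses is omitted here.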
Pages: 934-945
Number of pages: 12