Incorporating Structural Information in Scientific Document Retrieval

Cited by: 0
Authors
Norouzi, Farzaneh [1]
Azimzadeh, Fatemeh [2]
Affiliations
[1] Univ Sci & Culture, Software Engn, Tehran, Iran
[2] Sci Informat Database SID ACECR, Tehran, Iran
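Source
2018 4TH INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR) | 2018
Keywords
Metadata Extraction; Information Retrieval; Graph Data Model; Structural Data
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract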
As scientific output grows daily, many methods have been designed to retrieve scientific documents more effectively according to users' needs and queries. For some documents in scientific databases, complete information is not available, and users have to look inside a document to find its metadata, including the authors, their affiliations, and the cited references. A method that extracts this information from the structural and geometrical properties of a document can therefore assist in retrieving related and required documents. In addition, a shortcoming of relational databases is the lack of direct and indirect relationships between the entities of a system; a graph-oriented database can establish these relationships. Accordingly, after metadata are extracted using the geometrical properties of a document, a graph-oriented model is used to model the relations between document entities such as authors, conferences, subjects, and keywords, in order to retrieve information more effectively. The extracted data are refined, stored in the graph model, and made available to users through a web-based user interface. To produce the results of each search, related documents are retrieved based on the graph relations and weighted according to each document's degree of relatedness and its number of references. The PubMed database is used to evaluate the proposed method. The experimental results show that the proposed method outperforms the PubMed search engine by 60% in terms of retrieved documents. Furthermore, in terms of F-measure and nDCG, the proposed method considerably outperforms the PubMed search engine in the quality of retrieved documents.
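The graph-based retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the class name `ScholarlyGraph`, the half-point weight for indirect matches, the `reference_weight` parameter, and the logarithmic damping of the reference count are all assumptions made for illustration, and the sample documents are invented placeholders.

```python
import math
from collections import defaultdict


class ScholarlyGraph:
    """Undirected graph linking documents to authors, venues, and keywords."""

    def __init__(self):
        self.neighbors = defaultdict(set)  # node -> set of adjacent nodes
        self.reference_count = {}          # document id -> number of references

    def add_document(self, doc_id, authors, venue, keywords, reference_count=0):
        doc = ("doc", doc_id)
        self.reference_count[doc_id] = reference_count
        entities = [("author", name) for name in authors]
        entities.append(("venue", venue))
        entities += [("keyword", word.lower()) for word in keywords]
        for entity in entities:
            self.neighbors[doc].add(entity)
            self.neighbors[entity].add(doc)

    def search(self, kind, value, reference_weight=0.1):
        """Rank documents linked to the query entity directly or via one shared entity."""
        start = (kind, value.lower() if kind == "keyword" else value)
        direct = set(self.neighbors.get(start, ()))  # entity nodes link only to documents
        relatedness = {doc: 1.0 for doc in direct}
        for doc in direct:
            for entity in self.neighbors[doc]:
                if entity == start:
                    continue
                for other in self.neighbors[entity]:
                    if other not in direct:
                        # Indirect match: assumed half a point per shared entity.
                        relatedness[other] = relatedness.get(other, 0.0) + 0.5
        scored = [
            (doc[1], rel + reference_weight * math.log1p(self.reference_count[doc[1]]))
            for doc, rel in relatedness.items()
        ]
        # Highest score first; ties broken by document id for a stable order.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, not taken from the paper.
graph = ScholarlyGraph()
graph.add_document("D1", ["A. Author"], "ICWR", ["metadata extraction"], reference_count=3)
graph.add_document("D2", ["A. Author", "B. Author"], "ICDE", ["graph data model"], reference_count=10)
graph.add_document("D3", ["C. Author"], "KDD", ["knowledge fusion"], reference_count=50)

results = graph.search("keyword", "Metadata Extraction")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this sketch, D1 matches the keyword directly, D2 is reached only through the shared author, and D3 has no path to the query within two hops, so it is not returned despite its high reference count. A relational schema would need explicit joins to surface the D1 to D2 link, which is the kind of indirect relationship the abstract attributes to the graph model.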
Pages: 103-110
Page count: 8