Incorporating Structural Information in Scientific Document Retrieval

Cited by: 0

Authors:

Norouzi, Farzaneh [1]

Azimzadeh, Fatemeh [2]

Affiliations:

[1] Univ Sci & Culture, Software Engn, Tehran, Iran

[2] Sci Informat Database SID ACECR, Tehran, Iran

Source:

2018 4TH INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR) | 2018

Keywords:

Metadata Extraction; Information Retrieval; Graph Data Model; Structural Data;

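DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: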

With the rapid growth of science, various methods have been designed to retrieve scientific documents more effectively according to users' needs and queries. For some documents in scientific databases, complete information is not available, and users have to look inside a document to find its metadata, including the authors, their affiliations, and the cited references. A method that extracts this information from the structural and geometrical properties of a document can therefore assist the retrieval of related and required documents. In addition, a pitfall of relational databases is the lack of direct and indirect relationships between the entities of a system; a graph-oriented database can establish these relations. In this work, after metadata are extracted using the geometrical properties of a document, a graph-oriented model is used to model the relations between document entities such as authors, conferences, subjects, and keywords, in order to retrieve information more effectively. The extracted data are refined, stored in the graph model, and made available to users through a web-based user interface. To produce the results of each search, related documents are retrieved based on the graph relations and weighted according to each document's degree of relatedness and its number of references. The PubMed database is used to evaluate the proposed method. The experimental results show that the proposed method outperformed the PubMed search engine by 60% in terms of the retrieved documents. Furthermore, in terms of F-measure and nDCG, the proposed method considerably outperformed the PubMed search engine in the quality of the retrieved documents.
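The abstract names F-measure and nDCG as its quality measures but does not spell them out. The following is a minimal sketch of the standard textbook definitions, not the authors' evaluation code. The function names, the `beta` parameter, and the choice to compute the ideal ranking from the gains of the returned list only are assumptions made for illustration.

```python
import math


def f_measure(retrieved, relevant, beta=1.0):
    """F-measure of a retrieved set against the set of relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def dcg(gains):
    """Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(gains):
    """nDCG of a ranked list of graded relevance gains.

    Assumption: the ideal ordering is the same gains sorted in descending
    order, i.e. relevant documents missing from the list are not counted.
    """
    ideal_dcg = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, `f_measure(["a", "b", "c", "d"], ["a", "b", "e"])` combines a precision of 1/2 with a recall of 2/3, and `ndcg([1, 3])` is below 1 because the more relevant document is ranked second.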

Citation:

Pages: 103-110

Page count: 8

