A Reference Paper Collection System Using Web Scraping

Times cited: 1

Authors:

Naing, Inzali [1]

Aung, Soe Thandar [1]

Wai, Khaing Hsu [1]

Funabiki, Nobuo [1]

Affiliation:

[1] Okayama Univ, Dept Informat & Commun Syst, Okayama 7008530, Japan

Source:

ELECTRONICS | 2024 / Vol. 13 / No. 14

Keywords:

web scraping; Google Scholar; data collection; BERT; Selenium; Flask framework; Angular

DOI:

10.3390/electronics13142700

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Collecting reference papers from the Internet is one of the most important activities in advancing research and writing papers about its results. Unfortunately, the current process using Google Scholar may not be efficient, since many paper files cannot be accessed directly by the user, and even accessible ones still have to be checked manually for their usefulness. In this paper, we propose a reference paper collection system that uses web scraping to automate the collection of papers from websites. The system collects or monitors data from the Internet, which is treated as the environment, using Selenium, a popular browser automation tool widely used for web scraping, as the sensor; it then examines each paper's similarity to the search target by comparing keywords with the BERT model. BERT is a deep learning model for natural language processing (NLP) that captures context by analyzing the relationships between words in a sentence bidirectionally. Python Flask is adopted for the web application server, and Angular is used for data presentation. For the evaluation, we measured the performance, investigated the accuracy, and asked members of our laboratory to use the proposed method and provide feedback. The results confirm the method's effectiveness.
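The keyword-similarity step in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how BERT outputs are compared, so this sketch assumes cosine similarity between a vector for the search target and a vector for each collected paper, a common choice that the source does not confirm. To keep the example runnable without downloading a model, a plain bag-of-words vector stands in for BERT embeddings. The names `bag_of_words`, `cosine`, `rank_papers`, and the `threshold` parameter are hypothetical, and the Selenium collection step is not shown.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A sparse vector: dimension name -> weight.
Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Stand-in embedding (assumption): lowercase word counts.

    The system described above uses BERT; this placeholder only measures
    exact word overlap, so it misses synonyms that BERT could capture.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {word: float(count) for word, count in Counter(tokens).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(dim, 0.0) for dim, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_papers(
    target_keywords: str,
    papers: List[Tuple[str, str]],
    embed: Callable[[str], Vector] = bag_of_words,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score (title, keywords) pairs against the search target.

    Hypothetical helper: keeps papers whose similarity is strictly above
    `threshold` and returns them sorted from most to least similar.
    """
    target = embed(target_keywords)
    scored = [(title, cosine(target, embed(keywords))) for title, keywords in papers]
    kept = [(title, score) for title, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scraped records: (title, keywords).
    papers = [
        ("Paper A", "web scraping Selenium data collection"),
        ("Paper B", "stock prices time series forecasting"),
        ("Paper C", "BERT keyword similarity web search"),
    ]
    # Prints "Paper C: 0.671" then "Paper A: 0.447"; Paper B shares no
    # words with the target, scores 0.0, and is filtered out.
    for title, score in rank_papers("web scraping keyword similarity", papers):
        print(f"{title}: {score:.3f}")
```

Because `embed` is a parameter, a BERT encoder could replace `bag_of_words` as long as it returns vectors in the same mapping form (dimension to weight). Unlike word counts, contextual embeddings would also give a nonzero score to papers that describe the same idea in different words.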


Pages: 18

References: 50 in total

